Closed mturilli closed 4 years ago
@wjlei1990 finish the doc: https://docs.google.com/document/d/1uugJsRRSTHDhMFb3NWaRRPwyoMiNPKn6sr2iO3IqMaQ/edit#heading=h.k670rad7dcz1
Summit supports GPUMPS (the NVIDIA Multi-Process Service). The user just needs to add one extra line to their batch script:
#BSUB -alloc_flags gpumps
and then use jsrun to configure the resource allocation accordingly. Using our current SPECFEM software as an example: we use 384 MPI ranks, so previously the job would use 384 CPU cores and 384 GPUs (each CPU core and each GPU handles only one MPI rank).
So the jsrun run command would be:
jsrun -n384 -a1 -c1 -g1 ./bin/xspecfem3D
Say now we want to run 2 MPI ranks on 1 GPU; the jsrun command would be:
jsrun -n192 -a2 -c2 -g1 ./bin/xspecfem3D
If we set GPUMPS to 4 ranks per GPU, then:
jsrun -n96 -a4 -c4 -g1 ./bin/xspecfem3D
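The pattern above can be sketched as a small helper (illustrative only; the function name and output format are my own, not part of any RADICAL or OLCF tool): the total number of MPI ranks stays fixed at 384, while the number of resource sets (`-n`) shrinks as more ranks share each GPU.

```python
def jsrun_flags(total_ranks, ranks_per_gpu, binary="./bin/xspecfem3D"):
    """Build a jsrun command for Summit where `ranks_per_gpu` MPI ranks
    share one GPU via MPS (needs `#BSUB -alloc_flags gpumps` when > 1).
    Hypothetical helper for illustration only."""
    assert total_ranks % ranks_per_gpu == 0
    n = total_ranks // ranks_per_gpu  # number of resource sets
    return f"jsrun -n{n} -a{ranks_per_gpu} -c{ranks_per_gpu} -g1 {binary}"

print(jsrun_flags(384, 1))  # jsrun -n384 -a1 -c1 -g1 ./bin/xspecfem3D
print(jsrun_flags(384, 2))  # jsrun -n192 -a2 -c2 -g1 ./bin/xspecfem3D
print(jsrun_flags(384, 4))  # jsrun -n96 -a4 -c4 -g1 ./bin/xspecfem3D
```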
More details can be found in the Summit user guide: https://www.olcf.ornl.gov/for-users/system-user-guides/summit/summit-user-guide/#running-jobs
The RADICAL team has to discuss how to express this in RP.
Note from the RP discussion: we will likely express this as GPU threads and cap it at the maximum number of shareable ranks.
Error when running pip install radical.ensemblemd under Python 3 in a virtualenv:
(entk) lei@login2 ~/software/summit/virtualenv $
pip install radical.ensemblemd
Collecting radical.ensemblemd
Using cached radical.ensemblemd-0.4.6.tar.gz (100 kB)
ERROR: Command errored out with exit status 1:
command: /autofs/nccs-svm1_home1/lei/software/summit/virtualenv/entk/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-q2trqlc0/radical.ensemblemd/setup.py'"'"'; __file__='"'"'/tmp/pip-install-q2trqlc0/radical.ensemblemd/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-q2trqlc0/radical.ensemblemd/pip-egg-info
cwd: /tmp/pip-install-q2trqlc0/radical.ensemblemd/
Complete output (6 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-q2trqlc0/radical.ensemblemd/setup.py", line 108
def visit((prefix, strip, found), dirname, names):
^
SyntaxError: invalid syntax
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
This looks like a Python version issue: it appears you are installing an old version of EnTK under Python 3.
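For context (an illustrative sketch, not the actual EnTK fix): the failing line, `def visit((prefix, strip, found), dirname, names):`, uses Python 2 tuple parameter unpacking, which was removed in Python 3 (PEP 3113). A Python 3 equivalent takes the tuple as one argument and unpacks it inside the function body; the body below is invented just to make the example runnable.

```python
# Python 2 only (SyntaxError in Python 3):
#   def visit((prefix, strip, found), dirname, names):
#       ...
# Python 3 equivalent: accept the tuple as a single parameter
# and unpack it explicitly in the body.
def visit(arg, dirname, names):
    prefix, strip, found = arg
    # Hypothetical body: just record what was visited.
    found.append((prefix, strip, dirname, list(names)))

results = []
visit(("inst", False, results), "docs", ["a.txt", "b.txt"])
print(results[0][2])  # prints the dirname passed in: docs
```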
Hi, I am using pip install to install EnTK directly. I guess I should install from source instead, right?
Not critical, Rutgers to discuss internally how to best support this.
@wjlei1990 could you provide us with a link to the batch job script + task code you are running with GPU MPS on Summit? This would greatly help us shape our discussion about how to support it in EnTK/RP.
Example LSF script without GPUMPS:
#!/bin/bash
#BSUB -P GEO111
#BSUB -W 00:30
#BSUB -nnodes 64
#BSUB -J solver
#BSUB -o log.solver.%J
jsrun -n 384 -a 1 -c 1 -g 1 ./bin/xspecfem3D
Example LSF script with GPUMPS:
#!/bin/bash
#BSUB -P GEO111
#BSUB -W 00:30
#BSUB -nnodes 16
#BSUB -J solver
#BSUB -o log.solver.%J
#BSUB -alloc_flags gpumps
jsrun -n 96 -a 4 -c 4 -g 1 ./bin/xspecfem3D
The differences are:
#BSUB -alloc_flags gpumps
which enables GPUMPS, and
jsrun -n 96 -a 4 -c 4 -g 1 ./bin/xspecfem3D
which allows 4 MPI ranks to run on a single GPU card.
Do you just want the script, or would you like a running example as well?
Thank you very much! If sharing some running example requires no relevant effort then yes, that might be useful too.
Hi Matteo, you may find running example here:
/gpfs/alpine/world-shared/geo111/lei/entk/specfem3d_globe_990cd4
There are 3 LSF scripts, using GPUMPS with 1, 2, and 4 MPI ranks per GPU.
I also ran some performance benchmarks on the task:
GPUMPS (ranks/GPU) | Job Time (sec) | Core Simulation Time (sec)
---|---|---
1 | 81 | 50
2 | 133 | 103
4 | 194 | 162
The core simulation time measures just the time marching in the SPECFEM solver. The solver needs extra time to read the mesh and set it up; based on our experiments this takes about 30-32 s and runs mainly on the CPU. It stays almost constant across experiments, since we are not really saturating the CPU power on Summit.
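That 30-32 s setup overhead can be read off the table directly as job time minus core simulation time (variable names below are my own, numbers are from the table above):

```python
# (GPUMPS ranks/GPU, job time in s, core simulation time in s)
runs = [(1, 81, 50), (2, 133, 103), (4, 194, 162)]

# Setup overhead = total job time - core simulation time.
overheads = [job - core for _, job, core in runs]

for (mps, job, core), ovh in zip(runs, overheads):
    print(f"GPUMPS={mps}: setup overhead = {ovh} s")
# GPUMPS=1: setup overhead = 31 s
# GPUMPS=2: setup overhead = 30 s
# GPUMPS=4: setup overhead = 32 s
```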
@wjlei1990 to provide some details in this ticket.