Closed — ibethune closed this issue 9 years ago
Two comments:
First, on ARCHER (currently) a job can only be the size of a node or more; this is a limitation of aprun. (Our eCSE proposal addressed this ...) This is not enforced in the currently released version, but the RADICAL members should be aware of it.
Secondly, the relative high level of abstraction of a CU doesn't really cover the distinction of cores/threads/etc. We have an open ticket on that in the RP repo: https://github.com/radical-cybertools/radical.pilot/issues/194. Basically we were "waiting" for a real use-case to work on this. Happy to work with you to try to tackle this.
Thanks Mark, we knew that you can only launch one aprun job per node concurrently (which is why we had to cut down the size of the ExTASY tutorial for ARCHER!). OpenMP might be helpful, but only gets you up to the size of a node. The problem here is more general: no matter what you set `num_cores_per_sim_cu` to, the actual gromacs executable (mdrun) will only run with one MPI task.
This can be fixed in ExTASY (not RP) by rewriting the Gromacs simulator class (plus some changes to the corresponding mdkernels) so that the final call to mdrun is not wrapped inside the run.py script. The setup can instead be done via a pre-exec step in the CU. The way the Amber CUs work is a good example of this.
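One way to picture the restructuring (a sketch only: the dict below merely illustrates the shape of such a CU description with hypothetical field names, and the grompp command is abbreviated from the run script later in this thread; the real RP/ExTASY API may differ):

```python
idx = '00003'

# Illustrative compute-unit description as a plain dict; field names are
# modelled on how the Amber CUs are set up, not actual ExTASY/RP code.
cu = {
    # setup that used to live inside run.py / the generated run script
    'pre_exec': [
        'grompp -f grompp.mdp -p topol.top -c start_%s.gro '
        '-o topol_%s.tpr -po mdout_%s.mdp' % (idx, idx, idx),
    ],
    # mdrun becomes the executable itself, so the launcher can wrap it
    # directly instead of a bash wrapper script
    'executable': 'mdrun',
    'arguments': ['-s', 'topol_%s.tpr' % idx,
                  '-g', 'mdlog_%s.log' % idx],
    'cores': 3,
}

print(cu['executable'], cu['cores'])
```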
I created a new branch for ensemblemd at https://github.com/radical-cybertools/radical.ensemblemd/tree/feature/mp_gromacs. Vivek, please have a look at it -- the only thing that differs from devel is run.py, i.e. the CU workload. It now uses a process pool to run the individual scripts (instead of running them sequentially).
You should make sure to use CU sizes that are full nodes on ARCHER -- we may make this automatic in RP later on.
Vivek, any feedback? :)
There were some errors in the script, but I was able to run it on ARCHER. I want to vary the core counts and test more before closing this ticket.
Yeah, sorry for the cores/threadnum mix-up, I only saw that today... Thanks for having a look at it :)
This should be resolved in the devel branch. Please test again.
Elena retried, using the latest version, but it seems that gromacs is still only running on a single thread/core:
ExTASY version: `0.1.4-beta-15-g5bbe261`

From gromacslsdmap.wcfg: `num_cores_per_sim_cu = 3`

From mdlog_00003.log:

```
Using 1 MPI thread
Using 1 OpenMP thread
```
According to the run script below, mdrun is still started in serial mode:
```
$ more unit.000016/run_00003.sh
idx=00003

exec  > stdout_$idx
exec 2> stdout_$idx

sed "76"','"100"'!d' start.gro > start_$idx.gro

grompp \
    -f grompp.mdp \
    -p topol.top \
    -c start_$idx.gro \
    -o topol_$idx.tpr \
    -po mdout_$idx.mdp

mdrun -nt 3 \
    -o traj_$idx.trr \
    -e ener_$idx.edr \
    -s topol_$idx.tpr \
    -g mdlog_$idx.log \
    -cpo state_$idx.cpt \
    -c outgro_$idx
```
On ARCHER, to get a parallel execution, it absolutely has to be the case that you run 'aprun -n XXX ... mdrun ...'. Wrapping mdrun in anything else (here, a bash script) won't work.
Unless you have ideas for how this can be resolved, I suggest we release 0.1 later this week with the caveat that gromacs will only use a single core on ARCHER, and experiment with the correct solution in the long run. Thoughts?
Ah ok, I was looking at:

```
Command line:
mdrun -nt 2 -o traj_00002.trr -e ener_00002.edr -s topol_00002.tpr -g mdlog_00002.log -cpo state_00002.cpt -c outgro_00002
```
But yes, it seems to be the case that gromacs still runs with 1 core. I am not sure how to extend the current implementation with aprun in that case. Andre?
So currently, multiple simulations are wrapped up within one compute task for gromacs-lsdmap (not for amber-coco). This, I think, is due to the fact that the molecule is small. The straightforward solution is to perform one simulation in each task, which I think would be ideal for large (real case) molecules. In this case, then, the number of simulations is limited by the number of concurrent RP compute units (which I believe is O(1000)).
Ah, true, each gromacs instance runs with one core. But n gromacs instances now run in parallel, thus utilizing all cores of the node. See https://github.com/radical-cybertools/ExTASY/blob/devel/src/radical/ensemblemd/extasy/bin/Simulator/Gromacs/run.py#L104, where we use a `multiprocessing.Pool` process pool of size `cores`.
OK, we checked and indeed several gromacs single-core instances were running concurrently. I think this can be closed then - thanks!
If you set `num_cores_per_sim_cu` > 1 on ARCHER, this has no effect on the gromacs jobs. I just tested with this set to 2. The run.py script is launched with `aprun -n 2` (i.e. a parallel Python environment with 2 MPI processes). However, even though the generated run.sh script runs `mdrun -nt 2 ...`, gromacs only starts with a single process. From the md.log:
I think this does not work because mdrun is not launched directly via aprun, so it has no knowledge of the parallel environment that the Python run.py script is running in. As far as I know, the only solution is to launch mdrun directly with aprun.