radical-cybertools / ExTASY

Labels: MDEnsemble, Other

num_cores_per_sim_cu > 1 has no effect on ARCHER #168

Closed: ibethune closed this issue 9 years ago

ibethune commented 9 years ago

If you set num_cores_per_sim_cu > 1 on ARCHER, it has no effect on the gromacs jobs. I just tested with it set to 2. The 'run.py' script is launched with aprun -n 2 (i.e. a parallel Python environment with 2 MPI processes). However, even though the generated run.sh script runs "mdrun -nt 2 ...", gromacs starts only a single process. From the md.log:

Command line:
  mdrun -nt 2 -s topol.tpr -o traj.trr -e ener.edr
...
Using 1 MPI thread
Using 1 OpenMP thread

I think this does not work because mdrun is not launched directly via aprun, so it has no knowledge of the parallel environment the Python run.py script is running in. As far as I know, the only solution is to launch mdrun directly with aprun.
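
For illustration, a minimal sketch (hypothetical, not the actual ExTASY code) of the launch chain being described: aprun parallelises the Python wrapper, but the GROMACS binary is forked as an ordinary child process, outside aprun's control, and therefore starts serially.

    # Hypothetical sketch of the launch chain, not the real run.py.
    # The batch job runs:  aprun -n 2 python run.py
    # so run.py itself gets two MPI ranks, but the generated run.sh
    # (which calls "mdrun -nt 2 ...") is executed as a plain child
    # process, so mdrun never sees aprun's parallel context.
    import subprocess

    subprocess.check_call(["/bin/bash", "run.sh"])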

marksantcroos commented 9 years ago

Two comments:

First, on ARCHER a job can currently only be the size of a full node or more; this is a limitation of aprun (our eCSE proposal addressed this ...). This is not enforced in the currently released version, but the RADICAL members should be aware of it.

Secondly, the relatively high level of abstraction of a CU doesn't really cover the distinction between cores, threads, etc. We have an open ticket on that in the RP repo: https://github.com/radical-cybertools/radical.pilot/issues/194. Basically we were "waiting" for a real use case to work on this. Happy to work with you to try to tackle this.

ibethune commented 9 years ago

Thanks Mark, we knew that you can only launch one aprun job per node concurrently (hence why we had to cut the size of the ExTASY tutorial down for ARCHER!). OpenMP might be helpful, but it only gets you up to the size of a node. Here the problem is more general: no matter what you set num_cores_per_sim_cu to, the actual gromacs executable (mdrun) will only run with one MPI task.

This can be fixed in ExTASY (not RP) by rewriting the Gromacs simulator class (plus some changes to the corresponding mdkernels) so that the final call to mdrun is not wrapped inside the run.py script. The setup can instead be done via a pre-exec step in the CU; the way the Amber CUs work is a good example of this.
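
For illustration, a rough sketch of that approach, assuming the radical.pilot ComputeUnitDescription API of the time (field names are from memory and may differ): the grompp setup moves into pre_exec and mdrun itself becomes the CU executable, so the agent launches it directly via aprun.

    # Sketch only: a Gromacs CU in the style of the Amber CUs, with the
    # setup done in pre_exec and mdrun as the executable, so that the
    # RP agent wraps mdrun itself with aprun. Field names are assumed.
    import radical.pilot as rp

    cud = rp.ComputeUnitDescription()
    cud.pre_exec   = ["grompp -f grompp.mdp -p topol.top -c start.gro -o topol.tpr"]
    cud.executable = "mdrun"
    cud.arguments  = ["-s", "topol.tpr", "-o", "traj.trr", "-e", "ener.edr"]
    cud.cores      = 2   # parallelism supplied by the launch method, e.g. aprun -n 2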

andre-merzky commented 9 years ago

I created a new branch for ensemblemd at https://github.com/radical-cybertools/radical.ensemblemd/tree/feature/mp_gromacs. Vivek, please have a look at it -- the only thing which differs from devel is the run.py, i.e. the CU workload. That now uses a process pool to run the individual scripts (instead of running them sequentially).
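
For reference, a minimal sketch of the process-pool idea (simplified, not a copy of the code in the branch): the generated per-task scripts are handed to a multiprocessing.Pool with as many workers as the CU has cores, instead of being run one after another.

    # Simplified sketch of the process-pool approach (not the branch's run.py):
    # run the generated per-task scripts concurrently, one worker per core.
    import multiprocessing
    import subprocess

    def run_script(script):
        # Each worker runs one generated run_XXXXX.sh to completion.
        return subprocess.call(["/bin/bash", script])

    if __name__ == "__main__":
        cores   = 24                                     # e.g. one full ARCHER node
        scripts = ["run_%05d.sh" % i for i in range(cores)]
        pool    = multiprocessing.Pool(processes=cores)
        pool.map(run_script, scripts)
        pool.close()
        pool.join()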

You should make sure to have CU sizes which are full nodes on ARCHER -- we may make this automatic in RP later on.

andre-merzky commented 9 years ago

Vivek, any feedback? :)

vivek-bala commented 9 years ago

There were some errors in the script, but I was able to run it on ARCHER. I want to vary the core counts and test more before closing this ticket.

andre-merzky commented 9 years ago

Yeah, sorry for the cores/threadnum mixup, I only saw that today... Thanks for having a look at it :)

vivek-bala commented 9 years ago

This should be resolved in the devel branch. Please test again.

ibethune commented 9 years ago

Elena retried, using the latest version, but it seems that gromacs is still only running on a single thread/core:

ExTASY version : 0.1.4-beta-15-g5bbe261

from gromacslsdmap.wcfg: num_cores_per_sim_cu = 3

from mdlog_00003.log:

Using 1 MPI thread
Using 1 OpenMP thread

According to the run script below, mdrun is still started in serial mode:

more unit.000016/run_00003.sh

#!/bin/bash

idx=00003

exec >  stdout_$idx
exec 2> stdout_$idx

sed "76"','"100"'!d' start.gro > start_$idx.gro

grompp \
    -f grompp.mdp \
    -p topol.top \
    -c start_$idx.gro \
    -o topol_$idx.tpr \
    -po mdout_$idx.mdp

mdrun -nt 3 \
    -o traj_$idx.trr \
    -e ener_$idx.edr \
    -s topol_$idx.tpr \
    -g mdlog_$idx.log \
    -cpo state_$idx.cpt \
    -c outgro_$idx

On ARCHER, to get parallel execution, it absolutely has to be the case that you run 'aprun -n XXX ... mdrun ...'. Wrapping mdrun in anything else (here a bash script) won't work.

Unless you have ideas for how this can be resolved, I suggest we release 0.1 later this week with the caveat that gromacs will only use a single core on ARCHER, and experiment with the correct solution in the long run. Thoughts?

vivek-bala commented 9 years ago

ah ok. I was looking at

Command line:
  mdrun -nt 2 -o traj_00002.trr -e ener_00002.edr -s topol_00002.tpr -g mdlog_00002.log -cpo state_00002.cpt -c outgro_00002

But yes, it seems to be the case that gromacs still runs with 1 core. I am not sure how to extend the current implementation with aprun in that case. Andre?

So currently, multiple simulations are wrapped up within one compute task for gromacs-lsdmap (not for amber-coco). This, I think, is due to the fact that the molecule is small. The straightforward solution is to perform one simulation in each task, which I think would be ideal for large (real case) molecules. In this case, then, the number of simulations is limited by the number of concurrent RP compute units (which I believe is O(1000)).
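
For illustration, a rough sketch of that one-simulation-per-CU alternative (API details assumed, not tested): each simulation becomes its own compute unit, so every mdrun instance gets its own direct launch by RP.

    # Sketch of the one-simulation-per-CU alternative (assumed API, untested):
    # each simulation is a separate compute unit, so the RP agent launches
    # every mdrun instance individually (via aprun on ARCHER).
    import radical.pilot as rp

    def make_cu(idx, cores_per_sim):
        cud = rp.ComputeUnitDescription()
        cud.pre_exec   = ["grompp -f grompp.mdp -p topol.top "
                          "-c start_%05d.gro -o topol_%05d.tpr" % (idx, idx)]
        cud.executable = "mdrun"
        cud.arguments  = ["-s", "topol_%05d.tpr" % idx, "-g", "mdlog_%05d.log" % idx]
        cud.cores      = cores_per_sim
        return cud

    cuds = [make_cu(i, 2) for i in range(64)]    # e.g. 64 simulations, 2 cores each
    # units = unit_manager.submit_units(cuds)    # submitted via the usual RP unit manager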

andre-merzky commented 9 years ago

Ah, true, each gromacs instance runs with one core. But n gromacs instances now run in parallel, thus utilizing all cores of the node. See https://github.com/radical-cybertools/ExTASY/blob/devel/src/radical/ensemblemd/extasy/bin/Simulator/Gromacs/run.py#L104, where we use a multiprocessing.Pool of size 'cores'.

ibethune commented 9 years ago

OK, we checked and indeed several gromacs single-core instances were running concurrently. I think this can be closed then - thanks!