libAtoms / workflow

python workflow toolkit
GNU General Public License v2.0

autoparallelized wfl is slower than GNU parallel. #266

Open jungsdao opened 1 year ago

jungsdao commented 1 year ago

Hello, I was trying the wfl package with minima hopping, and I discovered that it is much slower than the same process run with GNU parallel. Below I compare one geometry relaxation step within minima hopping between GNU parallel and wfl.

  1. GNU parallel

                Step[ FC]     Time          Energy          fmax
    *Force-consistent energies used in optimization.
    BFGSLineSearch:    0[  0] 14:25:38     -534.014633*       3.0278
    BFGSLineSearch:    1[  2] 14:25:44     -534.189030*       1.3638
    BFGSLineSearch:    2[  4] 14:25:48     -534.351071*       2.4404
    BFGSLineSearch:    3[  5] 14:25:49     -534.665330*       2.9015
    BFGSLineSearch:    4[  6] 14:25:54     -534.994242*       2.9427
    BFGSLineSearch:    5[  8] 14:25:57     -535.194093*       3.3632
    BFGSLineSearch:    6[ 10] 14:26:00     -535.408273*       2.3933
    BFGSLineSearch:    7[ 12] 14:26:03     -535.550177*       1.3325
    BFGSLineSearch:    8[ 14] 14:26:06     -535.684763*       3.0824
    BFGSLineSearch:    9[ 16] 14:26:09     -536.134389*       1.6315
    BFGSLineSearch:   10[ 17] 14:26:10     -536.266573*       1.9401
    BFGSLineSearch:   11[ 18] 14:26:12     -536.369399*       1.6626
    BFGSLineSearch:   12[ 19] 14:26:13     -536.445034*       0.9739
    BFGSLineSearch:   13[ 21] 14:26:16     -536.561701*       0.8027
    BFGSLineSearch:   14[ 23] 14:26:19     -536.650002*       0.5800
    BFGSLineSearch:   15[ 24] 14:26:21     -536.681904*       0.3205
    BFGSLineSearch:   16[ 25] 14:26:22     -536.701256*       0.3439
    BFGSLineSearch:   17[ 26] 14:26:23     -536.713216*       0.1637
    BFGSLineSearch:   18[ 27] 14:26:25     -536.717918*       0.2025
    BFGSLineSearch:   19[ 29] 14:26:28     -536.719005*       0.1261
    BFGSLineSearch:   20[ 30] 14:26:29     -536.720355*       0.0749
    BFGSLineSearch:   21[ 31] 14:26:30     -536.721657*       0.1069
    BFGSLineSearch:   22[ 34] 14:26:34     -536.724530*       0.1934
    BFGSLineSearch:   23[ 35] 14:26:35     -536.726039*       0.2071
    BFGSLineSearch:   24[ 37] 14:26:38     -536.727076*       0.1802
    BFGSLineSearch:   25[ 38] 14:26:39     -536.728364*       0.1102
    BFGSLineSearch:   26[ 39] 14:26:40     -536.729102*       0.0925
    BFGSLineSearch:   27[ 40] 14:26:42     -536.729533*       0.0620
    BFGSLineSearch:   28[ 41] 14:26:43     -536.729646*       0.0437

The whole relaxation finished within 1-2 minutes when parallelized with GNU parallel.

  2. wfl

                Step[ FC]     Time          Energy          fmax
    *Force-consistent energies used in optimization.
    BFGSLineSearch:    0[  0] 14:45:06     -535.723086*       0.9643
    BFGSLineSearch:    1[  1] 14:46:21     -535.822649*       2.3116
    BFGSLineSearch:    2[  3] 14:50:03     -535.964835*       1.0688
    BFGSLineSearch:    3[  5] 14:54:51     -535.992950*       0.5638
    BFGSLineSearch:    4[  7] 14:56:34     -536.008899*       0.5146
    BFGSLineSearch:    5[  9] 14:58:17     -536.019360*       0.5074
    BFGSLineSearch:    6[ 11] 15:00:02     -536.027474*       0.6783
    BFGSLineSearch:    7[ 12] 15:00:55     -536.050990*       0.5662
    BFGSLineSearch:    8[ 14] 15:02:40     -536.066940*       0.9052
    BFGSLineSearch:    9[ 16] 15:04:24     -536.200172*       1.0468

Whereas with the wfl package, a single relaxation step takes 1-2 minutes.

I'm using a MACE potential for the relaxation. Could anyone comment on the potential reason why it's so much slower with wfl parallelization? Many thanks in advance.

bernstei commented 1 year ago

How are you parallelizing with "gnu parallel"? In general wfl parallelizes over multiple input configurations, but I have no idea how the minima hopping is parallelized - I'd guess over multiple initial configs.

jungsdao commented 1 year ago

With GNU parallel it's also parallelized over initial configurations, if that's what you were asking. As far as I know, the parallelization of minima hopping is also over multiple input configurations.

bernstei commented 1 year ago

Without more information on exactly what you're calling and what the "gnu parallel" parallelization is doing (I didn't see any mention of it in the ASE minima hopping web docs), there's no way to tell. This looks like the output from a single-config local minimization (that's what BFGSLineSearch usually does). We have no idea what system you're using, how long a single force evaluation is expected to take, etc., so there's no way to know what's reasonable.

jungsdao commented 1 year ago

GNU parallel is not related to ASE minima hopping; I just used it to parallelize minima hopping. It's just multiple executions of a separate Python script that all share one common minima.traj file, which is the history of found minima. The Python script (parallel_minhop.py) looks like the following.

    from ase.io import read
    from ase.optimize.minimahopping import MinimaHopping
    from mace.calculators import MACECalculator

    atoms = read("structure.traj")
    # final_mlip_file is the path to the trained MACE model
    atoms.calc = MACECalculator(model_path=final_mlip_file, device="cpu")

    opt = MinimaHopping(atoms, Ediff0=0.75, T0=2000, fmax=0.05,
            minima_traj="../minima.traj", timestep=0.5, use_abort_check=False)
    opt(totalsteps=80, maxtemp=2*2000)

In a file called cmd.lst, the commands that I want to parallelize are given:

cd ./00; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./01; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./02; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./03; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./04; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
...
cd ./56; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 

And GNU parallel is executed with the following command:

    parallel -X --delay 0.2 --joblog task.log --progress --resume -j 72 < cmd.lst

That is how GNU parallel is run.

What I posted above is the log file of the local minimization of a single configuration (qn0000.log in minima hopping). I would expect GNU parallel and wfl to be similar in geometry relaxation speed, since I'm relaxing the same structure with the same MACE potential.

bernstei commented 1 year ago

Which of those times is a reasonable energy evaluation time for your system? How exactly are you running the wfl job? Are you somehow forcing all the parallel wfl processes to share one core?
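One common cause of this kind of slowdown (an assumption on my part, not something confirmed in this thread) is thread oversubscription: each PyTorch-based MACE worker may size its OpenMP/BLAS thread pools to the full core count of the node, so 72 workers end up fighting over the same CPUs. A minimal sketch of a mitigation, set before the calculator's libraries are imported:

```python
import os

# Hypothetical mitigation: cap the thread pools that PyTorch / NumPy / MKL
# would otherwise size to the node's full core count. These environment
# variables must be set before the numerical libraries are imported.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"
```

If the GNU parallel run inherits a one-thread environment from srun while the wfl run does not, that alone could explain the difference.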

jungsdao commented 1 year ago

The GNU parallel relaxation time is the more reasonable evaluation time for this system; it shouldn't be that slow. Maybe it's better to attach a tar.gz of the files I used to parallelize minima hopping with wfl.

wfl_paralllel_minhop.tar.gz

bernstei commented 1 year ago

I don't see anything obviously wrong. Can you ssh into the node while it's running? If you run top, you should see N Python processes (in principle 72, but I only see 57 initial configs), each using 100% CPU, no more. Is that what you see?

bernstei commented 1 year ago

Other things that might be helpful: run on the node but don't autoparallelize, and add print statements with timing info (print(time.time())) to the wfl minima hopping wrapper to see if something unexpected is taking a lot of time.
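The timing suggestion above can be sketched as a small helper (a hypothetical debugging wrapper, not part of wfl):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn, print its wall-clock duration, and return its result.

    Hypothetical helper for locating the slow step: wrap any suspect
    call inside the minima hopping wrapper, e.g.
        energy = timed("force call", atoms.get_potential_energy)
    and compare the printed durations between the two setups.
    """
    t0 = time.time()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.time() - t0:.3f} s")
    return result

# Trivial usage example with a stand-in workload:
total = timed("sum", sum, range(1000))
```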