libAtoms / workflow

python workflow toolkit
GNU General Public License v2.0

autoparallelized wfl is slower than GNU parallel. #266

Open jungsdao opened 1 year ago

jungsdao commented 1 year ago

Hello, I was trying the wfl package with minima hopping, and I discovered that it is much slower than the same process run with GNU parallel. Below I compare one geometry relaxation step within minima hopping between GNU parallel and wfl.

  1. GNU parallel

                Step[ FC]     Time          Energy          fmax
    *Force-consistent energies used in optimization.
    BFGSLineSearch:    0[  0] 14:25:38     -534.014633*       3.0278
    BFGSLineSearch:    1[  2] 14:25:44     -534.189030*       1.3638
    BFGSLineSearch:    2[  4] 14:25:48     -534.351071*       2.4404
    BFGSLineSearch:    3[  5] 14:25:49     -534.665330*       2.9015
    BFGSLineSearch:    4[  6] 14:25:54     -534.994242*       2.9427
    BFGSLineSearch:    5[  8] 14:25:57     -535.194093*       3.3632
    BFGSLineSearch:    6[ 10] 14:26:00     -535.408273*       2.3933
    BFGSLineSearch:    7[ 12] 14:26:03     -535.550177*       1.3325
    BFGSLineSearch:    8[ 14] 14:26:06     -535.684763*       3.0824
    BFGSLineSearch:    9[ 16] 14:26:09     -536.134389*       1.6315
    BFGSLineSearch:   10[ 17] 14:26:10     -536.266573*       1.9401
    BFGSLineSearch:   11[ 18] 14:26:12     -536.369399*       1.6626
    BFGSLineSearch:   12[ 19] 14:26:13     -536.445034*       0.9739
    BFGSLineSearch:   13[ 21] 14:26:16     -536.561701*       0.8027
    BFGSLineSearch:   14[ 23] 14:26:19     -536.650002*       0.5800
    BFGSLineSearch:   15[ 24] 14:26:21     -536.681904*       0.3205
    BFGSLineSearch:   16[ 25] 14:26:22     -536.701256*       0.3439
    BFGSLineSearch:   17[ 26] 14:26:23     -536.713216*       0.1637
    BFGSLineSearch:   18[ 27] 14:26:25     -536.717918*       0.2025
    BFGSLineSearch:   19[ 29] 14:26:28     -536.719005*       0.1261
    BFGSLineSearch:   20[ 30] 14:26:29     -536.720355*       0.0749
    BFGSLineSearch:   21[ 31] 14:26:30     -536.721657*       0.1069
    BFGSLineSearch:   22[ 34] 14:26:34     -536.724530*       0.1934
    BFGSLineSearch:   23[ 35] 14:26:35     -536.726039*       0.2071
    BFGSLineSearch:   24[ 37] 14:26:38     -536.727076*       0.1802
    BFGSLineSearch:   25[ 38] 14:26:39     -536.728364*       0.1102
    BFGSLineSearch:   26[ 39] 14:26:40     -536.729102*       0.0925
    BFGSLineSearch:   27[ 40] 14:26:42     -536.729533*       0.0620
    BFGSLineSearch:   28[ 41] 14:26:43     -536.729646*       0.0437

The whole relaxation finished within 1-2 minutes when parallelized with GNU parallel.

  2. wfl

                Step[ FC]     Time          Energy          fmax
    *Force-consistent energies used in optimization.
    BFGSLineSearch:    0[  0] 14:45:06     -535.723086*       0.9643
    BFGSLineSearch:    1[  1] 14:46:21     -535.822649*       2.3116
    BFGSLineSearch:    2[  3] 14:50:03     -535.964835*       1.0688
    BFGSLineSearch:    3[  5] 14:54:51     -535.992950*       0.5638
    BFGSLineSearch:    4[  7] 14:56:34     -536.008899*       0.5146
    BFGSLineSearch:    5[  9] 14:58:17     -536.019360*       0.5074
    BFGSLineSearch:    6[ 11] 15:00:02     -536.027474*       0.6783
    BFGSLineSearch:    7[ 12] 15:00:55     -536.050990*       0.5662
    BFGSLineSearch:    8[ 14] 15:02:40     -536.066940*       0.9052
    BFGSLineSearch:    9[ 16] 15:04:24     -536.200172*       1.0468

Whereas with the wfl package, a single relaxation step takes 1-2 minutes.

I'm using a MACE potential for the relaxation. Could anyone comment on the potential reason why it's so much slower with wfl parallelization? Many thanks in advance.

bernstei commented 1 year ago

How are you parallelizing with "gnu parallel"? In general wfl parallelizes over multiple input configurations, but I have no idea how the minima hopping is parallelized - I'd guess over multiple initial configs.

jungsdao commented 1 year ago

With GNU parallel it's also parallelized over initial configurations, if that's what you were asking. As far as I know, the parallelization of minima hopping is also over multiple input configurations.

bernstei commented 1 year ago

Without more information on exactly what you're calling and what the "gnu parallel" parallelization is doing (I didn't see any mention of it in the ASE minima hopping web docs), there's no way to tell. This looks like the output from a single-config local minimization (that's what BFGSLineSearch usually does). We have no idea what system you're using, how long a single force evaluation is expected to take, etc., so there's no way to know what's reasonable.

jungsdao commented 1 year ago

GNU parallel is not related to ASE minima hopping; I just used it to parallelize minima hopping. It's just multiple executions of a separate Python script that all share one common minima.traj file, which is the history of found minima. The Python script (parallel_minhop.py) looks like the following.

    from ase.io import read
    from ase.optimize.minimahopping import MinimaHopping
    from mace.calculators import MACECalculator

    atoms = read("structure.traj")
    # final_mlip_file is the path to the trained MACE model
    atoms.calc = MACECalculator(model_path=final_mlip_file, device="cpu")

    opt = MinimaHopping(atoms, Ediff0=0.75, T0=2000, fmax=0.05,
            minima_traj="../minima.traj", timestep=0.5, use_abort_check=False)
    opt(totalsteps=80, maxtemp=2*2000)

In a file called cmd.lst, the commands that I want to parallelize are given:

cd ./00; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./01; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./02; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./03; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./04; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
...
cd ./56; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 

And GNU parallel is executed with the following command:

    parallel -X --delay 0.2 --joblog task.log --progress --resume -j 72 < cmd.lst

That is how GNU parallel is run.

What I posted above is the log file of the local minimization of a single configuration (qn0000.log in minima hopping). I would expect GNU parallel and wfl to be similar in geometry relaxation speed, since I'm relaxing the same structure with the same MACE potential.

bernstei commented 1 year ago

Which of those times is a reasonable energy evaluation time for your system? How exactly are you running the wfl job? Are you somehow forcing all the parallel wfl processes to share one core?
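One common cause of this kind of slowdown (an assumption on my part, not something confirmed in this thread) is thread oversubscription: each PyTorch-based MACE worker may size its OpenMP/BLAS thread pools to the full core count of the node, so 72 workers end up fighting over the same CPUs. A minimal sketch of a mitigation, set before the calculator's libraries are imported:

```python
import os

# Hypothetical mitigation: cap the thread pools that PyTorch / NumPy / MKL
# would otherwise size to the node's full core count. These environment
# variables must be set before the numerical libraries are imported.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"
```

If the GNU parallel run inherits a one-thread environment from srun while the wfl run does not, that alone could explain the difference.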

jungsdao commented 1 year ago

The GNU parallel relaxation time is the more reasonable evaluation time for this system; it shouldn't be that slow. Maybe it's better to attach a tar.gz of the files I used to parallelize minima hopping with wfl.

wfl_paralllel_minhop.tar.gz

bernstei commented 1 year ago

I don't see anything obviously wrong. Can you ssh into the node while it's running? If you run top, you should see N Python processes (in principle 72, but I only see 57 initial configs), each using 100% CPU, no more. Is that what you see?

bernstei commented 1 year ago

Other things that might be helpful: run on the node but don't autoparallelize, and add print statements with timing info (print(time.time())) to the wfl minima hopping wrapper to see if something unexpected is taking a lot of time.
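The timing suggestion above can be sketched as a small helper (a hypothetical debugging wrapper, not part of wfl):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn, print its wall-clock duration, and return its result.

    Hypothetical helper for locating the slow step: wrap any suspect
    call inside the minima hopping wrapper, e.g.
        energy = timed("force call", atoms.get_potential_energy)
    and compare the printed durations between the two setups.
    """
    t0 = time.time()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.time() - t0:.3f} s")
    return result

# Trivial usage example with a stand-in workload:
total = timed("sum", sum, range(1000))
```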