Open jungsdao opened 1 year ago
How are you parallelizing with "gnu parallel"? In general wfl parallelizes over multiple input configurations, but I have no idea how the minima hopping is parallelized - I'd guess over multiple initial configs.
With GNU parallel, it's also parallelized over initial configurations, if that's what you asked. As far as I know, the parallelization of minima hopping is also over multiple input configurations.
Without more information on exactly what you're calling and what the "gnu parallel" parallelization is doing (I didn't see any mention of it in the ASE minima hopping docs), there's no way to tell. This looks like the output of a local minimization of a single config (that's what BFGSLineSearch
usually does). We have no idea what system you're using, how long a single force evaluation is expected to take, etc., so there's no way to know what's reasonable.
GNU parallel is not related to ASE minima hopping; I just used it to parallelize minima hopping. It's simply multiple executions of a separate Python script that all share one common minima.traj file,
which holds the history of found minima.
The Python script (parallel_minhop.py
) looks like the following.
from ase.io import read
from ase.optimize.minimahopping import MinimaHopping
from mace.calculators import MACECalculator

# path to the trained MACE model (set accordingly)
final_mlip_file = "final_mlip.model"

atoms = read("structure.traj")
atoms.calc = MACECalculator(model_path=final_mlip_file, device="cpu")

# all parallel jobs append to the shared ../minima.traj history of found minima
opt = MinimaHopping(atoms, Ediff0=0.75, T0=2000, fmax=0.05,
                    minima_traj="../minima.traj", timestep=0.5,
                    use_abort_check=False)
opt(totalsteps=80, maxtemp=2 * 2000)
In a file called cmd.lst
, the commands that I want to parallelize are given:
cd ./00; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log
cd ./01; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log
cd ./02; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log
cd ./03; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log
cd ./04; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log
...
cd ./56; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log
And GNU parallel is executed with the following command:
parallel -X --delay 0.2 --joblog task.log --progress --resume -j 72 < cmd.lst
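For completeness, cmd.lst is just 57 near-identical lines, so it can be generated with a short script (a sketch, not part of my original setup; it assumes the run directories 00 to 56 already exist):

```python
# Write one srun command per pre-created run directory (00..56) into cmd.lst,
# matching the lines shown above.
with open("cmd.lst", "w") as f:
    for i in range(57):
        f.write(
            f"cd ./{i:02d}; srun --mem=4GB --exclusive -N 1 -n 1 "
            f"python ../parallel_minhop.py > stdout.log\n"
        )
```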
That is how GNU parallel is executed.
What I posted above is the logfile of the local minimization of a single configuration (qn0000.log
in minima hopping).
What I would expect is that GNU parallel and wfl should give similar geometry relaxation speeds, since I'm relaxing the same structure with the same MACE potential.
Which of those times is a reasonable energy evaluation time for your system? How exactly are you running the wfl
job? Are you somehow forcing all the parallel wfl processes to share one core?
The GNU parallel relaxation time is the more reasonable evaluation time for this system; it shouldn't be that slow. Maybe it's better to attach a tar.gz of the files I used to parallelize minima hopping with wfl.
I don't see anything obviously wrong. Can you ssh into the node while it's running? If you run top
, you should see N (in principle 72, but I only see 57 initial configs) python processes, each using 100% CPU, no more. Is that what you see?
Other things that might be helpful: run on the node, but don't autoparallelize. Add print statements with timing info (print(time.time())
) to the wfl minima hopping wrapper to see if something unexpected is taking a lot of time.
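As a sketch of that timing instrumentation (the helper name `timed` and the wrapped calls are illustrative, not part of wfl's API):

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and print its wall-clock time to stdout."""
    t0 = time.time()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.time() - t0:.3f} s", flush=True)
    return result


# Example usage inside the wfl minima hopping wrapper, e.g. to time one
# energy/force call of the calculator:
# energy = timed("get_potential_energy", atoms.get_potential_energy)
```

Comparing the per-call times against the GNU parallel run should show whether the force evaluations themselves are slower, or whether the overhead is elsewhere.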
Hello, I was trying out the wfl package with minima hopping, and I discovered that, compared with the same process run under GNU parallel, wfl is far slower. Below, I compare one geometry relaxation step within minima hopping between GNU parallel and wfl.
GNU parallel
The whole relaxation finished within 1-2 minutes when parallelized with GNU parallel.
wfl
Whereas with the wfl package, a single relaxation step takes 1-2 minutes.
I'm using a MACE potential for the relaxation. Could anyone comment on possible reasons why it's so much slower with wfl parallelization? Many thanks in advance.