NEB implementation - Githubissues

jungsdao commented 4 months ago

Unlike other calculations which starts with single configuration, NEB takes list of configs as images. But it seems to give error related to autoparallelization showing following error. I think it should stop process when NEB is done, but it starts another process because it doesn't know that images of configurations are one unit.

from wfl.configset import ConfigSet, OutputSpec
from wfl.generate.neb import NEB as wflNEB

in_config = ConfigSet(images)
out_config = OutputSpec(files=outfile)

wflNEB(in_config, out_config, calculator=calc_tuple, verbose=True, fmax=2.0, steps=2)

FIRE:    0 20:19:09     -182.638078        3.358245
FIRE:    1 20:19:23     -182.713386        3.240111
FIRE:    2 20:19:38     -182.792399        2.334024
Using local medium Materials Project MACE model for MACECalculator /home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/mace/calculators/foundations_models/2023-12-03-mace-mp.model
Using float64 for MACECalculator, which is slower but more accurate. Recommended for geometry optimization.
Using TorchDFTD3Calculator for D3 dispersion corrections (see https://github.com/pfnet-research/torch-dftd)
Structure optimization failed with exception 'index -1 is out of bounds for axis 0 with size 0'
Traceback (most recent call last):
  File "/work/home/hjung/Calculation/0_surface_auto_TS/pathway_1/test_wflneb.py", line 13, in <module>
    wflNEB(in_config, out_config, calculator=calc_tuple, verbose=True, fmax=2.0, steps=2)
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/generate/neb.py", line 179, in NEB
    return autoparallelize(_run_autopara_wrappable, *args,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/base.py", line 174, in autoparallelize
    return _autoparallelize_ll(autopara_info, inputs, outputs, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/base.py", line 231, in _autoparallelize_ll
    out = do_in_pool(autopara_info.num_python_subprocesses, autopara_info.num_inputs_per_python_subprocess,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/pool.py", line 176, in do_in_pool
    result_group = _wrapped_autopara_wrappable(op, iterable_arg, inherited_per_item_info, args,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/pool.py", line 70, in _wrapped_autopara_wrappable
    outputs = op(*u_args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/generate/neb.py", line 144, in _run_autopara_wrappable
    for at in traj[0]:
              ~~~~^^^
IndexError: list index out of range

bernstei commented 4 months ago

[edited - previous claim about what wfl can parallelize over was wrong]

If you make your input iterator a list of lists of atoms objects, the autoparallelization framework should pass each sub-list. However, you can't make that a ConfigSet right now, although I think I could add that pretty easily.

Try your code but instead of making in_config be a ConfigSet, make it something like

in_config = [images_1, images_2, images_3 ... ]

where each images_N is a list of atomic configurations with the images for a single NEB. The autopara framework should then parallelize over NEBs. That should be possible to get to work. The routine that does a single NEB should be able to return a list of relaxed images for each set of input images, and wfl will keep track of the nested structure (although you'll need the ConfigSet.group_iterator() method to access each image list separately.

jungsdao commented 4 months ago

I have changed as following and it seems to work, but with ConfigSet, it doesn't work. Also I had to change input argument from images to list_of_images. For now I'm not sure whether parallelization over different NEB calculations would be possible.

in_config = [images]
out_config = OutputSpec(files=outfile)

wflNEB(in_config, out_config, calculator=calc_tuple, traj_subselect="last", verbose=True, fmax=2.0, steps=2)

bernstei commented 4 months ago

I have changed as following and it seems to work, but with ConfigSet, it doesn't work.

It's not expected to work, although I can modify it so it can. ConfigSet can be an iterator over groups of configs, but I have to think about the best way to do it.

Also I had to change input argument from images to list_of_images. For now I'm not sure whether parallelization over different NEB calculations would be possible.

Since the code you posted doesn't show what's in images, I don't have any idea what you mean here. The syntax you have above, assuming that images is a list(Atoms) where each Atoms is a single image, is correct. The autoparallelization should also work with

in_config = [images_1, images_2, ...]

etc, if each images_i contains a (different) list of Atoms that are the images of a (corresponding) NEB.

jungsdao commented 4 months ago

Yes images here is list of atoms (interpolated images connecting initial and final structures). I'm glad to see this working. Currently script looks dirty, but I'll try to polish the script later to make it look better. Please let me know if you have some suggestions for this PR.

bernstei commented 4 months ago

You should be able to do something like

in_configs = ConfigSet([images_1, images_2, images_3, ...])
.
.
.
wflNEB(in_config.groups(), out_config, ... )

I'm not 100% sure it'll work, but I think it should, and I'll try to fix it if it doesn't.

I also think that in_configs = ConfigSet(["file_1.xyz", "file_2.xyz", ...]) should work as well, as long as you use in_configs.groups() when you pass it to the autoparallelizaed NEB function.

Things that are needed:

The PR absolutely has to include a pytest.
all the traj subselect stuff probably has to be rethought, since it assumes a trajectory of single configurations.

jungsdao commented 4 months ago

I have tried following, but it didn't work. Any suggestions which I can try? or is it that ConfigSet has to be modified?

  2 from wfl.configset import ConfigSet, OutputSpec
  3 from wfl.generate.neb import NEB as wflNEB
  4 from wfl.autoparallelize.autoparainfo import AutoparaInfo
  5 from ase.io import read, write

  9 images1 = read("**/idpp.traj", ':')
 10 images2 = read("***/idpp.traj", ":")
 11 
 12 outfiles = ["***/reaction_wfl1.xyz",
 13             "***/reaction_wfl2.xyz"]
 14 
 16 in_config = ConfigSet([images1, images2])
 17 out_config = OutputSpec(files=outfiles)
 18 
 19 neb_kwargs = {
 20     "traj_subselect" : "last",
 21     "verbose" : True,
 22     "allow_shared_calculator" : True,
 23     "scale_fmax" : 1,
 24     "fmax" : 2.0,
 25     "steps" : 2
 26 }
 27 
 28 wflNEB(in_config.groups(), out_config, calculator=calc_tuple,
 29        autopara_info= AutoparaInfo(num_python_subprocesses=2, num_inputs_per_python_subprocess=1),
 30         **neb_kwargs)

      Step     Time          Energy          fmax
FIRE:    0 15:21:00     -179.327411        2.827394
      Step     Time          Energy          fmax
FIRE:    0 15:21:00     -182.638078        3.358245
FIRE:    1 15:21:44     -179.400852        2.076527
FIRE:    1 15:21:46     -182.713386        3.240160
FIRE:    2 15:22:28     -179.409632        2.148710
FIRE:    2 15:22:32     -182.794274        2.331917
Traceback (most recent call last):
  File "/work/home/hjung/Calculation/0_surface_auto_TS/pathway_1/test_wflneb.py", line 28, in <module>
    wflNEB(in_config.groups(), out_config, calculator=calc_tuple, 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/generate/neb.py", line 142, in NEB
    return autoparallelize(_run_autopara_wrappable, *args,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/base.py", line 174, in autoparallelize
    return _autoparallelize_ll(autopara_info, inputs, outputs, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/base.py", line 231, in _autoparallelize_ll
    out = do_in_pool(autopara_info.num_python_subprocesses, autopara_info.num_inputs_per_python_subprocess,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/pool.py", line 163, in do_in_pool
    outputspec.store(at, from_input_loc)
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/configset.py", line 521, in store
    if len(input_CS_loc) == 0:
       ^^^^^^^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()

bernstei commented 4 months ago

Thanks - let me take a look.

bernstei commented 4 months ago

@jungsdao are you sure you're using the latest version of the correct branch? line 521 in wfl/configset.py is not the one your error message shows https://github.com/libAtoms/workflow/blob/43d6362544e4d03018ce4c8512a0bd5a1724755a/wfl/configset.py#L521

jungsdao commented 4 months ago

Yeah I realized that this branch was not synchronized but it gives the same error like before after using the latest wfl.

Traceback (most recent call last):
  File "/work/home/hjung/Calculation/0_surface_auto_TS/pathway_1/test_wflneb.py", line 28, in <module>
    wflNEB(in_config.groups(), out_config, calculator=calc_tuple, 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/generate/neb.py", line 142, in NEB
    return autoparallelize(_run_autopara_wrappable, *args,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/base.py", line 174, in autoparallelize
    return _autoparallelize_ll(autopara_info, inputs, outputs, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/base.py", line 231, in _autoparallelize_ll
    out = do_in_pool(autopara_info.num_python_subprocesses, autopara_info.num_inputs_per_python_subprocess,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/autoparallelize/pool.py", line 163, in do_in_pool
    outputspec.store(at, from_input_loc)
  File "/home/hjung/miniforge3/envs/mace_env/lib/python3.11/site-packages/wfl/configset.py", line 533, in store
    if len(input_CS_loc) == 0:
       ^^^^^^^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()

bernstei commented 4 months ago

OK, I just didn't want to start debugging an inconsistent version.

bernstei commented 4 months ago

I'll start looking at this today - it's going to require a bit of low level debugging.

bernstei commented 4 months ago

I just ran a very simple example of an autoparallelized function that expects a list of configurations for each operation, passed it inputs.groups(), and it worked, so it's definitely not entirely broken.

from ase.atoms import Atoms
from wfl.configset import ConfigSet, OutputSpec
from wfl.autoparallelize import autoparallelize

def test_groups(input_atoms):
    for ats in input_atoms:
        print(type(ats))
        for at in ats:
            print("process", at.info, at.numbers)
    return list(input_atoms)

def autopara_test_groups(*args, **kwargs):
    return autoparallelize(test_groups, *args, **kwargs)

ats = [Atoms(numbers=[i]) for i in range(4)]
ats = ConfigSet([[ats[0], ats[1]], [ats[2], ats[3]]])

ats_out = autopara_test_groups(ats.groups(), OutputSpec())
print("")
for at in ats_out:
    print("output", at.info, at.numbers)

The best way to figure this out is for you to create a pytest for this new NEB function, and put in there code that causes the error you're seeing above. Make sure that the espresso_remote_job_test branch is merged into your PR branch, and then I can clone your PR branch and test it on my machine.

bernstei commented 4 months ago

By the way, what exactly are you trying to achieve with the file globs you're using for your input and output? I don't even know what *** will do, but I don't think it'll be different from **, so your two inputs will be the same (maybe that's what you intend, just for this simple of example?). And the OutputSpec doesn't support globs, only full paths, and I think it won't create directories, so I expect that it's going to be very confused when it tries to write the files.

In fact, right now I'm not even sure that we're properly supporting globs in the inputs. Let me see what's going on with that.

bernstei commented 4 months ago

In fact, right now I'm not even sure that we're properly supporting globs in the inputs. Let me see what's going on with that.

Looks to me like, despite what it says in the ConfigSet docs, globs are broken right now anyway, even for inputs. I'm working on fixing that.

jungsdao commented 4 months ago

** in images1 = read("**/idpp.traj", ':') was just to hide directory information which is only specific to my system. I didn't mean to glob or something. It was nothing but placeholder. It should be read like this.

  9 images1 = read("**/idpp1.traj", ':')
 10 images2 = read("***/idpp2.traj", ":")

I'm sorry if this caused confusion.

jungsdao commented 4 months ago

I'm trying to write pytest for neb.py but I'm dealing with strange pickle error. Should I just anyhow push it eventhough it doesn't work?

bernstei commented 4 months ago

Might as well, yes. It can be tricky to run pytests of wfl, because it's in some ways so dependent on the rest of your system, but if you at least have the exact syntax you're trying to call it'll be helpful, and we can turn it into a proper pytest. Just be sure to put any input files you need in tests/assets/

bernstei commented 4 months ago

I'm sorry if this caused confusion.

Well, it did throw me off momentarily, but at least it led me to notice that we don't support globs despite the fact that the docs say we do, so at least I was able to fix that pretty easily.

jungsdao commented 4 months ago

I tried to generate structure within pytest, rather than adding structure. Because I want to use EMT calculator and structures that works with this potential. It would be better to have another NEB trajectory to test autoparallelization. I had some pickle problem with calculator, but I'm not sure if this will happen in other environment as well

bernstei commented 4 months ago

BTW, the pickle error is because ASE changed the internals of the EMT calculator so it can't be pickled. You have to pass it to the wrapper as (EMT, [], {}). You also have to remove the calculator from initial and final.

bernstei commented 4 months ago

I haven't looked in any detail as to what exactly it's doing, but it passes for me with the attached patch neb.patch

bernstei commented 4 months ago

Now looking at the second test you added, and why it's failing.

bernstei commented 4 months ago

It turns out to be quite s subtle bug. I'm thinking about how to fix it.

bernstei commented 4 months ago

I have it working if you don't try to save the results to multiple files. I'm thinking about how to do that, but it's not trivial. For now, add this patch to your branch and see if that works for you.

neb_autopara.patch

bernstei commented 4 months ago

This patch (a superset of the previous one, apply it on a clean copy of your repo's commit 493a69e0) fixes multiple files as well. Once again, I think I need to clean up the whole nested structure business, but it seems to work here, surprisingly.

neb_autopara_mult_files.patch

jungsdao commented 4 months ago

Thanks for the patch. I just have applied the patch and it seems to work.

bernstei commented 4 months ago

Great. I'll commit now that all the tests pass.

jungsdao commented 4 months ago

nice, thank you for great help!

libAtoms / workflow

NEB implementation #296