libAtoms / workflow

python workflow toolkit

LAMMPS interface / example #244

Open · Felixrccs opened this issue 1 year ago

Felixrccs commented 1 year ago

My colleagues and I currently use LAMMPS to generate new data within our workflows. The main advantage is the flexibility and performance of LAMMPS compared to ASE. I therefore thought it would be nice to include this in the workflow package in some form, and I am planning to contribute it.

There are three ways to include this:

  1. The ASE-LAMMPS interface
  2. The LAMMPS Python interface
  3. Use Python to write the LAMMPS input files and then run the LAMMPS executable as a Python subprocess

I tried all three approaches. Options 1 and 2 are complex to set up and run, and they also limit the flexibility of what you can do with LAMMPS. I ended up using option 3 myself: I have a Python script that reads in a .xyz or .traj file and writes the needed lammps.data and lammps.in files. LAMMPS is then called as a subprocess, and after the simulation completes the script transforms the LAMMPS output back into an OutputSpec.
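
A minimal sketch of that pattern could look like the following (the file names, the `lmp` executable name, the pre-written `lammps.in`, and the OutputSpec hand-off at the end are placeholders for illustration, not my actual script):

```python
# Sketch of option 3: ASE -> LAMMPS files -> subprocess -> ASE.
# Assumes a user-written lammps.in that reads lammps.data and dumps to
# dump.lammpstrj, and a LAMMPS executable called "lmp" on the PATH.
import subprocess
from ase.io import read, write

# read the starting configuration
atoms = read("input.xyz")

# write the LAMMPS data file
write("lammps.data", atoms, format="lammps-data")

# run LAMMPS as a subprocess
subprocess.run(["lmp", "-in", "lammps.in"], check=True)

# read the dumped trajectory back into ASE Atoms objects
frames = read("dump.lammpstrj", index=":", format="lammps-dump-text")

# the frames can then be handed back to wfl, e.g. via an OutputSpec:
# outputspec.store(frames); outputspec.close()
```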

I could imagine this being added either as a couple of basic LAMMPS simulations (NVE, NVT, NPT) in the main package, or alternatively as a single case in the examples folder.

Before I start working on this I would like your opinion on this matter.

bernstei commented 1 year ago

Can you explain why you want this functionality in wfl specifically, as opposed to it being a generic ASE-LAMMPS interface issue? Do you want to be able to run multiple such trajectories side by side, either with multithreading or separate-job parallelization? I feel like there needs to be a first step, entirely separate from wfl, that just defines what your wrapper does, and then we can figure out how best to give it wfl functionality, depending on what kind of use case we're envisioning.

More generally, I guess we could think about adding to or reworking wfl.generate.md into a more general form, where you can stick in different propagators and all will be wrapped in a standard way (conceptually like the way we wrap generic ASE calculators with wfl.calculators.generic, though of course the details will be entirely different).
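
Purely as a sketch of what that generalization might mean (hypothetical names, not an existing wfl API):

```python
# Conceptual sketch only: a generic sampling op that accepts any "propagator"
# callable (ASE MD, a LAMMPS subprocess driver, ...) and applies it uniformly,
# analogous in spirit to how wfl.calculators.generic wraps ASE calculators.
# The names here are hypothetical and not part of wfl.
from typing import Callable, Iterable, List
from ase import Atoms

def sample_with_propagator(configs: Iterable[Atoms],
                           propagator: Callable[[Atoms], List[Atoms]]) -> List[Atoms]:
    """Run the propagator on each input configuration and collect the frames it returns."""
    sampled = []
    for atoms in configs:
        sampled.extend(propagator(atoms))
    return sampled
```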

bernstei commented 1 year ago

Note: my goal with the existing MD wrapper was just to get sensible configurations for fitting, and in fact to hide the ASE trajectory/dynamics propagator Python syntax, which I don't love. If we want "real" MD functionality, i.e. situations where you want a physical trajectory and not just something that gives reasonable samples for fitting, we should think about that in more detail.

Felixrccs commented 1 year ago

I am using the LAMMPS replica exchange (parallel tempering), which runs multiple MDs of the same system at different temperatures and swaps between the temperatures according to a Monte Carlo scheme. As far as I know this is not possible through the ASE-LAMMPS interface, because it struggles with the processor partitioning.
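
For context, the run itself is just a partitioned LAMMPS invocation, which from Python is roughly a one-liner along these lines (the replica count and input file name are placeholders; the in.temper input using the LAMMPS temper command is entirely user-written):

```python
# Launch LAMMPS parallel tempering from Python (illustrative placeholders only).
# The -partition switch splits the MPI ranks into independent replicas; the
# in.temper input file (using the LAMMPS "temper" command) is written by the user.
import subprocess

n_replicas = 8
subprocess.run(
    ["mpirun", "-np", str(n_replicas),
     "lmp", "-partition", f"{n_replicas}x1", "-in", "in.temper"],
    check=True,
)
```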

In the end the majority of the wfl users in our group switched to LAMMPS for data generation, because LAMMPS has a particular feature that is very efficient at exploring the PES for their ML application. In our case everybody wrote a custom Python script that automates their LAMMPS simulation, but all of them follow the same general pattern.

Furthermore, I am not a big fan of writing/using general Python-LAMMPS interfaces/wrappers, because they are prone to break every few updates and also restrict the LAMMPS functionality you can use. In my opinion this would be more of a simple script (an example) that shows how to run LAMMPS in an automated way, rather than a full-on wrapper. Ideally you would also be able to use expyre to run it remotely.

In case you run into the problem that the standard wfl/ASE functionality is not sufficient for your particular system, you would already have a how-to-include-LAMMPS blueprint (that you can easily adjust) instead of starting from scratch.

bernstei commented 1 year ago

I agree that this use case sounds like it would be pretty messy to implement via the ASE wrapper - that's really more for when you want to use LAMMPS as a simple calculator. And while your example sounds useful and I think it'd be great if you shared it in some way, I'm still not sure that wfl is the right place. Is your goal to use any wfl-specific features like the autoparallelization or remote jobs?

gabor1 commented 1 year ago

I quite like the idea of being able to use LAMMPS to generate things as part of a wfl workflow. I am thinking of the "ASE -> files -> LAMMPS -> ASE" route as a new kind of calculator, similar to the DFT ASE calculators that operate via files. We need more sophisticated ways of indicating what the calculator returns (e.g. just one frame (the last one?) from a LAMMPS run, or a more general subset of the configs visited, returned as a list of configs). I know this is not yet a well-defined thing, I'm just thinking aloud.

bernstei commented 1 year ago

There's already a file-based LAMMPS calculator, if what you want is just a calculator (i.e. energy, forces, stresses), and the library-based calculator is probably better for this anyway. If you want to do arbitrary LAMMPS runs, with all the power of sampling, mpirun, input-file ifs and loops, etc., you'll have to write your own LAMMPS input file manually, since we won't want to constrain what's possible to run. The only added value I see so far from putting this in wfl is if we add autoparallelization and running in separate directories (like the DFT file calculators). Otherwise it's trivial to write from ASE to a LAMMPS-compatible format and read back from a LAMMPS-compatible format, as long as you know the filenames and pick the correct formats. We could encapsulate that, but you'd still need to write your input file (which this interface cannot really make much easier) to ensure the filenames, formats, and writing interval (for example) are correct, relying entirely on the user's knowledge of the LAMMPS input file syntax to achieve that.

It definitely wouldn't hurt, but there's just not that much it can do for you, except for the possibility of autoparallelization (probably mostly used for remote jobs).

gabor1 commented 1 year ago

Yes of course, I agree that the LAMMPS input file would be entirely up to the user.

I see the added value in keeping the rest of a workflow that uses wfl the same as one switches from something simple to the LAMMPS sampler.

bernstei commented 1 year ago

> I see the added value in keeping the rest of a workflow that uses wfl the same as one switches from something simple to the LAMMPS sampler.

But does it need the workflow as such, or is it just a case for a Python+ASE convenience routine?

gabor1 commented 1 year ago

Yes, that is true. Felix should think about whether there is any advantage to this wrapper being in workflow. We could create a contrib/ subdirectory that can contain stuff that does not belong in the wfl package itself but is clearly useful for more than one user.

Felixrccs commented 1 year ago

Yes, I think a contrib/ subdirectory is a great place to put the LAMMPS use case. Then I'll start tidying up my current LAMMPS run routine. From there we can discuss whether we should generalize it further and how to include autoparallelization or get it to work with expyre.