pyiron / pyiron_atomistics

pyiron_atomistics - an integrated development environment (IDE) for atomistic simulation in computational materials science.
https://pyiron-atomistics.readthedocs.io
BSD 3-Clause "New" or "Revised" License

LammpsBase can't handle fluctuating particle numbers #941

Open hejamu opened 1 year ago

hejamu commented 1 year ago

Summary

We are working on a pyiron-based analysis using LAMMPS GCMC. We sub-classed the Lammps class and adapted it slightly, essentially just to add the fix.

Everything works perfectly, except for reading the dump file into the database. Since the particle number fluctuates, the per-snapshot data does not fit into a single regular numpy array. It breaks here because v is a list of np.arrays without a common shape: https://github.com/pyiron/pyiron_atomistics/blob/fdbd00ca82b63f4cbd51a1fd65c090cfa8b826b2/pyiron_atomistics/lammps/base.py#L1208
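To illustrate the shape problem in isolation, here is a minimal sketch using plain numpy. It is not the pyiron code at the linked line; the snapshot arrays and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical per-snapshot position arrays, shape (n_atoms, 3).
fixed = [np.zeros((4, 3)), np.ones((4, 3))]    # constant particle number
ragged = [np.zeros((4, 3)), np.ones((6, 3))]   # particle number changes

# With a constant particle number the snapshots stack into one array.
stacked = np.stack(fixed)  # shape (2, 4, 3)

# With a fluctuating particle number there is no common shape,
# so stacking raises a ValueError.
try:
    np.stack(ragged)
    ragged_ok = True
except ValueError:
    ragged_ok = False
```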

pyiron Version and Platform

Installed via conda and from source

pyiron                        0.4.7
pyiron-atomistics             0.2.62.post0.dev6 /Users/henrik/repos/pyiron_atomistics
pyiron-base                   0.5.31
pyiron-contrib                0.1.6.post0.dev60 /Users/henrik/repos/pyiron_contrib

Expected Behavior

Since Lammps dumps support fluctuating particle numbers, so should pyiron.

Actual Behavior

It does not.

Steps to Reproduce

Read in a LAMMPS dump file with a non-constant number of particles via the collect_dump_file method.

jan-janssen commented 1 year ago

This is a tricky one, as the current LAMMPS class is simply not designed to do these things. Still, we have already addressed the same issue for the interactive LAMMPS jobs by implementing an interactive_index_organizer(): https://github.com/pyiron/pyiron_atomistics/blob/main/pyiron_atomistics/atomistics/job/interactive.py#L145

Here we store the positions of all structures in one big array [[x, y, z], ...] and additionally store an index: the first structure goes from position 0 to n, the second structure from n+1 to m, and so on.

So the general logic is there, but for your specific case it is most likely faster to implement a specific solution rather than a general one. You can either try it on your own, or you can share a quick example of your current code here and then one of us can take a look. Once we have the example, I guess it should be possible to create a modified LAMMPS class which uses the storage from the interactive LAMMPS job.
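The flat-array-plus-index layout described above can be sketched roughly as follows. This is a standalone numpy illustration, not the actual interactive_index_organizer() implementation; the function names flatten_frames and get_frame are hypothetical.

```python
import numpy as np

def flatten_frames(frames):
    """Concatenate per-structure (n_i, 3) arrays into one flat array.

    Returns the flat array plus the start offset and length of each
    structure, so that individual structures can be recovered later.
    """
    lengths = np.array([len(f) for f in frames])
    # Start offset of each structure: 0, n_0, n_0 + n_1, ...
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    flat = np.concatenate(frames, axis=0)
    return flat, starts, lengths

def get_frame(flat, starts, lengths, i):
    """Slice structure i back out of the flat array."""
    return flat[starts[i]:starts[i] + lengths[i]]

# Three made-up structures with 2, 3 and 1 atoms.
frames = [np.full((2, 3), 0.0), np.full((3, 3), 1.0), np.full((1, 3), 2.0)]
flat, starts, lengths = flatten_frames(frames)
second = get_frame(flat, starts, lengths, 1)
```

Both the flat array and the two index arrays have regular shapes, so they can be written to HDF5 without any ragged data.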

liamhuber commented 1 year ago

Additionally, we do support semi-GC with calc_vcsgc, which exploits the native VC-SGC fix in Lammps.

I know it's fundamentally different physics from GC, so it's probably not helpful here, but I'll mention it on the off chance that it's useful and news to you.

hejamu commented 1 year ago

@jan-janssen thanks for the swift reply. It is indeed easier for us to put together a "quick and dirty" solution for our use case, since in this particular case we only need the histogram of the particle number. I'm still wondering what the best way to deal with this would be. I would love to use pyiron for our MD+GCMC simulations, since we deal with a lot of simulation runs.
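For reference, a "quick and dirty" particle-number histogram can be obtained without storing any positions. The sketch below assumes the standard LAMMPS text dump layout, in which each snapshot contains an "ITEM: NUMBER OF ATOMS" header followed by the count on the next line. The function name particle_numbers and the sample dump text are made up for illustration, and this is not part of pyiron.

```python
from collections import Counter

def particle_numbers(lines):
    """Return the number of atoms in each snapshot of a LAMMPS text dump."""
    counts = []
    it = iter(lines)
    for line in it:
        if line.startswith("ITEM: NUMBER OF ATOMS"):
            # The count is on the line directly after the header.
            counts.append(int(next(it)))
    return counts

# Tiny made-up dump with three snapshots (box and atom lines shortened).
dump_text = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: ATOMS id type x y z
1 1 0.0 0.0 0.0
2 1 1.0 1.0 1.0
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
3
ITEM: ATOMS id type x y z
1 1 0.0 0.0 0.0
2 1 1.0 1.0 1.0
3 1 2.0 2.0 2.0
ITEM: TIMESTEP
200
ITEM: NUMBER OF ATOMS
2
ITEM: ATOMS id type x y z
1 1 0.0 0.0 0.0
2 1 1.0 1.0 1.0
"""

counts = particle_numbers(dump_text.splitlines())
histogram = Counter(counts)  # maps particle number -> number of snapshots
```

For a real dump file, the open file object can be passed in directly instead of dump_text.splitlines(), since the function only iterates over lines.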

Would it be an option to use the H5MD format to save the trajectory related data? It supports changing particle numbers and seems like the native format in this case.

@liamhuber I was actually using the calc_vcsgc method as inspiration for our code, but I'm wondering why it should not suffer the same problems...?

pmrv commented 1 year ago

We have two classes that support handling of trajectories with a variable number of entries, StructureStorage and its base class FlattenedStorage. Since the parsing works, it might be enough in your case to modify the linked code to use either of these classes (or something you build on top) for the storage to HDF5. The problem is that this will most likely break a lot of analysis functionality in the Lammps class. But it might be the most straightforward solution just to have something to try out, and then fix the technical problem later.