Pyiron generated VASP inputs: Option to preserve atomic ordering

pyiron / pyiron_atomistics

pyiron_atomistics - an integrated development environment (IDE) for atomistic simulation in computational materials science.

https://pyiron-atomistics.readthedocs.io

BSD 3-Clause "New" or "Revised" License

44 stars 15 forks source link

Pyiron generated VASP inputs: Option to preserve atomic ordering #752

Open ligerzero-ai opened 2 years ago

ligerzero-ai commented 2 years ago

Hi, I am just beginning to code some proof-of-concept workflows. There are a couple issues that immediately cropped up:

It is highly desirable for complex workflows that the POSCARs generated have the option to have their atomic ordering preserved, instead of rearranged.

e.g. Fe 36 V 1 Fe 2 Al 1 Fe 31 remains that way and doesn't become Fe 69 V 1 Al 1

This is because the ordering of the atoms is important for complex derived workflows (e.g. interaction energy in co-segregation needs to be calculated by replacing a specific atom with another element, this needs to be considered with the exact same atom in a single-segregation case in the same position).

As it stands, pyiron will rearrange the atomic ordering for single-segregation cases and this will make subsequent co-segregation workflows non-trivial, when they should be trivial. Data such as movement from original position, etc. which can be important are highly non-trivial to compare between "like" systems if you rearrange the structures on each pyiron job generation. Comparing atoms to determine which is the "same" from the original position becomes non-trivial compared to just using the "dumb" method and relying on atomic ordering.

Then, immediately I would see that POTCAR generation, MAGMOM flag generation becomes very different too. Could someone point me to the right places so I can implement this?

liamhuber commented 2 years ago

It is highly desirable for complex workflows that the POSCARs generated have the option to have their atomic ordering preserved,

I'm not convinced that the VASP files themselves need to preserve order, although I agree it's absolutely critical that, e.g., job.structure.positions and job.output.positions[-1] maintain exactly the same ordering. If the VASP job internally re-orders these for the VASP run and then converts them back to the "true" pyiron order in the output, I don't see a problem with that.

As it stands, pyiron will rearrange the atomic ordering for single-segregation cases and this will make subsequent co-segregation workflows non-trivial, when they should be trivial.

I thought this order-preservation was already the current behaviour, but I have to confess I'm not a regular VASP user so I might be very wrong about that. If your claim here is correct, then that is a straight up bug in our pyiron-VASP interface.

e.g. Fe 36 V 1 Fe 2 Al 1 Fe 31 remains that way and doesn't become Fe 69 V 1 Al 1

I think POTCAR generation is exactly why the atoms are internally re-ordered for VASP -- isn't the POTCAR just the concatenation of POTCAR files for each species in the POSCAR atom count list? Then leaving it as Fe 36 V 1 Fe 2 Al 1 Fe 31 would be very storage-wasteful as we'd get each of the POTCARs multiple times in our aggregated POTCAR.

This is because the ordering of the atoms is important for complex derived workflows

In general, I don't see a problem with pyiron and VASP having different orderings. Structures, energies, magnetic moments, etc. should all be consistent between pyiron job input and output, but in general we ought not need to worry about what a particular code does to those properties under the hood -- as long as we "undo" any such changes when parsing output.

When writing workflows I strongly encourage you to stay on the pyiron side of the line, and leave internal VASP (or other code) implementation alone as much as possible. Sometimes you won't be able to do this and still get your workflow running -- when this situation comes up, I feel the correct solution is to go into the pyiron interface for VASP and expose all the things you need at the pyiron level, then go back to your workflow and keep writing.

...this will make subsequent co-segregation workflows non-trivial, when they should be trivial.

At least in Lammps my experience is that co-segregation workflows are, indeed, trivial. If there is a bug in the VASP job that ordering in the input is not preserved to the output, I agree this would make such workflows a huge pain.

Example) Variance Constrained Semi Grand Canonical simualtions in Lammps. I wanted to use this Lammps fix in a workflow. Instead of having the workflow mess around with the raw Lammps input files, I went to the Lammps job object and exposed a new calc_vcsgc method right there which handles all the interaction with the Lammps IO files. Then when I'm doing my physics (i.e. workflow) I stay at the abstracted pyiron level.

ligerzero-ai commented 2 years ago

Hi, thanks for clarifying that input and output are indeed "undone" and order is preserved when the output is accessed through the pyiron project. I have checked the input and output accordingly and confirmed that this is the case.

However, I note that issues such as #700 #718 make this somewhat extremely dangerous, especially if errors such as species assignment are still running around in the code, which POTCAR/POSCAR generation may be affected by. While fixing such issues may remove the error, we ask the question if this additional convolution which can introduce further errors is even necessary.

I think that the notion of conserving a couple hundred megabytes of storage, at worst a couple GBs via re-ordering to avoid writing of POTCARs and subjecting yourself to the vulnerability of such bugs when doing high-throughput studies extremely penny-wise pound-foolish.

I will concede that for extremely complex structures such as SQS with many component elements, it is desirable behaviour to re-order structures to minimise the size of the POTCAR. But in most cases, this is not necessary, as the filesize is trivial compared to CHG, CHGCAR, WAVECAR etc.

I think that the option to have more fine-tuned control over the way calculations are conducted are not necessarily a detriment, but actually highly desirable for "selling" pyiron to new-users which have already implemented their own workflows which they can easily transplant to pyiron, which they may see value in for post-processing, not necessarily job-creation/setup, which they would want to be copied 1-to-1, and not be at the mercy of random bugs which they may not be aware of.

I can foresee many users that would be put-off forever if they find that pyiron messed with their workflows and wasted/consumed their scarce computing resources because they trusted the code to do the right thing, but find that actually because of this unnecessary fancy convolution, it introduced unnecessary errors which broke what should have been a previously implemented completely-fine workflow.

Before anything, the code package must do its job properly, robustly and without errors - fancy convolutions or abstractions that can introduce convoluting errors should be able to be disabled, or even remain off by default, with the option to be turned on. This is the "first do no harm" before additional value is created, is a philosophy which I think is really important for convincing users to adopt pyiron.

Researchers already have to deal with enough crap, without research software that should be making their jobs easier becoming another headache - my 2c

Sorry for the long response - but that is how I feel about this.

liamhuber commented 2 years ago

However, I note that issues such as https://github.com/pyiron/pyiron_atomistics/pull/700 https://github.com/pyiron/pyiron_atomistics/issues/718 make this somewhat extremely dangerous, especially if errors such as species assignment are still running around in the code, which POTCAR/POSCAR generation may be affected by.

As far as I can tell, these are not VASP/POSCAR errors, but really fundamental bugs with Atoms (and I suppose bugs that ASE shares with us).

I think that the notion of conserving a couple hundred megabytes of storage, at worst a couple GBs...

I will concede that for extremely complex structures such as SQS with many component elements

If we start thinking about many users, or one user doing some sort of SQS high throughput....I am really not convinced this stays merely at the GB scale. I don't have VASP on my system -- how big are the single-species POTCAR files anyhow?

...and not be at the mercy of random bugs which they may not be aware of.

As you state, there is no bug. VASP is correctly re-mapping output back to the original order. This is not a bug, but an important storage-preserving feature. I'm comfortable giving any user who has the confidence to dig in and work with wrapped code IO files directly the responsibility to understand the mapping. If a "POSCAR" format file is needed for some particular analysis tool (e.g. VESTA), one can just use Atoms. on either the before or after structure (which are consistent because we use the map going both into and out of VASP-space).

I can foresee many users that would be put-off forever if they find that pyiron messed with their workflows...

Before anything, the code package must do its job properly, robustly and without errors...

Sorry for the long response - but that is how I feel about this.

Absolutely no need to apologize; it was clearly written and each paragraph holds unique data. However, I still totally disagree. IMO one of the big selling points of pyiron is that you don't need to deal with the individual simulation codes. There are exceptions to this (e.g. my VCSGC example), but in these cases I strongly believe that the solution is to extend the pyiron code base to get back to the point of not needing to dig into the wrapped codes but rather to keep everything at the pyiron/python level -- that is how we ensure that these exceptional cases are easily used by the rest of the community. As far as I can see, in this case pyiron is not messing with workflows or failing to do it's job properly -- we re-map atom order going into VASP-space, and re-map it going back out. I see exposing additional complexities to pyiron users as unnecessary, ala @samwaseda's comment:

Hmmm but that sounds like something that shouldn't be needed in the first place in pyiron. Frankly, rather than cluttering the job-level entries with even more functions that shouldn't mean anything to anyone, I'd offer the same solution as the one you mentioned in the issue.

If VASP has output that is useful, which we don't expose in pyiron, then IMO the answer is to expose it pyironically, not to send users digging into wrapped-code IO files. Once it's exposed in pyiron the process of transplanting workflows should indeed be easy. If we do it this way, everyone benefits from the new workflow.

liamhuber commented 2 years ago

After digging into it over in #492, I just want to put the answer here so it's linked together with the issue for posterity.

It is highly desirable for complex workflows that the POSCARs generated have the option to have their atomic ordering preserved

As it stands, pyiron will rearrange the atomic ordering

Indeed, this happens in pyiron_atomistics.vasp.structure.write_structure; to get the original ordering you can always directly write either the input or output structures in POSCAR format while preserving the order:

some_structure.write(filename='original_order_POSCAR', format='vasp')
# OR
some_structure.write(filename='original_order_POSCAR_direct_coords', format='vasp', direct=True)

ligerzero-ai commented 2 years ago

Following our Monday pyiron meeting 12/09/2022, we agreed that it should be appropriate for users to specify that they need the specific user-generated ordering to be able to be directly inputted in the calculations, so as to avoid convoluting ordering messes happening in the OUTCAR, since not all data in it is parsed.

The to-do list:

Put a warning in the raw-POSCAR data input that the default (currently silent!) behaviour of pyiron is to re-order the atomic arrangement to minimise the written POTCAR sizing. This immediately tells users what they need to do to generate the user-fed structure exactly.
There was some talk about having a warning file in the jobfolder?
To include a boolean input in the vasp job generation with name something like "preserve_atomic_ordering/retain_atomic_ordering = False" in the VASP input file generation
If there is re-ordering, there needs to be an atom-to-atom mapping of the original ordering vs pyiron generated structures present in the jobfolder so users may write their own specific parsers that access the raw data of the OUTCAR directly.

@liamhuber @samwaseda @pmrv Could you point me to the right place to implement this? Sorry - I have really poor code-reading abilities as a scientist that knows a little bit of code as opposed to professional coder :)

With a project as big as pyiron, it will just be faster to tell me what needs to be done and link me it so I can go ahead and write the code necessary. I can pick up the pieces as I go, and become more familiar as I go around actually writing the code.

ligerzero-ai commented 2 years ago

I think a good solution to the problem of mapping is simply provide the user-input POSCAR like POSCAR-user and CONTCAR-user written to the folder at the end of the job. That way the users know exactly what file they need at first glance instead of having to actually enter the files. This preserves the POTCAR efficiency, but elegantly solves some of the major issues.

This way, the user-defined structures are available immediately for download and visualisation in 3rd-party programs, but also serves as an automatic way to implicitly warn the user who is looking at the raw input that pyiron has messed with the VASP inputs.

samwaseda commented 2 years ago

Put a warning in the raw-POSCAR data input that the default (currently silent!) behaviour of pyiron is to re-order the atomic arrangement to minimise the written POTCAR sizing. This immediately tells users what they need to do to generate the user-fed structure exactly.

Here: https://github.com/pyiron/pyiron_atomistics/blob/c218487cdf851a27216b7d20e352bb46b46cc75a/pyiron_atomistics/vasp/structure.py#L96

samwaseda commented 2 years ago

And the warning is essentially the logger, which you can probably import via:

from pyiron_base import state

and use it by:

state.logger.debug("Atom order changed")

or something like this? However, this one issues the warning in the Notebook, but from what I understood from @srmnitc, it is apparently also possible to write it in the python log file, which is stored in the job directory. I'm not really sure how it works.

samwaseda commented 2 years ago

To include a boolean input in the vasp job generation with name something like "preserve_atomic_ordering/retain_atomic_ordering = False" in the VASP input file generation

I would say this should be in the DFT generic input, probably here

samwaseda commented 2 years ago

If there is re-ordering, there needs to be an atom-to-atom mapping of the original ordering vs pyiron generated structures present in the jobfolder so users may write their own specific parsers that access the raw data of the OUTCAR directly.

I don't know what this is called in VASP. I pasted the lines for SPHInX below. This should definitely be unified between VASP and SPHInX (and we can maybe also think about moving it to DFG generic input).

https://github.com/pyiron/pyiron_atomistics/blob/c218487cdf851a27216b7d20e352bb46b46cc75a/pyiron_atomistics/sphinx/base.py#L160-L164

pmrv commented 2 years ago

And the warning is essentially the logger, which you can probably import via:
from pyiron_base import state
and use it by:
state.logger.debug("Atom order changed")
or something like this? However, this one issues the warning in the Notebook, but from what I understood from @srmnitc, it is apparently also possible to write it in the python log file, which is stored in the job directory. I'm not really sure how it works.

All jobs also have an appropriate logger instance attached, so in most cases just using job.logger. should be enough.