mosdef-hub / gmso

Flexible storage of chemical topology for molecular simulation
https://gmso.mosdef.org
MIT License

Dihedral Improper sorting not compatible with (via mosdef-GOMC) NAMD #783

Closed bc118 closed 5 months ago

bc118 commented 8 months ago

I just ran across a case where the impropers do not work in NAMD due to a sorting issue.

GMSO only saves one sorted version of the improper, C-X-Y-Z (atom class/type, where C is the central atom and X, Y, and Z can appear in any order). This seems fine for printing the FF file (.inp) in mosdef-GOMC.

However, the issue seems to be that the atom ordering (used in the PSF file to print the atom numbers in the improper) is not always the same as the one saved C-X-Y-Z (atom class/type) version. Since NAMD apparently does not check the possible order combinations and instead takes the ordering straight from the PSF file when comparing against the FF (.inp) file, this causes a mismatch.
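To make the suspected mismatch concrete, here is a minimal sketch. It assumes a hypothetical force-field table keyed by one stored class ordering, and compares a strict lookup (what NAMD appears to do, as far as I can tell) with a lookup that tries every ordering of X, Y, and Z. The table, parameter values, and function names are made up for illustration and are not GMSO, MoSDeF-GOMC, or NAMD APIs.

```python
from itertools import permutations

# Hypothetical force-field table: a single stored ordering per improper type,
# central atom class first. Parameter values are invented for illustration.
ff_impropers = {("C", "CT", "N", "O"): {"k": 10.5, "phi0": 180.0}}


def strict_lookup(classes):
    """Match only the exact ordering given (no permutations are tried)."""
    return ff_impropers.get(tuple(classes))


def permutation_lookup(classes):
    """Keep the central atom fixed and try all orderings of the other three."""
    center, *others = classes
    for perm in permutations(others):
        params = ff_impropers.get((center, *perm))
        if params is not None:
            return params
    return None


# Class ordering implied by the atom numbers written to the PSF file
# (hypothetical example of an ordering that differs from the stored one).
psf_order = ("C", "O", "CT", "N")

print(strict_lookup(psf_order))       # no match for this ordering
print(permutation_lookup(psf_order))  # finds the stored C-CT-N-O entry
```

If the engine only does the equivalent of `strict_lookup`, the PSF ordering has to agree with the stored ordering up front.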

Is it possible to sort the improper atom order so that it aligns with the one selected C-X-Y-Z (atom class/type) version saved in GMSO?

Maybe GMSO could sort both the atom order and the atom class/type for atoms X, Y, and Z (atoms 2, 3, and 4) alphabetically or by some similar rule, so that all impropers of the same type have the same atom-number and atom class/type ordering. This would also prevent potential future issues on all engines.

Note: there would be 6 possible combinations for the improper ordering.
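A rough sketch of the sorting I have in mind, assuming the central atom is already first and using a hypothetical helper name (`canonicalize_improper` is not an existing GMSO function). The three outer atoms are sorted by atom class, with the atom number as a tie-breaker, and the atom numbers are reordered together with the classes so the two stay consistent.

```python
def canonicalize_improper(atom_ids, atom_classes):
    """Return (ids, classes) with the central atom first and atoms 2-4 sorted.

    Hypothetical helper: sorts the three non-central atoms by atom class,
    then by atom number, and keeps ids and classes paired.
    """
    if len(atom_ids) != 4 or len(atom_classes) != 4:
        raise ValueError("an improper needs exactly four atoms")

    center = (atom_classes[0], atom_ids[0])
    # Pair each class with its atom number so both are reordered together.
    others = sorted(zip(atom_classes[1:], atom_ids[1:]))

    ordered = [center, *others]
    ids = tuple(atom_id for _, atom_id in ordered)
    classes = tuple(atom_class for atom_class, _ in ordered)
    return ids, classes


ids, classes = canonicalize_improper((5, 9, 7, 8), ("C", "O", "CT", "N"))
print(ids)      # (5, 7, 8, 9)
print(classes)  # ('C', 'CT', 'N', 'O')
```

With a rule like this, any of the 6 input orderings for the same four atoms ends up as the same record. It only makes the naming consistent between files; whether an engine treats the permuted orderings as physically equivalent would still need to be checked per engine.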

CalCraven commented 8 months ago

Hi @bc118, thanks for raising this issue. I think this is an important point to discuss: How rigid are we going to be for sorting these things in GMSO?

Here are the relevant tradeoffs that come to mind.

Stricter sorting handled internally by GMSO:

Pros: We can make more assumptions about the topology structure, which should make it easier to write consistent files from a given input. It should also be easy to document exactly what this structure is in one place, although getting people to read that can be an issue.

Cons: This would require many internal checks to validate the huge number of ways a topology can be generated. Every reader or converter would need to perform these checks on its inputs, and the writers would probably want to validate beforehand as well. We also risk limiting ourselves if someone wants to write output in a specific order but the internals automatically re-sort things each time. There's also the risk that the sorting needs to differ between engines, which wouldn't be obvious to a user, e.g. someone who writes out GROMACS and LAMMPS files, processes them in two different analysis tools, and assumes everything was generated analogously.

Looser controls:

Pros: More flexibility. People can do what they want and aren't forced into a given format.

Cons: As mentioned, this can lead to incompatibilities if one engine does it one way and another does it a different way. If something is generated with some randomization and then read into GMSO, GMSO would parse it differently depending on the input.

CalCraven commented 8 months ago

Can you post an example of the two incompatible files? The PSF file is not part of native GMSO, so this issue may actually be on the MoSDeF-GOMC side as well.
I think this is where it's generated: https://github.com/GOMC-WSU/MoSDeF-GOMC/blob/5eb8f38cdb08ed1b6bf3c99766259da987eeec88/mosdef_gomc/formats/charmm_writer.py#L3264

For instance, the LAMMPS writer does its own sorting, as LAMMPS users have unique preferences: https://github.com/mosdef-hub/gmso/blob/b728c999b91f6c682d73183f1165abeb58f57097/gmso/formats/lammpsdata.py#L918

bc118 commented 8 months ago

I cannot post the example publicly, but I can send it to you directly.

CalCraven commented 5 months ago

Closed with #796