Standardize indexing in `connection_members`

rmatsum836 commented 4 years ago

When writing out angles and dihedrals, it will be important to follow the syntax for specific engines. For example, LAMMPS requires angles to be written out as atom-1 atom-2 atom-3, where atom-2 is the central atom (https://lammps.sandia.gov/doc/2001/data_format.html).
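A minimal sketch of what this means when writing a LAMMPS data file, assuming the `ID type atom1 atom2 atom3` layout of the `Angles` section. The helper name `format_lammps_angle` is hypothetical and is not part of GMSO.

```python
# Hypothetical helper (not a GMSO function): format one line of the
# "Angles" section of a LAMMPS data file as "ID type atom1 atom2 atom3".
def format_lammps_angle(angle_id, type_id, members):
    """`members` is (end1, central, end2), given as 1-based atom IDs."""
    end1, central, end2 = members
    # The central atom must land in the middle column.
    return f"{angle_id} {type_id} {end1} {central} {end2}"


# Water-like example: atom 1 is the oxygen, atoms 2 and 3 are the hydrogens.
line = format_lammps_angle(1, 1, (2, 1, 3))
print(line)  # 1 1 2 1 3
```

If `connection_members` already stores the central atom in the middle, the writer needs no reordering logic of its own.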

In connection_members we should standardize the indexing so that it's consistent and easier to abide by the syntax set by the various engines.

uppittu11 commented 4 years ago

Maybe we could also have a function that would merge duplicate connections into one. For example, an angle with members 1, 2, 3 and another angle with members 3, 2, 1.

rsdefever commented 4 years ago

> we should standardize the indexing so that it's consistent and easier to abide by the syntax set by the various engines.

Agree. I also like the i-j-k order where j is the central atom.

> Maybe we could also have a function that would merge duplicate connections into one. For example, an angle with members 1, 2, 3 and another angle with members 3, 2, 1.

We need to be careful here. IIRC, we want to support multiple connections being defined with the same members, i.e., layering potentials.

uppittu11 commented 4 years ago

@rsdefever That's a good point. Maybe instead of merging equivalent connections, we could make sure that the connection members are sorted when writing to file, which would make the output easier to debug.

Maybe we discussed it elsewhere, but to layer potentials, do we want to create new Connections or do we want to be able to add multiple ConnectionTypes to a single Connection?

rsdefever commented 4 years ago

> Maybe we discussed it elsewhere, but to layer potentials, do we want to create new Connections or do we want to be able to add multiple ConnectionTypes to a single Connection?

Very curious about others' opinions here. I can think of it either way:

Multiple ConnectionTypes to a single Connection: There is only one physically meaningful connection. We are applying some model potential to that physical entity to make it behave the way we want. Sometimes we need multiple 'layered' potentials to get the physics right. This is ultimately a modelling issue and does not change the fact that there is one connection that we are trying to model appropriately. Therefore, there is only one connection even if it has multiple layered ConnectionTypes.
Multiple Connections: The Connection is the way we indicate the application of a potential to some set of sites. Therefore, it makes sense to keep a 1-to-1 mapping between a Connection and its associated ConnectionType.

I think I lean towards Multiple ConnectionTypes to a single Connection, but would be interested in hearing how others are thinking about this. If that's the route we go/are going, that resolves my concern.

ahy3nz commented 4 years ago

> Maybe we could also have a function that would merge duplicate connections into one. For example, an angle with members 1, 2, 3 and another angle with members 3, 2, 1.

> We need to be careful here. IIRC, we want to support multiple connections being defined with the same members, i.e., layering potentials.

Sorting could be nice for consistency, but for some force field terms like trefoil impropers (source0, source1, source2, source3), order appears to matter, in that it looks like there's a term for all 3 permutations of writing the improper.

If this is just for angles, off the top of my head I can't think of a FF term where order matters. I think ParmEd ends up sorting all their connections/connection types alphabetically while preserving centrality.

> Maybe we discussed it elsewhere, but to layer potentials, do we want to create new Connections or do we want to be able to add multiple ConnectionTypes to a single Connection?

> Very curious about others' opinions here. I can think of it either way:
>
> Multiple ConnectionTypes to a single Connection: There is only one physically meaningful connection. We are applying some model potential to that physical entity to make it behave the way we want. Sometimes we need multiple 'layered' potentials to get the physics right. This is ultimately a modelling issue and does not change the fact that there is one connection that we are trying to model appropriately. Therefore, there is only one connection even if it has multiple layered ConnectionTypes.
>
> Multiple Connections: The Connection is the way we indicate the application of a potential to some set of sites. Therefore, it makes sense to keep a 1-to-1 mapping between a Connection and its associated ConnectionType.
>
> I think I lean towards Multiple ConnectionTypes to a single Connection, but would be interested in hearing how others are thinking about this. If that's the route we go/are going, that resolves my concern.

My gut feeling is that allowing multiple ConnectionTypes on a single Connection could be the way to go. If you just had one ConnectionType with some massive expression, it seems hard to dissect that expression into the smaller FF terms that a simulation engine actually implements. Keeping all the ConnectionTypes tagged to the same Connection could be good for bookkeeping and for passing FF information along (as in LAMMPS, where dihedral terms can get weighted based on the number of dihedral terms being applied to a particular bonded quadruplet). If you had 4 new Connections and ConnectionTypes just for the same bonded entity, it might be harder to keep track of those 4 terms and herd them together when processing or writing out.
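To make the "multiple ConnectionTypes on a single Connection" option concrete, here is a rough sketch. The classes `PotentialTerm` and `LayeredConnection` are hypothetical stand-ins invented for illustration; they are not GMSO's `Connection` or `ConnectionType`, and the expression strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PotentialTerm:
    """Hypothetical stand-in for one layered potential term."""
    name: str
    expression: str


@dataclass
class LayeredConnection:
    """Hypothetical connection that carries any number of potential terms."""
    members: tuple
    terms: list = field(default_factory=list)

    def add_term(self, term):
        # All terms stay attached to the same set of members,
        # so a writer can find them in one place.
        self.terms.append(term)


# One dihedral, two layered periodic terms on the same bonded quadruplet.
dihedral = LayeredConnection(members=(1, 2, 3, 4))
dihedral.add_term(PotentialTerm("periodic_n1", "k1*(1 + cos(1*phi - phi0_1))"))
dihedral.add_term(PotentialTerm("periodic_n3", "k3*(1 + cos(3*phi - phi0_3))"))

print(len(dihedral.terms))  # 2
```

With this layout, anything that depends on how many terms act on a quadruplet (such as the weighting mentioned above) can read `len(dihedral.terms)` directly instead of searching for sibling connections.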

uppittu11 commented 4 years ago

@ahy3nz That's good to keep in mind for impropers. Based on the GROMACS docs for impropers it seems like the first index (i) is the central atom, the last index (l) is the atom in the other plane, and the middle two indices (j, k) are interchangeable.

So a summary of equivalencies:

Bonds: (i, j) = (j, i)
Angles: (i, j, k) = (k, j, i)
Dihedrals: (i, j, k, l) = (l, k, j, i)
Impropers: (i, j, k, l) = (i, k, j, l)

Edit this if I'm missing anything
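The equivalencies in this summary could be turned into a canonical ordering along these lines. This is only a sketch: `canonical_members` is a hypothetical helper, the members are assumed to be comparable values such as integer indices, and the improper rule encodes only the (j, k) swap listed above.

```python
def canonical_members(kind, members):
    """Return one representative ordering for equivalent member tuples."""
    members = tuple(members)
    if kind in ("bond", "angle", "dihedral"):
        # Reversal symmetry: keep whichever of forward/reversed sorts first.
        return min(members, members[::-1])
    if kind == "improper":
        i, j, k, l = members
        # Only the two middle members are interchangeable in the summary above.
        j, k = sorted((j, k))
        return (i, j, k, l)
    raise ValueError(f"unknown connection kind: {kind!r}")


print(canonical_members("angle", (3, 2, 1)))        # (1, 2, 3)
print(canonical_members("dihedral", (4, 3, 2, 1)))  # (1, 2, 3, 4)
print(canonical_members("improper", (1, 3, 2, 4)))  # (1, 2, 3, 4)
```

Two connections are equivalent under these rules exactly when their canonical tuples are equal, which also gives a stable sort key when writing to file.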

rmatsum836 commented 4 years ago

I've thought about this a little more and I think rather than having a "standard" index, we should ensure that the indexing of connection_members is consistent with the type definition in the ForceField. For example, if the name of the angle_type is [opls_112~opls_111~opls_112], then connection_members should be tuple(opls_112, opls_111, opls_112). That way we don't run into any issues with multiple connection types having the same members.
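A rough sketch of this idea, assuming the type name joins atom type names with `~` as in the example above. The function `order_like_type` and the `(site_id, atom_type_name)` pair representation are hypothetical, and only the forward and reversed orderings are tried, so impropers would need additional permutations.

```python
def order_like_type(type_name, sites, sep="~"):
    """Order `sites` so their atom types follow the type definition.

    `sites` is a sequence of (site_id, atom_type_name) pairs.
    """
    pattern = tuple(type_name.strip("[]").split(sep))
    # Try the given order first, then the reversed order.
    for candidate in (tuple(sites), tuple(reversed(sites))):
        if tuple(atom_type for _, atom_type in candidate) == pattern:
            return candidate
    raise ValueError(f"sites do not match type {type_name!r} in either direction")


sites = [(7, "opls_111"), (4, "opls_112"), (9, "opls_113")]
ordered = order_like_type("opls_113~opls_112~opls_111", sites)
print([site_id for site_id, _ in ordered])  # [9, 4, 7]
```

For a symmetric definition such as `opls_112~opls_111~opls_112`, both directions match and the given order is kept.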

mattwthompson commented 4 years ago

> Maybe we discussed it elsewhere, but to layer potentials, do we want to create new Connections or do we want to be able to add multiple ConnectionTypes to a single Connection?

Is ordering enforced in Connection objects? It would feel very strange to layer i-j-k and k-j-i angles on top of the same angle. If ordering is not enforced, my half-thought-through vote would be for allowing multiple ConnectionTypes on any one Connection.

> we should ensure that the indexing of connection_members is consistent with the type definition in the ForceField.

Agree completely, the atom type is the interface between an atom and the ConnectionType built on top of that, and the order there absolutely matters. Much of the magic of the SMIRNOFF specification is that SMARTS matching is done on valence terms (what we call ConnectionTypes) where order is essential.

Just to home in on some language here: I think a distinction should be drawn between the question of whether ordering matters and whether symmetry also matters. In the case of water, you could get duplicate SMARTS matches to an H-O-H angle because of its symmetry, but you'd only want to keep one of the angles.

Relating it back to the original case of writing to LAMMPS data: is this getting off track? Does enforcing atom1 atom2 atom3, and making sure atom3 atom2 atom1 isn't also in the Angles section, work? Obviously the atom2 atom1 atom3 case breaks both symmetry and physically sensible ordering.
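A small sketch of the check described here: keep `atom1 atom2 atom3` and drop a later `atom3 atom2 atom1` before writing the `Angles` section. The helper `unique_angles` is hypothetical, and it would also discard angles that were duplicated on purpose to layer potentials, so it only fits if layering is handled on the connection itself.

```python
def unique_angles(angles):
    """Keep the first of each pair of angles related by reversal."""
    seen = set()
    kept = []
    for angle in angles:
        angle = tuple(angle)
        # (i, j, k) and (k, j, i) share the same key.
        key = min(angle, angle[::-1])
        if key in seen:
            continue
        seen.add(key)
        kept.append(angle)
    return kept


# Atom 1 is O; atoms 2 and 3 are H. The first two entries are the same
# H-O-H angle; the third has a different central atom and is kept as a
# distinct entry, so central-atom placement must be validated separately.
matches = [(2, 1, 3), (3, 1, 2), (2, 3, 1)]
print(unique_angles(matches))  # [(2, 1, 3), (2, 3, 1)]
```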

daico007 commented 2 years ago

This topic has been discussed and agreed on:

Angle: end1 - central - end2
Dihedral: end1 - mid1 - mid2 - end2
Improper: central - end1 - end2 - end3

mosdef-hub / gmso

Standardize indexing in `connection_members` #361