openforcefield / openff-interchange

A project (and object) for storing, manipulating, and converting molecular mechanics data.
https://docs.openforcefield.org/projects/interchange
MIT License
71 stars 22 forks source link

Created OpenMM Topology and System virtual site atom indices don't match #1049

Open IAlibay opened 2 weeks ago

IAlibay commented 2 weeks ago

Expectation

The atom indices of an OpenMM Topology match those of its associated System, this includes virtual sites.

Current status

Creating an OpenMM Topology with virtual sites puts the virtual site at the end of each residue whilst the System has them completely at the end of the entire System. This means that if you have more than one residue, the atom indices don't match.

Reproduction

from openff.toolkit import Molecule, Topology, ForceField
from openff.interchange.components._packmol import solvate_topology
from openff.interchange import Interchange

ligand = Molecule.from_smiles('CCCCC')
ligand.generate_conformers()

off_top = Topology.from_molecules(ligand)
solvated_off_top = solvate_topology(topology=off_top)

ff = ForceField('openff-2.2.0.offxml', 'opc.offxml')
inter = Interchange.from_smirnoff(topology=solvated_off_top, force_field=ff)
omm_top = inter.to_openmm_topology()
omm_sys = inter.to_openmm_system()

print(list(omm_top.atoms())[20].name == 'EP')
print(omm_sys.isVirtualSite(20) == True)

Output

True
False

Software versions

Interchange v0.3.27

IAlibay commented 2 weeks ago

The "ideal scenario" ask

If we could change behaviour to not have virtual sites being added at the end of the system, but rather at the end of each residue (edit: read here as continuous molecule - I know that would be a pain for polymers otherwise), that would save a lot of headaches downstream.

Particularly, this behaviour doesn't match OpenMM's default behaviour (i.e. what happens with Modeller) when it comes to virtual sites.

What this means is that I can be presented with a pair of OpenMM System and Topology and not know if I should expect the indices to match or if I need to shift all the indices because the virtual sites got shuffled around. So I have to build shims everywhere to handle OpenFF specific behaviour.

The alternative asks

  1. Warn users on OpenMM Topology creation that the indices won't match
  2. Document this issue / behaviour
  3. Provide tooling to map the index shift (i.e. "topology_to_system_index(i) -> shifted_i")
    • I could do this myself downstream, but it's messy and prone to error if something changes in Interchange, it'd be a lot cleaner / safer if it was done here.
mattwthompson commented 2 weeks ago

Virtual sites being added to the end of the system is intentional[^1], them not matching up with the topology is not. The ordering in the topology derives from toolkit behavior, although it's never properly supported virtual sites. I'm going to look into updating the topology output to match the system's ordering, alternatively with some options to pick between them. The internals already have some toggles here (by comparison, GROMACS prefers virtual sites at the end of each molecule - not residue) but this must not be exposed at the OpenMM export.

I can't commit to changing the default behavior, but these indices lining up seems like a completely reasonable user expectation. These details need to be better documented as well.

I notice that this particular behavior (matching up, but all virtual sites at end) isn't your ideal scenario nor is it an alternative ask - is this something that's workable, or would you still need to build a shim?

[^1]: I forget why, but changing this is something I'd like to avoid. OpenMM itself works with just about any particle bookkeepping

IAlibay commented 2 weeks ago

I notice that this particular behavior (matching up, but all virtual sites at end)

If the OpenMM Topology matches the OpenMM System, then that works for us. That possibility didn't cross my mind since I just couldn't work out "how" (since OpenMM Topologies need contiguous indices for each residue).

mattwthompson commented 2 weeks ago

OpenMM Topologies need contiguous indices for each residue

Sadly, I'm running into this now. The stopgap solution probably will have a weak spot with residue accounting, though I'm hoping the rest of the particle indexing will make sense or at least be self-consistent.

IAlibay commented 2 weeks ago

The stopgap solution probably will have a weak spot with residue accounting

I'll have to think about it a bit more, but I suspect that for us fixing indexing at the risk of residues being split kinda breaks things in different ways 😅

mattwthompson commented 2 weeks ago

I get that - I'm not sure there'll be a perfect solution here. The major constraints I'm working with are

I have a swing at this in draft at #1051, from where I'll continue to add in some tests and iterate. Documentation probably won't be added until something crystalizes out.

[^1]: Nor should it, arguably