openforcefield / openff-interchange

A project (and object) for storing, manipulating, and converting molecular mechanics data.
https://docs.openforcefield.org/projects/interchange
MIT License
71 stars 23 forks source link

`to_openmm_positions` includes virutal sites but `Interchange.positions.to_openmm()` does not #1054

Closed IAlibay closed 1 month ago

IAlibay commented 1 month ago

Expectation

  1. Positions obtained from an interchange object should include all the particles in the system, including virtual sites.
  2. Various API points for getting positions should all lead to the same outcomes

Description

  1. Calling Interchange.positions.to_openmm() yields positions without virtual sites.
  2. Calling to_openmm_positions yields positions with virtual sites.

Reproduction

from openff.toolkit import Molecule, Topology, ForceField
from openff.interchange.components._packmol import solvate_topology
from openff.interchange.interop.openmm import to_openmm_positions
from openff.interchange import Interchange

water = Molecule.from_smiles('O')
water.assign_partial_charges(partial_charge_method='gasteiger')

ligand = Molecule.from_smiles('CCCCC')
ligand.generate_conformers()

off_top = Topology.from_molecules(ligand)
solvated_off_top = solvate_topology(topology=off_top)

ff = ForceField('openff-2.2.0.offxml', 'opc.offxml')
inter = Interchange.from_smirnoff(topology=solvated_off_top, force_field=ff)
print("Number of particles with .positions: ", len(inter.positions.to_openmm()))
print("Number of particles with to_openmm_positions: ", len(to_openmm_positions(inter)))

Output

Number of particles with .positions:  2035
Number of particles with to_openmm_positions:  2707

Software versions

Interchange v0.3.29

mattwthompson commented 1 month ago

Hmm, yeah that's a bit of a surprises. Each of these things have some reason to behave the way they do but the inconsistency isn't great.

Positions obtained from an interchange object should include all the particles in the system, including virtual sites.

Unfortunately, this is the heart of the matter. This is a consequence of the first-class view mirroring the toolkit's behavior of not knowing about virtual sites but other API points knowing about them. Right now, Interchange.topology can't store virtual sites[^1], and I'm not sure how I feel about Interchange.positions having a different shape than the number of atoms.[^2]

Various API points for getting positions should all lead to the same outcomes

Lots of API points have arguments like include_virtual_sites wired through them since there are some use cases that only want the atomic particles; I'm not sure how this would implemented on something that's a pydantic.Field, but maybe there's a way to wrap something up that handles both cases?

At very least we should document all of this better, but I'm not sure how best to do that since this touches so many parts of the codebase.

[^1]: This is because it's a Topology object from the toolkit, and its representation of molecules does not store virtual sites. I actually advocated for this change, since molecules don't have virtual sites and I'm not sure how one would usefully describe virtual sites in the absence of a force field. [^2]: In the past, we've talked about the need for a "post-parametrization" representation of a topology. Including virtual sites, which are stored elsewhere in exporters and in pretty fly-by-night ways, is a key use case that would be enabled by this change. I think some of the particle accounting would also be made easier, both internally and for downstream users.

IAlibay commented 1 month ago

At very least we should document all of this better,

To me this sounds like a good starting solution. I think there's a bit of inconsistency in the docs (or maybe I'm looking at the wrong code examples), so if everything encourages to_openmm_positions, then it shouldn't be as easy for folks to fall into this trap.

mattwthompson commented 1 month ago

Wow, I see I interpreted this originally to be about the difference between .to_openmm_positions and .topology.positions.to_openmm() (even though the API is slightly different) but you were actually getting caught on .positions.to_openmm(). I'm punting on really fixing that since, as mentioned above, just documenting things is a better short-term solution.

In the future, we ought to have some sort of InterchangeTopology subclass that papers over these issues, and maybe something similar with the positions attribute.

In any case, I've changed the default behavior of the OpenMM positions getter to exclude virtual sites by default - arguably what it should have always done - to avoid this particular API inconsistency. I agree that including virtual sites by default is the better of the two options, but flipping that switch requires more internal refactors that I'm unlikely to be able to get to soon.

mattwthompson commented 1 month ago

Calling this resolved (for now) with #1074, but see above for ideas of future work