openforcefield / openff-interchange

A project (and object) for storing, manipulating, and converting molecular mechanics data.
https://docs.openforcefield.org/projects/interchange
MIT License
71 stars 23 forks source link

Design: Safeguard against clashing "atom type" identifiers from different sources #314

Closed mattwthompson closed 3 years ago

mattwthompson commented 3 years ago

A common ParmEd failure mechanism involves two different components with atom types that share the same name, but have different parameters. This is extremely common when interfacing OpenMM and Parmed, i.e.

protein_system = omm_forcefield.createSystem(...)

protein = pmd.openmm.load_topology(...)
ligand = pmd.openmm.load_topology(...)

complex = protein + ligand

This can easily result in conflicts, like below in which each the protein and ligand independently have atom types named H3, which have different parameters, and ParmEd mangles them when writing out to a GROMACS file:

In [27]: {a.sigma for a in pmd_ligand_struct.atoms if a.type == 'H3'}
Out[27]: {2.5996424595335106}

In [28]: {a.sigma for a in pmd_receptor_struct.atoms if a.type == 'H3'}
Out[28]: {2.471353044121301}

In [29]: {a.sigma for a in pmd_complex_struct.atoms if a.type == 'H3'}
Out[29]: {2.471353044121301, 2.5996424595335106}

In [30]: !grep H3 complex.top | head -n 1
H3             1   1.007947  0.00000000  A     0.25996425        0.06276

The current design of PotentialKey isn't immune to this; we should consider adding in extra safeguards. One solution involves each "atom type" also knowing which "thing" it came from; commonly this would mean that one H3 identified as LIGH3 and the other is PROTEINH3 or something. I'm not quite sure how that maps on to TopologyKey/PotentialKey, it's not natural to me that there should be a PotentialKey._topology_source attribute; maybe something with mult can be coerced into doing this.

mattwthompson commented 3 years ago

The original issue https://github.com/ParmEd/ParmEd/issues/1197 is a case of adding a ligand to a protein with water and ions. @j-wags wisely speculated that it would probably also happen with nothing more than two ligands. This turned out to be an exceptionally concise MWE, demonstrating with only small molecules that this happens with both ParmEd and Interchange, in its current state, without needing to read any files.

In:

import os

from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField
from parmed import Structure
from parmed.openmm import load_topology

from openff.interchange.components.interchange import Interchange

thf = Molecule.from_iupac("tetrahydrofuran")
ace = Molecule.from_smiles("CC(=O)O")

thf.generate_conformers(n_conformers=1)
ace.generate_conformers(n_conformers=1)

def make_structure(molecule: Molecule) -> Structure:

    sage = ForceField("openff_unconstrained-2.0.0.offxml")

    molecule.generate_conformers(n_conformers=1)
    topology = molecule.to_topology()
    system = sage.create_openmm_system(topology)
    return load_topology(topology.to_openmm(), system, molecule.conformers[0])

def make_interchange(molecule: Molecule) -> Interchange:
    sage = ForceField("openff_unconstrained-2.0.0.offxml")

    molecule.generate_conformers(n_conformers=1)
    interchange = Interchange.from_smirnoff(
        force_field=sage, topology=molecule.to_topology()
    )
    interchange.positions = molecule.conformers[0]

    return interchange

thf_structure = make_structure(thf)
ace_structure = make_structure(ace)

complex_structure = thf_structure + ace_structure

thf_structure.save("thf_parmed.top", overwrite=True)
ace_structure.save("ace_parmed.top", overwrite=True)
complex_structure.save("add_parmed.top", overwrite=True)

os.system("grep -r '^O1' thf_parmed.top")
os.system("grep -r '^O1' ace_parmed.top")
os.system("grep -r '^O1' add_parmed.top")

thf_interchange = make_interchange(thf)
ace_interchange = make_interchange(ace)
complex_interchange = thf_interchange + ace_interchange

thf_interchange.to_top("thf_interchange.top")
ace_interchange.to_top("ace_interchange.top")
complex_interchange.to_top("add_interchange.top")

os.system("grep -r '^O1' thf_interchange.top")
os.system("grep -r '^O1' ace_interchange.top")
os.system("grep -r '^O1' add_interchange.top")

Out:


$ python ligand_combination.py
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
thf_parmed.top:O1             8  15.999430  0.00000000  A     0.30251065     0.70485815
ace_parmed.top:O1             8  15.999430  0.00000000  A     0.30398122     0.87950233
add_parmed.top:O1             8  15.999430  0.00000000  A     0.30398122     0.87950233
/Users/mwt/software/openff-system/openff/interchange/components/interchange.py:483: UserWarning: Iterchange object combination is experimental and likely to produce strange results. Use with caution!
  warnings.warn(
thf_interchange.top:O1          XX     15.99943 0.0000000000000000 A     0.3025106490435312 0.7048581468486769
ace_interchange.top:O1          XX     15.99943 0.0000000000000000 A     0.3039812205065809 0.8795023257036865
add_interchange.top:O1          XX     15.99943 0.0000000000000000 A     0.3025106490435312 0.7048581468486769
mattwthompson commented 3 years ago

Soft blocked by how topologies are added together, which is either not implemented with intentionality or something I'm not super familiar with.