openforcefield / openff-interchange

A project (and object) for storing, manipulating, and converting molecular mechanics data.
https://docs.openforcefield.org/projects/interchange
MIT License
71 stars 22 forks source link

Bond information missing from water forcefields #843

Open GregorySchwing opened 10 months ago

GregorySchwing commented 10 months ago

Describe the bug I'd like to be able to write amber parameter files using interchange from openff-toolkit. However, no bond information is included in the water xmls leading to failure.

To Reproduce

    # Imports from the toolkit
    from openff.toolkit import ForceField, Molecule, Topology
    from openff.units import Quantity, unit
    from openff.interchange import Interchange

    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import rdmolfiles
    def embed(mol, seed=None):
        params = AllChem.ETKDGv2()
        if seed is not None:
            params.randomSeed = seed
        else:
            params.randomSeed = 123
        AllChem.EmbedMolecule(mol, params)
        return mol

    forcefield = ForceField("tip3p-1.0.0.offxml")
    openff_mol = Molecule.from_mapped_smiles("[O:1]([H:2])[H:3]")
    rdmol = openff_mol.to_rdkit()
    rdmol3D = embed(rdmol,123)
    # Create a PDB file
    writer = rdmolfiles.MolToPDBFile(rdmol3D, "water.pdb")
    # Load the topology from a PDB file and `Molecule` objects
    topology = Topology.from_pdb(
        "water.pdb",
        unique_molecules=[openff_mol],
    )

    interchange = Interchange.from_smirnoff(
        force_field=forcefield,
        topology=topology,
    )

    interchange.to_prmtop("water.prmtop")
    interchange.to_inpcrd("water.inpcrd")

If the problem involves a specific molecule or file, please upload that as well. -->

Output

    interchange.to_prmtop("water.prmtop")
    File "/opt/conda/lib/python3.10/site-packages/openff/interchange/components/interchange.py", line 551, in to_prmtop
      to_prmtop(self, file_path)
    File "/opt/conda/lib/python3.10/site-packages/openff/interchange/interop/amber/export/_export.py", line 352, in to_prmtop
      key: i for i, key in enumerate(interchange["Bonds"].potentials)
    File "/opt/conda/lib/python3.10/site-packages/openff/interchange/components/interchange.py", line 815, in __getitem__
      raise LookupError(
  LookupError: Could not find component Bonds. This object has the following collections registered:
        ['Constraints', 'vdW', 'Electrostatics']

Computing environment (please complete the following information): openff-toolkit 0.14.4

mattwthompson commented 10 months ago

Thanks for the report, this is probably a bug in Interchange, but it might be a deeper incompatibility with the information Amber expects since it has opinions about bond graphs and physics parameters. I think ParmEd assigns arbitrary an force constant of 50,000 kcal/mol or something in cases like these, which is not ideal but maybe the best among the possible options.

The OFFXML files don't have bond information since they are rigid models.

We probably haven't encountered this yet since most use cases involve loading a general force field, which will include some sort of harmonic bond parameters to fall back on (even though, again, those are not meant to be used with these water models). Here's an example of using the TIP3P model, writing to Amber files:

In [1]: from openff.toolkit import Molecule, ForceField

In [2]: ForceField(
    "openff-2.0.0.offxml",
    "tip3p-1.0.0.offxml",
).create_interchange(
    Molecule.from_mapped_smiles(
        "[O:1]([H:2])[H:3]"
    ).to_topology(),
).to_prmtop('foo')

(In the case of TIP3P, loading it up separately is not necessary since Sage bundles it.)

GregorySchwing commented 10 months ago

I understand, but I still have issues when I do this

forcefield = ForceField("openff-2.1.0.offxml", "opc-1.0.0.offxml")

and build prmtop and incrd files

Here is an incrd directly from tleap

sh-5.0# cat opc.inpcrd 
OPC
     4
   0.0000000   0.0000000   0.0000000   0.8724330   0.0000000   0.0000000
  -0.2051460   0.8479710   0.0000000   0.0985730   0.1252640   0.0000000

Interchange

    3  0.0000000e+00
  -0.0040000   0.3560000  -0.0000000  -0.8250000  -0.1790000  -0.0000000
   0.8290000  -0.1770000   0.0000000

Perhaps related to #783.

mattwthompson commented 10 months ago

Right, but that's for a different reason. Four-site water models aren't implemented in the Amber writers yet (surprisingly, there hasn't been much interest in it yet) - there'll be much more missing than bond information, as you've found out. Amber's support for virtual sites seems to be much more limited than i.e. OpenMM, but doing four-site water models is tractable.

I'm hoping to tackle it later month - keep an eye out for releases or updates to that issue!

mattwthompson commented 10 months ago

Okay - I found the issue with the details I was thinking of: #732

GregorySchwing commented 10 months ago

Update. Going through parmed works, or appears to work


# Imports from the toolkit
from openff.toolkit import ForceField, Molecule, Topology
from openff.units import Quantity, unit
from openff.interchange import Interchange

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdmolfiles
def embed(mol, seed=None):
    params = AllChem.ETKDGv2()
    if seed is not None:
        params.randomSeed = seed
    else:
        params.randomSeed = 123
    AllChem.EmbedMolecule(mol, params)
    return mol

forcefield = ForceField("openff-2.1.0.offxml", "opc-1.0.0.offxml")
openff_mol = Molecule.from_mapped_smiles("[O:1]([H:2])[H:3]")
"""
rdmol = openff_mol.to_rdkit()
rdmol3D = embed(rdmol,123)
# Create a PDB file
writer = rdmolfiles.MolToPDBFile(rdmol3D, "water.pdb")
# Load the topology from a PDB file and `Molecule` objects
topology = Topology.from_pdb(
    "water.pdb",
    unique_molecules=[openff_mol],
)
"""
# Generate a conformer to be used as atomic coordinates
openff_mol.generate_conformers(n_conformers=1)

# Convert this molecule to a topology
topology = openff_mol.to_topology()

# Define periodicity via box vectors
topology.box_vectors = unit.Quantity([4, 4, 4], unit.nanometer)

interchange = Interchange.from_smirnoff(
    force_field=forcefield,
    topology=topology,
)

# FAILS!
#interchange.to_prmtop("water.prmtop")
#interchange.to_inpcrd("water.inpcrd")

interchange.to_gro("water.gro")
interchange.to_top("water.top")

from parmed import gromacs, amber, unit as u
gmx_top = gromacs.GromacsTopologyFile('water.top')
gmx_gro = gromacs.GromacsGroFile.parse('water.gro')
gmx_top.box = gmx_gro.box # Needed because .prmtop contains box info
gmx_top.positions = gmx_gro.positions
amb_prm = amber.AmberParm.from_structure(gmx_top)
amb_prm.write_parm("water.prmtop")
amb_inpcrd = amber.AmberAsciiRestart("water.inpcrd", mode="w")
amb_inpcrd.coordinates = gmx_top.coordinates
amb_inpcrd.box = gmx_top.box
amb_inpcrd.close()
mattwthompson commented 10 months ago

Right - this particular behavior is not implemented in the Amber writer. It's implemented in the OpenMM and GROMACS writers. Verison 0.3.18 makes this distinction more clear.

GregorySchwing commented 10 months ago

How would you go about getting the water geometry correct? Current angle after generating confomations is 111 degrees. Should be 103.6 degrees. Bond distances are wrong also.

mattwthompson commented 10 months ago

Are you referring to the conformers you generated here?

openff_mol.generate_conformers(n_conformers=1)

These conformers are generated by RDKit (or a similar tool). They don't use a force field, but should be close enough to the MM minima that it'll equilibrate to whatever's specified by the force field. If for some reason you need them to start at a force field minima, you can hard-code the positions/conformers or use Interchange.minimize.

GregorySchwing commented 10 months ago

interchange.to_gro() fails after Interchange.minimize since the virtual site is added by openmm.

print("BEFORE",interchange.positions)
interchange.minimize()
print("AFTER",interchange.positions)

interchange.to_gro("water.gro")
interchange.to_top("water.top")
BEFORE [[-8.161597176171645e-05 0.036637842847512564 -0.0] [-0.08123162006042906 -0.018348210785835466 -0.0] [0.08131323603219076 -0.018289632061677063 0.0]] nanometer

AFTER [[-7.367174112005159e-05 0.03652547672390938 0.0] [-0.06858216226100922 -0.017492877319455147 0.0] [0.06853829324245453 -0.017361382022500038 0.0] [-5.838654760736972e-05 0.02058565244078636 0.0]] nanometer

Traceback (most recent call last):
  File "/tmp/test.py", line 58, in <module>
    interchange.to_gro("water.gro")
  File "/opt/conda/lib/python3.10/site-packages/openff/interchange/components/interchange.py", line 394, in to_gro
    GROMACSWriter(
  File "/opt/conda/lib/python3.10/site-packages/openff/interchange/interop/gromacs/export/_export.py", line 48, in to_gro
    self._write_gro(gro, decimal)
  File "/opt/conda/lib/python3.10/site-packages/openff/interchange/interop/gromacs/export/_export.py", line 323, in _write_gro
    assert n_particles == self.system.positions.shape[0], (
AssertionError: (4, 5)
GregorySchwing commented 10 months ago

I'm going to leave this last comment:

Going through parmed results in an incorrect prmtop file for my purposes. It may run in amber, but I'm trying to create 1D-RISM models. Therefore, the LJ_types need to be correct.

This is just with SPCE 3-point water, so I could test interchange. SPCE has 2 LJ_types, (O and H).

Here is an excerpt of the pointers of the prmtop file from interchange -> gromacs -> parmed -> amber

>>> print(parm.pointers)
{'NATOM': 3, 'NTYPES': 3, 'NBONH': 3, 'MBONA': 0, 'NTHETH': 0, 'MTHETA': 0, 'NPHIH': 0, 'MPHIA': 0, 'NHPARM': 0, 'NPARM': 0, 'NEXT': 4, 'NNB': 4, 'NRES': 1, 'NBONA': 0, 'NTHETA': 0, 'NPHIA': 0, 'NUMBND': 2, 'NUMANG': 0, 'NPTRA': 0, 'NATYP': 1, 'NPHB': 0, 'IFPERT': 0, 'NBPER': 0, 'NGPER': 0, 'NDPER': 0, 'MBPER': 0, 'MGPER': 0, 'MDPER': 0, 'IFBOX': 1, 'NMXRS': 3, 'IFCAP': 0, 'NUMEXTRA': 0, 'IPTRES': 1, 'NSPM': 1, 'NSPSOL': 2}
>>> parm.LJ_types
{'MOL0': 3}
>>> print(parm.LJ_depth)
[0.15210000020871914, 0, 0]
>>> print(parm.LJ_radius)
[1.7682705875656692, 0, 0]

interchange -> amber

>>> print(parm.pointers)
{'NATOM': 3, 'NTYPES': 2, 'NBONH': 2, 'MBONA': 0, 'NTHETH': 1, 'MTHETA': 0, 'NPHIH': 0, 'MPHIA': 0, 'NHPARM': 0, 'NPARM': 0, 'NEXT': 4, 'NNB': 4, 'NRES': 1, 'NBONA': 0, 'NTHETA': 0, 'NPHIA': 0, 'NUMBND': 1, 'NUMANG': 1, 'NPTRA': 0, 'NATYP': 1, 'NPHB': 0, 'IFPERT': 0, 'NBPER': 0, 'NGPER': 0, 'NDPER': 0, 'MBPER': 0, 'MGPER': 0, 'MDPER': 0, 'IFBOX': 1, 'NMXRS': 3, 'IFCAP': 0, 'NUMEXTRA': 0, 'IPTRES': 1, 'NSPM': 1, 'NSPSOL': 2, 'NCOPY': 0}
>>> parm.LJ_types
{'O1': 1, 'H1': 2, 'H2': 2}
>>> print(parm.LJ_depth)
[0.15210000020871914, 0]
>>> print(parm.LJ_radius)
[1.7682705875656692, 0]

The only way to get a correct prmtop file is for interchange to write the amber file. Parmed messes up the LJ_types.

mattwthompson commented 10 months ago

Thanks for all the hacking - I'm trying to distill everything down to specific things I can fix. Interchange by necessity makes some approximations with atom types (SMIRNOFF does not have atom types whereas most other tools, especially anything Amber-related have them deeply built-in) and one of the consequences is that GROMACS files have more atom types than strictly necessary, i.e.


[ atomtypes ]
;type, bondingtype, atomic_number, mass, charge, ptype, sigma, epsilon
MOL0_0           8  15.99943    0.0000000000000000  A       0.31507 0.6363864000000001
MOL0_1           1  1.007947    0.0000000000000000  A       0.09999999999999999 0
MOL0_2           1  1.007947    0.0000000000000000  A       0.09999999999999999 0

ParmEd is probably parsing what's in the file, which looks funny (identical atom types, with no interactions) by isn't incorrect for running simulations. (Maybe something's out of whack with OPC due to the virtual site, I've only run this with a three-site model.)

GregorySchwing commented 10 months ago

Seems the problem is in Parmed's Gromacs to Amber converter upon inspecting your gromacs file. Further, my last post was using a 3-site water, so it's not caused by virtual sites.

mattwthompson commented 10 months ago

I could maybe use a context reset to better understand what exactly is going wrong here (when virtual sites are not present)