openmm / openmmforcefields

CHARMM and AMBER forcefields for OpenMM (with small molecule support)
http://openmm.org
Other
259 stars 81 forks source link

AmberTools 23 causes openmmforcefield tests to fail #281

Open jchodera opened 1 year ago

jchodera commented 1 year ago

The update from ambertools 22 to 23 appears to cause the tests to fail:

____________________________________________________________________________________________________________________ TestSystemGenerator.test_add_molecules _____________________________________________________________________________________________________________________

self = <openmmforcefields.tests.test_system_generator.TestSystemGenerator testMethod=test_add_molecules>

    def test_add_molecules(self):
        """Test that Molecules can be added to SystemGenerator later"""
        for small_molecule_forcefield in SMALL_MOLECULE_FORCEFIELDS:
            print(small_molecule_forcefield)
            # Create a SystemGenerator for this force field
            generator = SystemGenerator(forcefields=self.amber_forcefields,
                                            small_molecule_forcefield=small_molecule_forcefield)

            # Add molecules for each test system separately
            for name, testsystem in self.testsystems.items():
                molecules = testsystem['molecules']

                # Add molecules
                generator.add_molecules(molecules)

                # Parameterize molecules
                for molecule in molecules:
                    # DEBUG
                    if not (small_molecule_forcefield == 'gaff-2.11'):
                        continue

                    openmm_topology = molecule.to_topology().to_openmm()
                    with Timer() as t1:
                        try:
                            print(f'{small_molecule_forcefield} {name} {molecule.name} first generation')
                            system = generator.create_system(openmm_topology)
                        except ValueError as e:
>                           raise(e)

tests/test_system_generator.py:341: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_system_generator.py:339: in test_add_molecules
    system = generator.create_system(openmm_topology)
generators/system_generators.py:332: in create_system
    system = self.forcefield.createSystem(topology, **forcefield_kwargs)
../../../../micromamba/envs/openmmforcefields/lib/python3.10/site-packages/openmm/app/forcefield.py:1218: in createSystem
    templateForResidue = self._matchAllResiduesToTemplates(data, topology, residueTemplates, ignoreExternalBonds)
../../../../micromamba/envs/openmmforcefields/lib/python3.10/site-packages/openmm/app/forcefield.py:1423: in _matchAllResiduesToTemplates
    if generator(self, res):
generators/template_generators.py:552: in generator
    return super().generator(forcefield, residue)
generators/template_generators.py:325: in generator
    forcefield.loadFile(StringIO(ffxml_contents))
../../../../micromamba/envs/openmmforcefields/lib/python3.10/site-packages/openmm/app/forcefield.py:288: in loadFile
    self.registerAtomType(type.attrib)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <openmm.app.forcefield.ForceField object at 0x17cc889a0>, parameters = {'class': 'c5', 'element': 'C', 'mass': '12.01', 'name': 'c5'}

    def registerAtomType(self, parameters):
        """Register a new atom type."""
        name = parameters['name']
        if name in self._atomTypes:
>           raise ValueError('Found multiple definitions for atom type: '+name)
E           ValueError: Found multiple definitions for atom type: c5

../../../../micromamba/envs/openmmforcefields/lib/python3.10/site-packages/openmm/app/forcefield.py:437: ValueError

This appears to be due to a new gaff2.dat that is distributed with this release (and possibly earlier?) that includes new types:

AMBER General Force Field for organic molecules (Version 2.2.20, March 2021)

The parmchk2 tool now generates parameters for this version, it appears.

For example, for the bace testsystem CAT-13d ligand with gaff-2.11, ambertools=22 generates the frcmod file:

Remark line goes here
MASS

BOND
cz-n   330.20   1.399       same as c2- n, penalty score=  3.2
cz-nv  414.20   1.339       same as cz-nh, penalty score=  0.0
cz-nu  414.20   1.339       same as cz-nh, penalty score=  0.0
hn-nv  529.50   1.012       same as hn-nh, penalty score=  0.0
c3-nu  261.80   1.464       same as c3-nh, penalty score=  0.0
hn-nu  529.50   1.012       same as hn-nh, penalty score=  0.0

ANGLE
c3-n -cz   65.000     120.100   same as c2-n -c3, penalty score=  2.2
n -cz-nv  112.600     120.140   same as nh-cz-nh, penalty score=  1.8
n -cz-nu  112.600     120.140   same as nh-cz-nh, penalty score=  1.8
c -n -cz   66.400     122.150   same as c -n -c2, penalty score=  2.2
cz-nv-hn   49.200     121.150   same as cz-nh-hn, penalty score=  0.0
c3-nu-cz   64.700     125.460   same as c3-nh-cz, penalty score=  0.0
cz-nu-hn   49.200     121.150   same as cz-nh-hn, penalty score=  0.0
nu-cz-nv  112.600     120.140   same as nh-cz-nh, penalty score=  0.0
ca-c3-nu   83.900     111.390   same as ca-c3-nh, penalty score=  0.0
cx-c3-nu   86.600     103.860   same as cx-c3-nh, penalty score=  0.0
c -c3-nu   84.400     109.350   same as c -c3-nh, penalty score=  0.0
c3-nu-hn   46.400     115.990   same as c3-nh-hn, penalty score=  0.0
hn-nv-hn   39.500     115.120   same as hn-nh-hn, penalty score=  0.0

DIHE
nv-cz-n -c3   4    2.600       180.000           2.000      same as X -c2-n -X , penalty score=462.5
nu-cz-n -c3   4    2.600       180.000           2.000      same as X -c2-n -X , penalty score=462.5
n -cz-nv-hn   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
n -cz-nu-c3   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
n -cz-nu-hn   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
ca-c3-nu-cz   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
cx-c3-nu-cz   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
c -c3-nu-cz   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
nv-cz-nu-c3   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
nv-cz-nu-hn   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
nu-cz-nv-hn   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
nu-c3-cx-cx   1    0.210         0.000           3.000      same as nh-c3-c3-c3, penalty score=179.0
nu-c3-cx-hc   9    1.400         0.000           3.000      same as X -c3-c3-X , penalty score= 92.0
ca-c3-cx-cx   9    1.400         0.000           3.000      same as X -c3-c3-X , penalty score= 92.0
ca-c3-cx-hc   9    1.400         0.000           3.000      same as X -c3-c3-X , penalty score= 92.0
nv-cz-n -c    4    2.600       180.000           2.000      same as X -c2-n -X , penalty score=462.5
nu-cz-n -c    4    2.600       180.000           2.000      same as X -c2-n -X , penalty score=462.5
c -c3-cx-cx   1    0.100         0.000           3.000      same as c -c3-c3-c3, penalty score=179.0
c -c3-cx-hc   9    1.400         0.000           3.000      same as X -c3-c3-X , penalty score= 92.0
ca-c3-nu-hn   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
cx-c3-nu-hn   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
c -c3-nu-hn   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0

IMPROPER
c -c3-n -cz         1.1          180.0         2.0          Using the default value
n -nu-cz-nv         1.1          180.0         2.0          Using the default value
cz-hn-nv-hn         1.1          180.0         2.0          Using the default value
c3-cz-nu-hn         1.1          180.0         2.0          Using the default value
ca-ca-ca-ha         1.1          180.0         2.0          Using general improper torsional angle  X- X-ca-ha, penalty score=  6.0)
ca-cp-ca-ha         1.1          180.0         2.0          Using general improper torsional angle  X- X-ca-ha, penalty score=  6.0)
ca-ca-cp-cp         1.1          180.0         2.0          Using the default value
c3-n -c -o         10.5          180.0         2.0          Using general improper torsional angle  X- X- c- o, penalty score=  6.0)

NONBON

Now, with ambertools=23, the frcmod file is different:

Remark line goes here
MASS
c5 12.010        0.878               same as c3 

BOND
cz-n   330.20   1.399       same as c2- n, penalty score=  3.2
cz-nv  414.20   1.339       same as cz-nh, penalty score=  0.0
cz-nu  414.20   1.339       same as cz-nh, penalty score=  0.0
hn-nv  529.50   1.012       same as hn-nh, penalty score=  0.0
c5-nu  261.80   1.464       same as c3-nh, penalty score=  0.0
hn-nu  529.50   1.012       same as hn-nh, penalty score=  0.0
c5-ca  250.30   1.516       same as c3-ca, penalty score=  0.0
c5-cx  244.90   1.522       same as c3-cx, penalty score=  0.0
c -c5  243.20   1.524       same as  c-c3, penalty score=  0.0

ANGLE
c3-n -cz   65.000     120.100   same as c2-n -c3, penalty score=  2.2
n -cz-nv  112.600     120.140   same as nh-cz-nh, penalty score=  1.8
n -cz-nu  112.600     120.140   same as nh-cz-nh, penalty score=  1.8
c5-c -n    84.300     115.180   same as c3-c -n , penalty score=  0.0
c -n -cz   66.400     122.150   same as c -n -c2, penalty score=  2.2
cz-nv-hn   49.200     121.150   same as cz-nh-hn, penalty score=  0.0
c5-nu-cz   64.700     125.460   same as c3-nh-cz, penalty score=  0.0
cz-nu-hn   49.200     121.150   same as cz-nh-hn, penalty score=  0.0
nu-cz-nv  112.600     120.140   same as nh-cz-nh, penalty score=  0.0
ca-c5-nu   83.900     111.390   same as ca-c3-nh, penalty score=  0.0
cx-c5-nu   86.600     103.860   same as cx-c3-nh, penalty score=  0.0
c -c5-nu   84.400     109.350   same as c -c3-nh, penalty score=  0.0
c5-nu-hn   46.400     115.990   same as c3-nh-hn, penalty score=  0.0
c5-ca-ca   65.600     120.770   same as c3-ca-ca, penalty score=  0.0
c5-cx-cx   63.400     120.110   same as c3-cx-cx, penalty score=  0.0
c5-cx-hc   46.300     114.480   same as c3-cx-hc, penalty score=  0.0
c5-c -o    84.600     123.200   same as c3-c -o , penalty score=  0.0
ca-c5-cx   65.400     112.620   same as ca-c3-cx, penalty score=  0.0
c -c5-ca   65.800     111.010   same as c -c3-ca, penalty score=  0.0
c -c5-cx   65.600     111.160   same as c -c3-cx, penalty score=  0.0
hn-nv-hn   39.500     115.120   same as hn-nh-hn, penalty score=  0.0

DIHE
nv-cz-n -c3   4    2.600       180.000           2.000      same as X -c2-n -X , penalty score=462.5
nu-cz-n -c3   4    2.600       180.000           2.000      same as X -c2-n -X , penalty score=462.5
c5-c -n -c3   1    0.260       180.000          -2.000      same as c3-n -c -c3
c5-c -n -c3   1    0.500         0.000           1.000      same as c3-n -c -c3, penalty score=  0.0
n -cz-nv-hn   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
n -cz-nu-c5   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
n -cz-nu-hn   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
ca-c5-nu-cz   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
cx-c5-nu-cz   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
c -c5-nu-cz   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
nv-cz-nu-c5   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
nv-cz-nu-hn   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
nu-cz-nv-hn   4    2.700       180.000           2.000      same as X -c2-nh-X , penalty score=462.5
nu-c5-ca-ca   6    0.000         0.000           2.000      same as X -c3-ca-X , penalty score=  0.0
nu-c5-cx-cx   1    0.210         0.000           3.000      same as nh-c3-c3-c3, penalty score=179.0
nu-c5-cx-hc   9    1.400         0.000           3.000      same as X -c3-c3-X , penalty score= 92.0
n -c -c5-nu   6    0.000       180.000           2.000      same as X -c -c3-X , penalty score=  0.0
o -c -c5-nu   6    0.000       180.000           2.000      same as X -c -c3-X , penalty score=  0.0
ca-c5-cx-cx   9    1.400         0.000           3.000      same as X -c3-c3-X , penalty score= 92.0
ca-c5-cx-hc   9    1.400         0.000           3.000      same as X -c3-c3-X , penalty score= 92.0
n -c -c5-ca   6    0.000       180.000           2.000      same as X -c -c3-X , penalty score=  0.0
o -c -c5-ca   6    0.000       180.000           2.000      same as X -c -c3-X , penalty score=  0.0
cx-c5-ca-ca   6    0.000         0.000           2.000      same as X -c3-ca-X , penalty score=  0.0
n -c -c5-cx   6    0.000       180.000           2.000      same as X -c -c3-X , penalty score=  0.0
o -c -c5-cx   6    0.000       180.000           2.000      same as X -c -c3-X , penalty score=  0.0
nv-cz-n -c    4    2.600       180.000           2.000      same as X -c2-n -X , penalty score=462.5
nu-cz-n -c    4    2.600       180.000           2.000      same as X -c2-n -X , penalty score=462.5
c -c5-ca-ca   6    0.000         0.000           2.000      same as X -c3-ca-X , penalty score=  0.0
c -c5-cx-cx   1    0.100         0.000           3.000      same as c -c3-c3-c3, penalty score=179.0
c -c5-cx-hc   9    1.400         0.000           3.000      same as X -c3-c3-X , penalty score= 92.0
ca-c5-nu-hn   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
cx-c5-nu-hn   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0
c -c5-nu-hn   6    0.000         0.000           2.000      same as X -c3-nh-X , penalty score=  0.0

IMPROPER
c -c3-n -cz         1.1          180.0         2.0          Using the default value
n -nu-cz-nv         1.1          180.0         2.0          Using the default value
cz-hn-nv-hn         1.1          180.0         2.0          Using the default value
c5-cz-nu-hn         1.1          180.0         2.0          Using the default value
c5-ca-ca-ca         1.1          180.0         2.0          Using the default value
ca-ca-ca-ha         1.1          180.0         2.0          Using general improper torsional angle  X- X-ca-ha, penalty score=  6.0)
ca-cp-ca-ha         1.1          180.0         2.0          Using general improper torsional angle  X- X-ca-ha, penalty score=  6.0)
ca-ca-cp-cp         1.1          180.0         2.0          Using the default value
c5-n -c -o         10.5          180.0         2.0          Using general improper torsional angle  X- X- c- o, penalty score=  6.0)

NONBON
  c5          1.9069  0.1078             same as c3 

It appears that the second time a molecule generates the c5 parameter, OpenMM ForceField complains that we are trying to load in a new c5 type.

There are a few problems to be solved here:

  1. We need to make sure parmchk* generates parameters for the desired gaff.dat, not just the latest gaff.dat
  2. We need a way to enable multiple molecules whose frcmod files generate new atom types to be loaded (as long as those atom types are identical)

Issue 1 is especially puzzling since we feed parmchk2 the same GAFF parameter file we are using via the-p gaff.dat argument. I suspect the code for parmchk2 has changed.

jchodera commented 1 year ago

@peastman : Is there a way we could permit openmm.app.ForceField.registerAtomType() to allow multiple registrations of the same atom type provided the definitions are identical?

The alternative would require substantial editing of the ParmEd data structures, or even worse, the ffxml files, to de-duplicate entries based on what has already been registered.

Another possibility would be to completely change the way we handle GAFF and use LEaP to generate an OpenMM System, and then use convert_system_to_ffxml so that each molecule receives a completely distinct set of atom types, rather than sharing the common gaff*.xml atom types. This would be a major change in behavior that we'd have to test extensively, but in case prmchk2 ever generates conflicting new types in the frcmod file it generates for different molecules loaded into the same ForceField object, this might be the most future-proof approach.

peastman commented 1 year ago

That seems reasonable. We just need to verify that the class, element, and mass are all identical.

mattwthompson commented 6 months ago

With https://github.com/openmm/openmm/pull/4531 patched into my local setup, I have this package's tests passing (so far) with AmberTools 23 installed

I don't know a good way to run this package's tests here against development builds of OpenMM - I patched $CONDA_PREFIX/lib/python3.11/site-packages/openmm/app/forcefield.py and refuse to let that propagate out past my machine

peastman commented 6 months ago

Is that PR finished? I've been ignoring it since you have it marked as a draft. Once it's ready we can merge it and get it into the next release.

peastman commented 6 months ago

Actually, I see it's a brand new PR. I was just confused. :) Let me know when it's ready for review.

mattwthompson commented 6 months ago

I think it's ready as-is - I marked it as a draft only because I wasn't sure if I was going to break CI or not.

The CI runs (in the main repo) are coming in green so far, so I'd be happy to take feedback there (or here) on what to do next (add tests? test here? etc.)