openmm / openmmforcefields

CHARMM and AMBER forcefields for OpenMM (with small molecule support)
http://openmm.org
Other
222 stars 78 forks source link

CHARMM: Handle patches #22

Closed jchodera closed 6 years ago

jchodera commented 7 years ago

Since the CHARMM forcefield only specified patch/residue compatibility with (barely) human-readable comments (see https://github.com/choderalab/openmm-forcefields/pull/1#issuecomment-283505234), we plan to create a YAML file to direct which ffxml compatibility tags are to be inserted.

The initial version will cover

The syntax will look something like

charmm-patches.yaml:
---
residues:
   ALA:
      - NTER
      - CTER

patches:
   DISU:
      - 1:CYS
      - 2:CYS

In future iterations, we will expand this compatibility list.

peastman commented 6 years ago

I tried creating a system for a topology that includes a lipid membrane. It fails with this error:

  File "/Users/peastman/miniconda3/lib/python3.6/site-packages/simtk/openmm/app/forcefield.py", line 1270, in createSystem
    force.createForce(sys, data, nonbondedMethod, nonbondedCutoff, args)
  File "/Users/peastman/miniconda3/lib/python3.6/site-packages/simtk/openmm/app/forcefield.py", line 2388, in createForce
    reverseMap[typeMap[typeValue]] = typeValue
IndexError: list assignment index out of range

I'm not sure what's causing it.

I also tried creating a force field like this:

forcefield = ForceField('charmm36.xml', 'tip3p.xml')

but it fails with this error:

ValueError: Found multiple NonbondedForce tags with different 1-4 scales

It looks like this file already has TIP3P built into it. But what do you do if you want to use a different water model?

Looking through the file, I noticed there's a patch called FHEM that consists entirely of many screenfulls of <ApplyToResidue> tags. It looks like this patch doesn't actually do anything, and therefore was considered to be applicable to all residues?

jchodera commented 6 years ago

Thanks for giving this a try!

It would be great if we could have ForceField emit a more informative error message for the IndexError: list assignment index out of range. Perhaps an atom type is missing?

For the ValueError: Found multiple NonbondedForce tags with different 1-4 scales, it looks like we have both a NonbondedForce section and a LennardJonesForce section (is that a real thing?):

<NonbondedForce coulomb14scale="1.0" lj14scale="1.0">
...
<LennardJonesForce lj14scale="1.0">
...

We currently include the following files in the conversion. This includes toppar/toppar_water_ions.str, which has the CHARMM variant of TIP3P (which I believe differs from the AMBER variant). We would have to implement a special scheme to censor these residue types out, though it could be done, but I recall that our last discussion had us deciding to distribute CHARMM as a single monolithic forcefield.

Regarding FHEM, it survived the compatibility check because it passed the criteria: An exception is thrown unless

Here's what the FHEM patch does:

PRES FHEM         0.00 ! FIX UP THE HEME BY DELETING UNWANTED AUTOGENERATED ANGLES
                       ! unliganded heme patch
DELETE ANGLE 1NA 1FE 1NC  1NB 1FE 1ND 

We don't handle patches that delete angles, given that we autogenerate bonds, torsions, and angles.

I can add an additional check to make sure the patch did something (e.g. added/changed/deleted atoms).

jchodera commented 6 years ago

Also, I asked earlier if there were systems I could borrow from the OpenMM Python test suite to add to the ParmEd test suite to make sure the converted forcefields can actually parameterize them. Any suggestions? That will help catch these issues earlier.

peastman commented 6 years ago

The error is because the CHARMM file specifies

 <NonbondedForce coulomb14scale="1.0" lj14scale="1.0">

while the TIP3P file specifies

 <NonbondedForce coulomb14scale="0.833333" lj14scale="0.5">

Should it be possible to combine CHARMM with different water models? If TIP3P is the only model that will ever be used with it, then it's reasonable to bundle it with the force field. (We do the same for AMOEBA.) However, a quick web search suggests that's not the case, and that a variety of water models are available in CHARMM. That suggests it should not be bundled into the main force field. It also means we need to find a way around that incompatibility in the scale factors, but we can do that.

I asked earlier if there were systems I could borrow from the OpenMM Python test suite to add to the ParmEd test suite

You can look through the files in https://github.com/pandegroup/openmm/tree/master/wrappers/python/tests/systems. It's a fairly random assortment of things, just whatever was convenient for particular test cases.

jchodera commented 6 years ago

A few things here:

Should it be possible to combine CHARMM with different water models?

I thought we had decided "no" for this conversion, which focuses exclusively on the canonical CHARMM36 family of forcefields. We can certainly re-evaluate if there is utility in doing so.

If TIP3P is the only model that will ever be used with it, then it's reasonable to bundle it with the force field. (We do the same for AMOEBA.)

All the parameter files in the official CHARMM36 distribution are unpacked here.

It appears this distribution does contain additional water models here, though it's under the non_charmm/ directory. It looks like it should be straightforward to split off the toppar/toppar_water_ions.str file and convert each of these to a separate set of water+ions parameters so that users could mix-and-match.

Do we want that to be the only route? So that the user must specify

forcefield = ForceField('charmm36-biomolecules.xml', 'charmm36-tip3p-water-and-ions.xml')

or do we also want to bundle a charmm36.xml that contains TIP3P as well? This could even be a meta-XML file that just includes both of the above files.

In any case, we're not going to be able to use the current tip3p.xml because the 1,4 scaling rules are different.

swails commented 6 years ago

The error is because the CHARMM file specifies

 <NonbondedForce coulomb14scale="1.0" lj14scale="1.0">

while the TIP3P file specifies

 <NonbondedForce coulomb14scale="0.833333" lj14scale="0.5">

But... water has no 1-4 terms? :woman_shrugging:

I thought we had decided "no" for this conversion, which focuses exclusively on the canonical CHARMM36 family of forcefields. We can certainly re-evaluate if there is utility in doing so.

I tend to agree with this. The CHARMM philosophy is that nonbonded parameters are derived within the context of a specific water model -- in this case their own modified TIP3P model. Which is why CHARMM force field parameters differ between GB and explicit solvent models. At least my understanding is that, according to the CHARMM force field developers, CHARMM + \<any other water model> is not a tested or validated force field.

jchodera commented 6 years ago

But... water has no 1-4 terms?

And in principle, we could have different 1-4 treatment for each distinct molecule. But since ForceField associates the 1-4 terms with NonbondedForceGenerator, and multiple <NonbondedForce> blocks are read into the same generator, it would seem to require a lot of restructuring to be able to deal with this.

peastman commented 6 years ago

But... water has no 1-4 terms?

Exactly. In general you can't combine different nonbonded forces with different scaling factors. But water is a special case, since it doesn't have 1-4 pairs. My thought was to allow the <NonbondedForce> tag to omit those attributes, which would be interpreted as "don't care". You could then combine that tag with another <NonbondedForce> tag, and it would accept whatever scaling factors that tag specified.

At least my understanding is that, according to the CHARMM force field developers, CHARMM + \ is not a tested or validated force field.

That's good to know. But how well does that match practice? Is it something people do anyway, even if the developers say it's not validated? If we don't support it, will we immediately start getting people posting to ask how they can combine TIP4P-Ew with the CHARMM force field?

This could even be a meta-XML file that just includes both of the above files.

I think that's a good idea. It would preserve flexibility while still making the recommended combination very easy.

jchodera commented 6 years ago

@peastman: Just an update that I'm

Just so I understand correctly how OpenMM handles single-residue patches: Internally, OpenMM will try all compatible patch x residue combinations if a residue is not recognize, right? I'm currently seeing that terminal residues for (Ala)_3 are failing my tests with

======================================================================
ERROR: Test writing XML parameter files from Charmm parameter files and reading them back into OpenMM ForceField
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/choderaj/github/parmed/ParmEd/test/test_parmed_openmm.py", line 597, in test_write_xml_parameters_charmm
    system = forcefield.createSystem(pdbfile.topology, nonbondedMethod=app.NoCutoff)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/simtk/openmm/app/forcefield.py", line 1077, in createSystem
    raise ValueError('No template found for residue %d (%s).  %s' % (res.index+1, res.name, _findMatchErrors(self, res)))
ValueError: No template found for residue 1 (ALA).  The set of atoms is similar to PRO, but it is missing 2 atoms.
peastman commented 6 years ago

Internally, OpenMM will try all compatible patch x residue combinations if a residue is not recognize, right?

Correct. That's done here: https://github.com/pandegroup/openmm/blob/master/wrappers/python/simtk/openmm/app/forcefield.py#L1481

jchodera commented 6 years ago

OK, I may need a bit of help debugging this, but let me extract a minimal ffxml file that just covers (Ala)3 first.

jchodera commented 6 years ago

@peastman : Here's a minimal example (containing no force generators) that attempts to parameterize (Ala)3 with NTER and CTER patches. Any insight into why ForceField is refusing to recognize the patch-modified terminal residues would be very much appreciated!

debug.zip

peastman commented 6 years ago

I see a few problems with your patches. First, you're using incorrect tag names. They should be <RemoveAtom> and <RemoveBond>, not <DeleteAtom> and <DeleteBond>. Also, you need to add a <RemoveExternalBond> to tell it the terminal residues will only be bonded to one other residue.

jchodera commented 6 years ago

Thanks! I've fixed the names.

I actually had code to generate Add/RemoveExternalBond tags---will have to debug why that's not working.

jchodera commented 6 years ago

@peastman: I've fixed a few issues, and now I see this for the updated test case:

lski1962:debug choderaj$ python build_ala.py 
Traceback (most recent call last):
  File "build_ala.py", line 9, in <module>
    system = forcefield.createSystem(pdbfile.topology, nonbondedMethod=app.NoCutoff)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/simtk/openmm/app/forcefield.py", line 1056, in createSystem
    unmatchedResidues = _applyPatchesToMatchResidues(self, data, unmatchedResidues, bondedToAtom, ignoreExternalBonds)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/simtk/openmm/app/forcefield.py", line 1465, in _applyPatchesToMatchResidues
    [template, matches] = forcefield._getResidueTemplateMatches(res, bondedToAtom, patchedTemplateSignatures, ignoreExternalBonds)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/simtk/openmm/app/forcefield.py", line 830, in _getResidueTemplateMatches
    raise Exception('Multiple matching templates found for residue %d (%s): %s.' % (res.index+1, res.name, ', '.join(match[0].name for match in allMatches)))
Exception: Multiple matching templates found for residue 1 (ALA): ALA-NTER, ALA-NTER.

I suspect the problem is that OpenMM is building residue templates for both AllowPatch and ApplyToResidue tags (we list both in the forcefield conversion), but not culling duplicates. I would imagine that culling duplicates is the least surprising behavior, but if this sounds wrong to you, I can just specify either ApplyToResidue or AllowPatch---though I don't know which one makes the most sense, as both are obviously useful for answering the questions "which patches are allowed for this residue?" and "which residues can I apply this patch to?".

debug2.zip

peastman commented 6 years ago

It never occurred to me that someone would do both. If they do, filtering the duplicates seems like the right thing to do. On the other hand, only specifying one of them also seems like the right thing to do.

jchodera commented 6 years ago

If only specifying one of them is the right thing to do, which one is it?

jchodera commented 6 years ago

I can't come up with compelling arguments for one way vs the other, which is why I'm thinking that specifying only one of them isn't the right thing to do.

peastman commented 6 years ago

It doesn't matter. I just included both options for flexibility when combining multiple files. If you create a file that adds parameters for a new amino acid, you can use <AllowPatch> to specify that the existing NTER and CTER patches can be applied to it. If you create a file that adds a new patch for modifying standard amino acids, you can use <ApplyToResidue> to specify that it can be applied to the existing residues. But when everything is in one file, it makes no difference which one you use.

I'd say to go with <AllowPatch>. It's a few characters shorter, so it might reduce the file size by about 1K. :)

jchodera commented 6 years ago

I'd say to go with . It's a few characters shorter, so it might reduce the file size by about 1K. :)

OK, but can you add a warning to the documentation that AllowPatch and ApplyToResidue cannot be specified simultaneously without causing (likely incomprehensible) errors?

peastman commented 6 years ago

Even better, I'll make it filter the duplicates. But you should also remove the duplicates from the file. That will reduce the size by many KB!

jchodera commented 6 years ago

That will reduce the size by many KB!

Weren't you just commenting on how the 55 TB of data taken up by uncompressed xml files was no big deal because disk space is cheap now? :)

I'm working on generating the new split files now.

peastman commented 6 years ago

No, it was the 4 MB force field XML file I said wasn't significant. You should totally compress all your data! For reading them, just replace open() with gzip.open() and everything should work with no change.

jchodera commented 6 years ago

No, it was the 4 MB force field XML file I said wasn't significant. You should totally compress all your data! For reading them, just replace open() with gzip.open() and everything should work with no change.

We're still struggling to figure out how to do that with the FAH cores.

jchodera commented 6 years ago

OK, I've now checked that

Here are the newly converted files: charmm36.zip

I'm still working to add more tests to both ParmEd and the conversion scripts, but I thought I'd share the files in their current state in case they turn out to be useful (or you catch bugs that I haven't seen yet).

peastman commented 6 years ago

We're still struggling to figure out how to do that with the FAH cores.

Yeah, that's another problem. Though isn't that really an issue on the server end, rather than the core?

jchodera commented 6 years ago

Ideally not. If the server has to do the compressing/decompressing, that's enormously more expensive than having the core uncompress when reading due to WU throughput. Having transparent reading/writing of compressed XML files at the OpenMM level would allow the cost of this to be amortized over the many available donor machines.

jchodera commented 6 years ago

It's possible that fahlib library actually supports transparent gzip file streaming, though. Will have to figure that out.

jchodera commented 6 years ago

@peastman : I generated a membrane protein system by following the OpenMM 7 B2AR tutorial, and discovered another issue:

lski1962:test choderaj$ python test-b2ar.py 
Reading PDB file...
Reading forcefield...
Creating System...
Traceback (most recent call last):
  File "test-b2ar.py", line 11, in <module>
    system = forcefield.createSystem(pdbfile.topology)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/simtk/openmm/app/forcefield.py", line 1047, in createSystem
    [template, matches] = self._getResidueTemplateMatches(res, bondedToAtom, ignoreExternalBonds=ignoreExternalBonds)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/simtk/openmm/app/forcefield.py", line 830, in _getResidueTemplateMatches
    raise Exception('Multiple matching templates found for residue %d (%s): %s.' % (res.index+1, res.name, ', '.join(match[0].name for match in allMatches)))
Exception: Multiple matching templates found for residue 2 (TRP): TRP, DTRP.

Minimal test attached: test-b2ar.zip

In this case, it looks like DTRP is defined in toppar/stream/prot/toppar_all36_prot_d_aminoacids.str as a D-tryptophan. Presumably, the parameters are the same as L-tryptophan, except maybe for the IC cards used to build the residue or perhaps impropers that enforce opposite chirality (if those exist?). Perhaps we should just exclude these?

Tagging @ChayaSt

jchodera commented 6 years ago

I excluded that one stream file with D-amino acids and ended up with a different error:

lski1962:test-b2ar choderaj$ python test-b2ar.py 
Reading PDB file...
Reading forcefield...
Creating System...
Traceback (most recent call last):
  File "test-b2ar.py", line 11, in <module>
    system = forcefield.createSystem(pdbfile.topology, nonbondedMethod=app.PME, constraints=app.HBonds)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/simtk/openmm/app/forcefield.py", line 1056, in createSystem
    unmatchedResidues = _applyPatchesToMatchResidues(self, data, unmatchedResidues, bondedToAtom, ignoreExternalBonds)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/simtk/openmm/app/forcefield.py", line 1465, in _applyPatchesToMatchResidues
    [template, matches] = forcefield._getResidueTemplateMatches(res, bondedToAtom, patchedTemplateSignatures, ignoreExternalBonds)
  File "/Users/choderaj/miniconda3/lib/python3.5/site-packages/simtk/openmm/app/forcefield.py", line 830, in _getResidueTemplateMatches
    raise Exception('Multiple matching templates found for residue %d (%s): %s.' % (res.index+1, res.name, ', '.join(match[0].name for match in allMatches)))
Exception: Multiple matching templates found for residue 1 (VAL): VAL-NTER, VAL-NTER-DELB, VAL-NTER-FHEM, VAL-NTER-DELB-FHEM.

Is it chaining multiple patches together?

Test case attached: test-b2ar-2.zip

peastman commented 6 years ago

It's possible that fahlib library actually supports transparent gzip file streaming, though. Will have to figure that out.

I believe that's the case. I remember Joseph talking about how compression and/or uncompression was taking a lot of time on the server, and he was looking into switching to a less expensive compression method. So it seems that's time that's already being spent.

Exception: Multiple matching templates found for residue 1 (VAL): VAL-NTER, VAL-NTER-DELB, VAL-NTER-FHEM, VAL-NTER-DELB-FHEM.

If a residue doesn't match any template, it considers all possible combinations of single-residue patches it could apply. In this case, it's finding multiple combinations that could match, and it doesn't know which to pick. FHEM is the one I pointed out to you before that does nothing and can be applied to any residue. DELB is another one like it. They're confusing it, because it has no way to decide whether to apply those patches or not.

jchodera commented 6 years ago

If a residue doesn't match any template, it considers all possible combinations of single-residue patches it could apply. In this case, it's finding multiple combinations that could match, and it doesn't know which to pick. FHEM is the one I pointed out to you before that does nothing and can be applied to any residue. DELB is another one like it. They're confusing it, because it has no way to decide whether to apply those patches or not.

I can certainly have the ResidueTemplate.apply_patch(patch) code throw an exception if the residue has not been modified, which would eliminate patches like FHEM and DELB. Let me try that now.

peastman commented 6 years ago

The right solution is not to include those patches in the XML file in the first place. A patch that does nothing makes no sense.

jchodera commented 6 years ago

That's almost, but not quite, the right solution. CHARMM permits patches that change the charges, atom types, or explicit impropers, but OpenMM cannot automatically perceive these for us. I think we want users to be able to select those patches manually if needed, but the automatic patch perception framework can't deal with them.

In light of these, it would seem we want to include all patches, but only specify patches as compatible with a residue (for purposes of automated residue x patch expansion) if they modify the topology of the residue template.

Does that sound right to you, @peastman?

peastman commented 6 years ago

Those two patches literally do nothing. Here's the definition of one of them:

  <Patch name="FHEM">
  </Patch>

There's no benefit to including that in any form. We should remove them.

A patch that changes atom types but does nothing else is an interesting case. Assuming the two atom types both correspond to the same element, there's no way it could automatically decide whether to use the patch or not. So it could only be useful if we had some mechanism for explicitly telling it to use a patch. Currently, we don't have any such mechanism. We could consider adding one in the future, though we'd need to consider carefully what the use cases are for it.

jchodera commented 6 years ago

Those two patches literally do nothing.

I think I mentioned before that they do something important in the CHARMM world:

PRES FHEM         0.00 ! FIX UP THE HEME BY DELETING UNWANTED AUTOGENERATED ANGLES
                       ! unliganded heme patch
                       ! do NOT use AUTOgenerate ANGLes DIHEdrals after this patch
DELETE ANGLE 1NA 1FE 1NC  1NB 1FE 1ND   

PRES DELB         0.00     ! patch to delete all possible base atoms
                           ! of Cyt,Gua,Ade,Thy and Ura
                           !
!note: error messages will be obtained due to atoms not present in
!residue being "deleted" by this patch
!cyt section
DELE ATOM N1   
DELE ATOM C6   
DELE ATOM H6   
DELE ATOM C2   
DELE ATOM O2   
DELE ATOM N3   
DELE ATOM C4   
DELE ATOM N4   
DELE ATOM H41  
DELE ATOM H42  
DELE ATOM C5   
DELE ATOM H5   
!gua section
DELE ATOM N9   
DELE ATOM H1   
DELE ATOM N2   
DELE ATOM H21  
DELE ATOM H22  
DELE ATOM O6   
DELE ATOM N7   
DELE ATOM C8   
DELE ATOM H8   
!ade section
DELE ATOM H2   
DELE ATOM N6   
DELE ATOM H61  
DELE ATOM H62  
!thy/ura section
DELE ATOM H3   
DELE ATOM O4   
DELE ATOM C5M  
DELE ATOM H51  
DELE ATOM H52  
DELE ATOM H53  

There's no benefit to including that in any form. We should remove them.

FHEM does nothing in the OpenMM world, since we autogenerate angles ourselves (which may be another point of discrepancy). We can detect and omit these patches that are literally blank.

I'm surprised that DELB was permitted to apply to amino acids---that doesn't seem like it should be the case.

A patch that changes atom types but does nothing else is an interesting case. Assuming the two atom types both correspond to the same element, there's no way it could automatically decide whether to use the patch or not. So it could only be useful if we had some mechanism for explicitly telling it to use a patch. Currently, we don't have any such mechanism. We could consider adding one in the future, though we'd need to consider carefully what the use cases are for it.

We definitely need a mechanism for telling ForceField to apply a patch that involves two or more residues so we can make disulfide bonds and the like, and it sounds like we will also need a mechanism to specify single-residue patches (and maybe also turn off auto-patching if needed) for cases that modify atom types or delete explicit impropers.

One such example, I believe, is patches that change only the charges of titratable residues for constant-pH dynamics.

peastman commented 6 years ago

We definitely need a mechanism for telling ForceField to apply a patch that involves two or more residues so we can make disulfide bonds and the like

It'll detect them automatically.

jchodera commented 6 years ago

It looks like I have to exclude all D-amino acids from the conversion because they match all the L-amino acids:

charmm/toppar/stream/prot/toppar_all36_prot_d_aminoacids.str
jchodera commented 6 years ago

I've made the changes I think are necessary, but got bogged down converting the ffxml writing to use lxml.etree to handle Unicode and other issues. That's done now in https://github.com/ParmEd/ParmEd/pull/895 but the tests need to be completed.

jchodera commented 6 years ago

I'm still having trouble with the CHARMM conversion:

lski2414:charmm choderaj$ python
Python 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:04:09) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from simtk.openmm import app
>>> pdbfile = app.PDBFile('tests/POPC.pdb')
>>> ff = app.ForceField('ffxml/charmm36.xml')
>>> system = ff.createSystem(pdbfile.topology)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/choderaj/miniconda/lib/python3.6/site-packages/simtk/openmm/app/forcefield.py", line 1270, in createSystem
    force.createForce(sys, data, nonbondedMethod, nonbondedCutoff, args)
  File "/Users/choderaj/miniconda/lib/python3.6/site-packages/simtk/openmm/app/forcefield.py", line 2388, in createForce
    reverseMap[typeMap[typeValue]] = typeValue
IndexError: list assignment index out of range

I'm not quite sure what's causing this failure, but it also took several minutes for the ff.createSystem() line to run.

jchodera commented 6 years ago

That last error was fixed in https://github.com/pandegroup/openmm/pull/1915

I think this is finished now! Will reopen if we run into more trouble.