Ensure AMBER ff15ipq included in build is same as final released version

jchodera commented 7 years ago

We need to convert AMBER ff15ipq from AmberTools17.

dscerutti commented 6 years ago

Thanks for checking up on this. I am ready to help in any way you need, but my feeling is that if ff14SB converts cleanly then ff15ipq should do the same. There were issues with the ff14ipq conversion but we've cleaned up the parameter file considerably. ff15ipq is also a much better representation of the physics--same idea as '14, but with 10x more data and a wider parameter selection.

rafwiewiora commented 6 years ago

@dscerutti just gave it a try - there's now another type of a problem with proper torsions.

1) As a reminder for everyone - the previous problems with ff14ipq conversion are discussed here: https://github.com/choderalab/openmm/issues/4 - there were proper torsions with their periodicities in the wrong order - i.e. instead of all periodicities of one, then all periodicities of another, they were mixed up. tleap's behavior in that case is weird (it took all periodicities for the first such written torsion, but only the first periodicity for the rest) - ParmEd now throws an exception.

For completeness, I also tried converting oldff/leaprc.ff14ipq - it looks like the previous problem was corrected, but there is another case of that, ParmEd throws an exception:

Traceback (most recent call last):
  File "convert_amber.py", line 1040, in <module>
    main()
  File "convert_amber.py", line 102, in main
    convert_yaml(args.input, ffxml_dir='ffxml/')
  File "convert_amber.py", line 378, in convert_yaml
    filter_warnings=filter_warnings, split_filename=True)
  File "convert_amber.py", line 202, in convert_leaprc
    params = parmed.amber.AmberParameterSet.from_leaprc(leaprc)
  File "/Users/rafalpwiewiora/anaconda3/lib/python3.5/site-packages/ParmEd-2.7.3+223.g77b3676-py3.5-macosx-10.6-x86_64.egg/parmed/amber/parameters.py", line 294, in from_leaprc
    params.load_parameters(_find_amber_file(fname, search_oldff))
  File "/Users/rafalpwiewiora/anaconda3/lib/python3.5/site-packages/ParmEd-2.7.3+223.g77b3676-py3.5-macosx-10.6-x86_64.egg/parmed/amber/parameters.py", line 390, in load_parameters
    return self._parse_parm_dat(f, line)
  File "/Users/rafalpwiewiora/anaconda3/lib/python3.5/site-packages/ParmEd-2.7.3+223.g77b3676-py3.5-macosx-10.6-x86_64.egg/parmed/amber/parameters.py", line 488, in _parse_parm_dat
    key = self._process_dihedral_line(line, finished_diheds, key)
  File "/Users/rafalpwiewiora/anaconda3/lib/python3.5/site-packages/ParmEd-2.7.3+223.g77b3676-py3.5-macosx-10.6-x86_64.egg/parmed/amber/parameters.py", line 652, in _process_dihedral_line
    ParameterWarning)
parmed.exceptions.ParameterWarning: Expecting next term in dihedral ('OH', 'CT', 'CT', 'OA'), got definition for dihedral ('OA', 'CT', 'CT', 'OA')

indeed if you look at parm14ipq.dat these two dihedrals have periodicities mixed up (line 1885):

OH-CT-CT-OA   1     0.14400    0.0 -3.0  Branched from OH-CT-CT-OH 
OA-CT-CT-OA   1     0.14400    0.0 -3.0  Branched from OH-CT-CT-OH 
OH-CT-CT-OA   1     1.17500    0.0  2.0  Branched from OH-CT-CT-OH 
OA-CT-CT-OA   1     1.17500    0.0  2.0  Branched from OH-CT-CT-OH

2) Now ff15ipq / ff15ipq-vac: Conversion goes ok - meaning no problems like above, but the energy validation fails for proper torsions for villin - I have a threshold of relative energy difference of 1e-5, it comes at 2.03e-5. Using the same methodology as in https://github.com/choderalab/openmm/issues/4 - i.e. make Parmed ParameterSet from the tleap prmtop, write out as OpenMM ffxml and compare the proper torsions that were assigned by tleap to the torsions in the ffxml. I find two torsions with a mismatch between assigned and expected k:

(line 1: tleap-assigned atom types, line2: tleap-assigned k, line3, line4 - OpenMM ffxml atom types and k)

(['NB'], ['CC'], ['CW'], ['NA'])
[21.67282712]
({'H', 'CO', 'TG', 'N3', 'TA', 'CA', 'O', 'N', 'H4', 'N2', 'OA', 'C', 'HA', 'TH', 'C*', 'HS', 'OD', 'CC', 'CV', 'HP', '2C', 'TP', 'CN', 'HO', 'S', 'CB', 'HC', 'H5', 'NL', 'O3', 'H1', 'CR', 'TN', 'SH', 'CT', 'ND', 'TJ', 'NA', 'C8', 'TM', '3C', 'CX', 'CW', 'NB', 'OH'}, ['CC'], ['CW'], ['NA'])
[21.91332344]

(['TA'], ['CC'], ['CW'], ['NA'])
[21.67282712]
({'H', 'CO', 'TG', 'N3', 'TA', 'CA', 'O', 'N', 'H4', 'N2', 'OA', 'C', 'HA', 'TH', 'C*', 'HS', 'OD', 'CC', 'CV', 'HP', '2C', 'TP', 'CN', 'HO', 'S', 'CB', 'HC', 'H5', 'NL', 'O3', 'H1', 'CR', 'TN', 'SH', 'CT', 'ND', 'TJ', 'NA', 'C8', 'TM', '3C', 'CX', 'CW', 'NB', 'OH'}, ['CC'], ['CW'], ['NA'])
[21.91332344]

To compare to parm15ipq_10.3.dat I can also dump an frcmod from the villin prmtop and looking at the torsions in question:

NB-CC-CW-NA    1     5.17993000  180.000   2.0    SCEE=1.2 SCNB=2.0
TA-CC-CW-NA    1     5.17993000  180.000   2.0    SCEE=1.2 SCNB=2.0

comparing to parm15ipq_10.3.dat:

X -CC-CW-NA   1     5.23741  180.0  2.0  HIE/HIP chi3(2), Fit by mdgx

indeed we have a WRONG k and the only torsion in the parm15ipq_10.3.dat that has the wrongly-assigned k=5.17993 is:

X -CC-CW-H4   1     5.17993  180.0  2.0  HIE/HIP side chain, Fit by mdgx

i.e. it appears that somehow tleap mistakes the NA atom type for H4. I have no idea how this can happen...

Tagging @peastman @swails

swails commented 6 years ago

My best guess is that tleap hard codes the assumption that there are either 0 or 2 wild-cards and if the first atom is a wild-card, it assumes the last one is too.

You’d have to look at the leap code to confirm, but this is my guess (and would explain why it was never a problem before this force field.

dscerutti commented 6 years ago

Thanks for looking into this, Rafal.. I recall having discussions with Karl Debiec @KarlTDebiec on in this subject. He did a pretty good analysis, but I think he also realized that there was some problem in those Histidine rings with atom typing and parameter assignment. My opinion at the time was probably that the rings are not critical and the parameters differ too little to make a difference in the behavior. So, you may very well have a situation where OpenMM is doing the force field explicitly as it was fitted, but Amber is doing the force field slightly wrong. Hopefully Karl will have a moment to comment on this; I'll give it some thought but the solution is probably to explicitly include those two torsions in the parameter file.

But, again, you are probably making a translation of the force field as mdgx would have fitted it. Jason's hunch is probably correct--I didn't make any assumptions about wildcards coming (first and last) or (not at all) when I coded the parsing and parameter matching.

rafwiewiora commented 6 years ago

Thanks @dscerutti! In this case, @jchodera @peastman I suggest we raise the relative error tolerance to 3e-5 to allow the ff15ipq to pass? Or we can leave it at 1e-5 (seeing that allowed me to find the problem in this forcefield) and not test ff15ipq?

peastman commented 6 years ago

This isn't really about the tolerance. If OpenMM were making an error of magnitude 3e-5, it would be important to know that. But if it's Amber that has the error and OpenMM's result is correct, then it's fine regardless of what the magnitude of the difference is. We just need to make sure that's really what's happening here.

rafwiewiora commented 6 years ago

Ok - what else should we try to prove that?

peastman commented 6 years ago

Can we verify that there's really a bug in tleap? Or perhaps compare to results from mdgx?

swails commented 6 years ago

But if it's Amber that has the error and OpenMM's result is correct, then it's fine regardless of what the magnitude of the difference is.

Amber is the reference implementation for this force field so is, by definition, correct. Every published result of this FF has been done with Amber's implementation of how parameters are assigned. If the issue lives inside Amber (my guess is that it does not support single-wildcard torsion fitting), then it's unfortunately a feature rather than a bug (and one that's missing in the OpenMM conversion scripts).

The best fix in this case would probably be to change the ff15ipq FF files in the Amber distribution to eliminate single-wildcard torsions.

By the way, here is the offending code in tleap

What happens here is that if the first atom type is a wild-card, it only matches the middle two atoms (which is just a hard-coded assumption that if the first atom is a wild-card, so is the fourth).

So the options here are:

"Fix" tleap so it handles generic torsions with only 1 wild-card
"Fix" mdgx so it will not try to optimize torsion families with only 1 wild-card (or the parameter file to eliminate single-wild-card torsion definitions)
"Fix" ParmEd so it will read Amber parameter files and treat single-wild-card torsions as double-wild-card torsions and emit a warning while doing so.

I do not have time to do 3.

peastman commented 6 years ago

I have to disagree: if the force field was created and parametrized using mdgx, then that is the authoritative implementation and the definition of what is correct. The fact that a lot of people have published papers using an incorrect implementation is unfortunate, but that doesn't make it correct.

dscerutti commented 6 years ago

Yes, the intention with the force field is as mdgx designed it. So what we'll have to do is 4.) adapt ff15ipq files so that these offending torsions do not appear as wildcards. I'm still hoping to get a reply from Karl, as I'm certain this was something he brought up earlier.

dscerutti commented 6 years ago

I haven't gotten word from Karl, but I have an idea of how to fix this. What I am doing is dusting off the pile of fitting data, and from that I will generate topologies for each system (Ala-Ala, Ala-Thr, Gly-Gly-Gly, etc.) with ff15ipq as it stands and then with the new edit to parm15ipq_10.3.dat. I'll have mdgx score the structures and topologies against the fitting data structures (about 250,000 of them), no parameters free (all taken from the ff15ipq force field as it stands), and we should see the original data fitting results recovered nicely. What this will also do is list out for me all of the systems in which each dihedral--in particular, the ones with only one wildcard--appears. Then I'll know all of the instances that each wildcard needs to take, so I can go and make explicit parameters for those cases.

I'll then rebuild the topologies with my new parm15ipq_10.3.dat, and the fitting results should not change. What this will do is scan over all systems and make sure that I can replace the problematic dihedrals and still cover all the amino acids ff15ipq is supposed to. A little more patience, please.

dscerutti commented 6 years ago

Try this version of the parameter file out, Rafal. As I said, I used mdgx to ascertain which torsion parameters the wildcards were getting substituted in for. It's still a bit tricky, but I hope I've done it right. If you can perhaps compare the results, topologies built by Amber tleap and your OpenMM software (following conversion of this software should now match, AND the energies you get from those files should match what you were getting earlier with the topologies created by OpenMM.

parm15ipq_10.3.zip

If this works out I will enter this into the Amber repository and possibly patch the release--but as you observed, 2.0e-5 is a very small deviation, and this largely evaded our own tests because most energies from Amber are only reported to four decimal places.

jchodera commented 6 years ago

@rafwiewiora : Can you give this a try?

dscerutti commented 6 years ago

Did the new parameter file work out for you all? If there's any other issue I can take a look today; would be nice to have this in OpenMM.

rafwiewiora commented 6 years ago

@dscerutti thanks for the file, this converts no problem!

Will you patch the AMBER release? I'm not quite sure how to deal with the need to swap the file for an updated version on our end @jchodera @peastman ? Can we just swap it in ambermini?

dscerutti commented 6 years ago

Great to hear this works. I'll patch the release as well, but it may be after the Thanksgiving holiday. I'm also looking to upgrade the Amber webpage shortly, and publish some tutorials I made on the use of mdgx force field tools that I used to make ff15ipq, and prove to myself that the fix worked.

jchodera commented 6 years ago

I think the timing of the AmberTools update means we will likely not distribute ff15ipq with the OpenMM 7.2 release, but will update the openmm-forcefields plugin with ff15ipq once AmberTools is updated.

peastman commented 6 years ago

That depends on the timing. If @jchodera can get all the force fields finished up next week, we can release the beta the week after that. Then the official release would follow about a month later. The Amber patch should be out before we officially release this, but it's OK to include it in the beta before that. If you find problems in it, or aren't able to fully test it in time, we can remove it from the release.

jchodera commented 6 years ago

That sounds like a great plan. I just want to avoid the situation where (1) we include a version that ends up changing in the official AmberTools release or (2) we have to manually hack in the file instead of using the distributed AmberTools files.

dscerutti commented 6 years ago

I'll see what I can do in the next few days; going into this I didn't know your schedule for releasing a new version of OpenMM but now I'm better up to speed.

dscerutti commented 6 years ago

Working on a little more validation now; I will have the patch committed shortly.

jchodera commented 6 years ago

We've added the unreleased ff15ipq in https://github.com/choderalab/openmm-forcefields/pull/49.

This is just a reminder for @peastman to diff the released version with the version we used before release, and to regenerate if needed prior to OpenMM release.

openmm / openmmforcefields

Ensure AMBER ff15ipq included in build is same as final released version #44