openforcefield / discussions

Contains organisation-wide discussions relating to science, infrastructure, etc.

Issue with coordinates after equilibration #15

Closed · yabmtm closed this issue 1 year ago

yabmtm commented 3 years ago

**Description**

I have recently ported my workflow over to use openff.toolkit instead of the old openforcefield module. It is meant to take in a small-molecule mol2 file and a receptor pdb file, parameterize them, minimize, equilibrate, and export to Gromacs (gro/top).
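
For context, a minimal sketch of what this kind of openff-toolkit / openmmforcefields workflow can look like is below. This is not the attached example.py; the file names, force field choices, and run settings are assumptions.

    # Minimal sketch (not the attached example.py): parameterize ligand + receptor,
    # minimize, run a short equilibration, and export GROMACS files with ParmEd.
    import parmed
    from openff.toolkit.topology import Molecule
    from openmmforcefields.generators import SystemGenerator
    from simtk import unit
    from simtk.openmm import LangevinIntegrator
    from simtk.openmm.app import HBonds, Modeller, PDBFile, PME, Simulation

    ligand = Molecule.from_file("ligand.mol2")  # mol2 reading requires OpenEye
    receptor = PDBFile("receptor.pdb")

    generator = SystemGenerator(
        forcefields=["amber/protein.ff14SB.xml", "amber/tip3p_standard.xml"],
        small_molecule_forcefield="openff-1.1.0",
        molecules=[ligand],
        forcefield_kwargs={"constraints": HBonds, "hydrogenMass": 3 * unit.amu},
        periodic_forcefield_kwargs={"nonbondedMethod": PME},
    )

    # Combine receptor and ligand into one OpenMM topology (solvation omitted).
    modeller = Modeller(receptor.topology, receptor.positions)
    modeller.add(ligand.to_topology().to_openmm(), ligand.conformers[0])

    system = generator.create_system(modeller.topology, molecules=[ligand])
    integrator = LangevinIntegrator(
        300 * unit.kelvin, 1.0 / unit.picosecond, 2 * unit.femtoseconds
    )
    simulation = Simulation(modeller.topology, system, integrator)
    simulation.context.setPositions(modeller.positions)
    simulation.minimizeEnergy()
    simulation.step(50_000)  # short equilibration

    # Export coordinates and topology to GROMACS formats.
    state = simulation.context.getState(getPositions=True)
    structure = parmed.openmm.load_topology(
        modeller.topology, system, xyz=state.getPositions()
    )
    structure.save("conf.gro", overwrite=True)
    structure.save("topol.top", overwrite=True)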

The old version (openforcefield) still works just fine, but the new version (openff-toolkit) produces bad coordinate files after equilibration. Both directories can be found in the attached tarball: openff-issue.tar.gz

The resulting topology files look very similar, with seemingly minor changes to the ligand parameters, but the structure file (conf.gro) for the openff-toolkit script seems to explode after equilibration! Note that the ligand name "<0>" is NOT an issue and is present in both cases.

    1<0>     C1    1518214.8142654.199576.66
    1<0>     C2    2-374233.817505.7462426.0
    1<0>     C3    3545191.3-497387.72653.99
    1<0>     C4    4383505.2-211521.144098.6
    1<0>     C5    5-149680.-202533.374189.9
    1<0>     C6    6-577753.-201560.-223641.
    1<0>     C7    7290914.0-12141.3-333546.
    1<0>     C8    8-177329.563956.8-276489.
    1<0>     C9    9294280.5670759.4-407819.

**Steps To Reproduce**

All files needed to reproduce, including a requirements.txt for re-creating the exact conda environment used in each case, are included in the respective directories. The only files necessary for running each one are the ligand mol2 file and the receptor pdb file, at which point the output files can be generated by running example.py.

**Output**

No error message is shown. The issue is found in the output files after equilibration, as noted above.

**Computing environment:** see the requirements.txt files in the attached tarball for the exact conda environments.

mattwthompson commented 3 years ago

Hi @yabmtm,

Thanks for the thorough and detailed report! I'm able to get both examples running on my machine with some minor modifications (CUDA -> CPU, since I don't have an NVIDIA GPU; OpenBabel -> OpenEye, since I get segfaults with OpenBabel and I don't want to investigate them). One thing I noticed was that the openforcefield example was using a fairly old version of the toolkit (0.6.0) - if you clone the openforcefield environment and upgrade to 0.8.3, do you get a stable structure or does it also blow up? There have been tons of changes, large and small, between these versions and it's hard to pick one that could be the issue. The namespace change so far hasn't caused behavior changes like this, so I'd expect 0.8.x to have similar behavior to 0.9.x.

I'm less in tune with openmmforcefields but I presume a lot changed in the past year and change there (0.7.1 to 0.9) as well. There are other differences (RDKit, AmberTools?) that also may or may not have an impact. I did notice the OpenMM version is the same, so it's probably not that.

I don't have OpenMM set up to use my GPU, so this runs pretty slowly for me and I haven't been able to isolate the issue yet. Let us know how 0.8.3 works and that can help us pin down the issue.

yabmtm commented 3 years ago

Thanks for your response. Everything looks fine with openforcefield=0.8.3. I created a new environment, also using openmmforcefields=0.7.4 (the default with conda install). I also had to remove nonbondedMethod from forcefield_kwargs based on this error:

    ValueError: nonbondedMethod cannot be specified in forcefield_kwargs;
    must be specified in either periodic_forcefield_kwargs (if it should be applied to periodic systems)
    or nonperiodic_forcefield_kwargs (if it should be applied to non-periodic systems)
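
For anyone else hitting this, the change the error message asks for looks roughly like the sketch below; the force fields and values shown are illustrative, not the exact contents of example.py.

    # Sketch of the SystemGenerator arguments implied by the error above;
    # specific force fields and values are illustrative only.
    from simtk import unit
    from simtk.openmm import app
    from openmmforcefields.generators import SystemGenerator

    generator = SystemGenerator(
        forcefields=["amber/protein.ff14SB.xml", "amber/tip3p_standard.xml"],
        small_molecule_forcefield="openff-1.1.0",
        forcefield_kwargs={
            "constraints": app.HBonds,
            "hydrogenMass": 3 * unit.amu,
            # "nonbondedMethod": app.PME,   # no longer allowed here
        },
        periodic_forcefield_kwargs={"nonbondedMethod": app.PME},
        nonperiodic_forcefield_kwargs={"nonbondedMethod": app.NoCutoff},
    )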

Output looks normal. I have attached a tarball of the working directory for this test including the full environment: openforcefield-0.8.3.tar.gz

mattwthompson commented 3 years ago

Thanks for testing out that case - it rules out a lot of potential causes. I'll look into this more tomorrow; it's not immediately clear what the problem is. There might be other differences between openmmforcefields 0.7.4 and 0.9.0, but nothing stands out to me in the release notes, so that seems unlikely.

j-wags commented 3 years ago

Thanks for the awesome reproducing example, @yabmtm. I was able to reproduce this on my computer. It looks like the problem is that the old script sets a hydrogen mass of 3 amu, while the new one sets a hydrogen mass of 4 amu. Since the added hydrogen mass is subtracted from the bonded heavy atom, the crashes appear to be caused by a methyl carbon having too much mass taken away by its three hydrogens, leading to unstable dynamics. I'm able to run the equilibration using the openff-toolkit-produced system with a hydrogen mass of 3 amu instead of 4.
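
To spell out the arithmetic behind that (a quick sanity check, not code from either script):

    # Back-of-the-envelope check: OpenMM's hydrogenMass option sets each hydrogen
    # to the requested mass and subtracts the added mass from the bonded heavy atom.
    C_MASS, H_MASS = 12.011, 1.008
    for target in (3.0, 4.0):
        methyl_carbon = C_MASS - 3 * (target - H_MASS)  # carbon bonded to 3 H
        print(f"hydrogenMass={target:g} amu -> methyl carbon ~{methyl_carbon:.1f} amu")
    # prints ~6.0 amu for hydrogenMass=3 and ~3.0 amu for hydrogenMass=4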

I don't know what best practices for hydrogen mass repartitioning are, but a 3 amu carbon simulated with a 4 fs timestep seems dangerous. I'm seeing this hydrogenMass=4 used in other repos, though, so maybe this is standard. It's also possible that the minimization for this system left a bit of a clash behind that exacerbated the problem, and that a more thorough minimization or gentler heating would resolve this.

yabmtm commented 3 years ago

Nice catch, @j-wags! I actually thought all my scripts used 4 amu / 4 fs, but you're right. Using hydrogenMass=3 amu in the openff-toolkit script produces normal output... but so does using 4 amu in the older openforcefield script.

I didn't realize that hydrogen masses are subtracted from their bonded heavy atom. Is that generally true across MD engines, e.g. Gromacs?

As long as the forces and integration are stable with 3 amu / 4 fs, I see no problem using that for our simulations, but it would still be nice to be able to use the more widely accepted mass-repartitioning scheme of hydrogenMass=4 with a 4 fs time step.

mattwthompson commented 2 years ago

> I didn't realize that hydrogen masses are subtracted from their bonded heavy atom. Is that generally true across MD engines, e.g. Gromacs?

I'm not steeped in HMR, but I think that's the convention. (Of course there's no enforceable standard here, so it's hard to tell what is and is not followed.) I forget the physics behind it, but I vaguely recall there being a physical justification for it (and it seems intuitive to me, having not thought about it too deeply). The usual default for HMR seems to be shifting 3 amu onto each hydrogen so that it ends up at 4 amu. Intuition aside, using these masses doesn't appear to be fundamentally out of the norm.

I'm not sure GROMACS has a one-step HMR command in its system preparation tools; I think everybody runs their own solution for that.

Amber does seem to have a program to do this, though. From some Amber manual floating around my Google searches:

> The mass of the heavy atom that the hydrogen is attached to is adjusted so that the total mass remains the same.

I also noticed that your old and new tarballs use different force field versions: the notebook pulls openff-1.0.0-RC1.offxml and the script uses openff-1.1.0. There's a good chance both force fields are similarly bad here (we have had issues with HMR in the past, since it was not a key part of the fitting pipeline), but I wanted to bring attention to that difference. There's a chance that code differences between toolkit versions are responsible for this behavior, but that doesn't seem intuitive to me. Using the same force field each time would help isolate any issue.
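
For example, loading one and the same force field by name in both environments would take that variable out of the picture; the version string below is just an illustration.

    # Illustrative only: pin the same SMIRNOFF force field in both scripts so
    # the force field version is held constant between the two environments.
    # New namespace (openff-toolkit >= 0.9):
    from openff.toolkit.typing.engines.smirnoff import ForceField
    # Old namespace (openforcefield <= 0.8):
    # from openforcefield.typing.engines.smirnoff import ForceField

    force_field = ForceField("openff-1.1.0.offxml")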