michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

Equilibration fails with TIP4/5P #171

Closed jthorton closed 3 years ago

jthorton commented 3 years ago

Expected Behavior

When running the attached code it fails for a system solvated with a water model with a virtual site but works fine for one without.

Current Behavior

The following error is produced when the script is run on a system with a virtual site water model

Failure Information

Traceback (most recent call last): File "amberequilibration.py", line 172, in process = BSS.Process.Amber(nvt_equilibrated, protocol, name="equilibrate-npt") File "/home/nvk28/Documents/Work/smirnoff/miniconda3/envs/fep/lib/python3.7/site-packages/BioSimSpace/Process/_amber.py", line 170, in init super().init(system, protocol, name, work_dir, seed, property_map) File "/home/nvk28/Documents/Work/smirnoff/miniconda3/envs/fep/lib/python3.7/site-packages/BioSimSpace/Process/_process.py", line 95, in init raise TypeError("'system' must be of type 'BioSimSpace._SireWrappers.System'") TypeError: 'system' must be of type 'BioSimSpace._SireWrappers.System'

Steps to Reproduce

Please provide detailed steps for reproducing the issue. Make use of markdown code blocks, e.g., if BioSimSpace failed to parameterise a molecule:

import BioSimSpace as BSS

node = BSS.Gateway.Node("A node to equilibrate a solvated molecule.")
node.addInput("input", BSS.Gateway.FileSet(help="Topology and coordinate files describing the solvated system"))
node.addInput("output", BSS.Gateway.String(help="The root name of the output files."))
node.addOutput("nodeoutput", BSS.Gateway.FileSet(help="The parameterised and solvated molecular system in AMBER format."))
node.showControls()
system = BSS.IO.readMolecules(node.getInput("input"))
protocol = BSS.Protocol.Minimisation(steps=1000)
process = BSS.Process.Amber(system, protocol, name="minimise")
process.getConfig()
process.start()
process.wait()
minimised = process.getSystem()
protocol = BSS.Protocol.Equilibration(runtime=BSS.Types.Time(0.01, "nanosecond"),
                                      temperature_start=BSS.Types.Temperature(0, "kelvin"), 
                                      temperature_end=BSS.Types.Temperature(300, "kelvin"), 
                                      restrain_backbone=True)
process = BSS.Process.Amber(minimised, protocol, name="equilibrate-nvt")
process.getConfig()
process.start()
temps = process.getTemperature(time_series=True)
nvt_equilibrated = process.getSystem()
protocol = BSS.Protocol.Equilibration(runtime=BSS.Types.Time(0.01, "nanosecond"),
                                      temperature=BSS.Types.Temperature(300, "kelvin"), 
                                      ensemble="NPT")
process = BSS.Process.Amber(nvt_equilibrated, protocol, name="equilibrate-npt")

Context

Please provide any relevant information about your setup. This is important in case the issue is not reproducible except for under certain conditions.

SofiaBariami commented 3 years ago

Hi josh, when I tried to reproduce your error, I got another one:

Exception 'SireError::incompatible_error' thrown by the thread 'master'.
Cannot construct a System from multiple lead parsers if none can follow!
Thrown from FILE: /home/sireuser/Sire/corelib/src/libs/SireIO/moleculeparser.cpp, LINE: 1679, FUNCTION: void SireIO::MoleculeParser::sortParsers(QList<SireBase::PropPtr<SireIO::MoleculeParser> >&, QList<SireBase::PropPtr<SireIO::MoleculeParser> >&) const

As Danny suggested, try tip3p to check if it's a water-model issue and let us know if you had any issues with that.

jmichel80 commented 3 years ago

Hi @jthorton please use tip3p for the time being. This looks like a parsing issue @lohedges could investigate after the Christmas holidays.

lohedges commented 3 years ago

Hi...

As @jmichel80 says, I am now on leave and can take a more detailed look at issues when I return in the New Year.

Briefly, @SofiaBariami: I don't think that this is an IO problem, since the files load fine for me. I can only assume that you typed the name of the prm7 file twice when running the script.

@jthorton: You state that you are using the 2019.1 release. This is rather out of date. Could you try using the latest development version following the instructions in the README. You will need to update your script to pass a pressure parameter to the equilibration Protocol. (NVT is used if no pressure is set.)

When debugging it is useful to save the process output in a working directory that you can access, e.g. when creating the NVT equilibration process you could do:

process = BSS.Process.Amber(minimised, protocol, name="equilibrate-nvt", work_dir="equilibrate-nvt")

Looking at the equlibrate-nvt/equilibrate-nvt.out log file I see the following error:

vlimit exceeded for step      0; vmax = **********

     Coordinate resetting (SHAKE) cannot be accomplished,
     deviation is too large
     NITER, NIT, LL, I and J are :      0      0     18     33     35

     Note: This is usually a symptom of some deeper
     problem with the energetics of the system.

I assume that the TIP4P system isn't sufficiently well equilibrated. If further minimisation still doesn't help, it's possible that we might need to tweak some of the configuration parameters for water models with virtual sites.

I also note the following in your script:

nvt_equilibrated = process.getSystem()

This means that you are not wating for the process to finish before returning the system, which could return a NoneType if a restart file hasn't yet been written. To first wait for the process to finish, you can either use process.wait() (as you have done earlier in the script) or use process.getSystem(block=True).

Cheers.

lohedges commented 3 years ago

Just a quick update on this issue. I'm making progress but don't think I'll be able to fix it until some point next week.

The first issue is that we need to convert the water model naming to AMBER format for all AMBER simulations, regardless of what file types were used to create the system. At present I don't do this if they were AMBER files, but this isn't working since what we actually have is a GROMACS system that has been converted to AMBER format.

Updating the code so that it always converts to an AMBER format water model gets further, but triggers another issue:

 Error in numextra_test

(See this related issue.)

Basically, the Sire.IO.AmberPrm doesn't create the correct NUMEXTRA flag when we have a water model with virtual sites. (It is zero regardless.) I've tried to enter the value manually but I still get the same error. I believe the flag should just be the number of virtual sites, i.e. since we have 577 water molecules there should be 577 dummy atoms. (For reference, ParmEd just counts the number of dummy atoms to create the flag. See here.) Perhaps I'm adjusting the wrong flag. (According to this it should be the second last flag.) I've grepped the entire Sire source directory and there are no references to NUMEXTRA, so clearly it isn't being handled at all.

As another test I tried running the minimisation and equilibration with GROMACS instead. Here I get an error when trying to perform the equilibration:

RuntimeError: Unable to generate GROMACS restraint file.

This is presumably because the atom naming is such that we don't find any backbone atoms.

Setting restrain_backbone to False triggers another error:

Fatal error:
Step 0: The total potential energy is 293934, which is extremely high. The LJ
and electrostatic contributions to the energy are 1.32481e+06 and
-1.04537e+06, respectively. A very high potential energy can be caused by
overlapping interactions in bonded interactions or very large coordinate
values. Usually this is caused by a badly- or non-equilibrated initial
configuration, incorrect interactions or parameters in the topology.

This suggests that something might be wrong with the tip4p toploogy written by Sire, or that we aren't using sensible configuration parameters.

I'll update when I've had a chance to debug further.

lohedges commented 3 years ago

Ah, I needed to updated the NUMEXTRA flag in the topology file within the run directory, since Sire always writes zero. Whoops. After doing this I get the following error with AMBER:

 Error: A residue defined as a "fast 3-point water"
        is not defined by a triangle of three bonds.
        Residue        2 contains        2 bonds.

This can be solved by adding jfastw=4 to the configuration, which turns off SHAKE. I"ll make sure to add this when using waters with virtual sites. We will probably need something similar for GROMACS too.

lohedges commented 3 years ago

This looks like an issue of AMBER water residues being names WAT. For some reason it assumes that they are three-point if this is the case. Renaming them to anything other than WAT works with SHAKE. I can add this fix into Sire.IO.setAmberWater, but this would then break SOMD which requires that waters are named WAT regardless of the water model. (Refer to this previous issue.)

Feels like this is turning into a real mess. Enough for today, will mull over and come back to it next week.

lohedges commented 3 years ago

Okay, I've fixed the Sire.IO.AmberPrm parser so that the NUMEXTRA pointer is added on write. This just sums the number of dummy atoms using:

int num_dummies = system.search("element Xx").count()

However, using the input provided here (or any system solvated with BioSimSpace) I'm still getting the error:

 Error: A residue defined as a "fast 3-point water"
        is not defined by a triangle of three bonds.
        Residue        2 contains        2 bonds.

As mentioned above, this can be solved by adding jfastw=4 to the configuration. This shouldn't be necessary, though, since TIP4P is a rigid water model and can work with SHAKE.

As a check, I downloaded the example TIP4P AMBER input files from this tutorial. These inputs work with the AMBER config files provided and, more importantly, they also work with the configuration files written by the unmodified BioSimSpace protocols, i.e. using rigid water and SHAKE.

Ignoring the fact that Sire can't read the tutorial files correctly (it reads all elements as dummies), I've compared the water TIP4P water topologies to try to figure out what the difference is. (This is after swapping the MOL_sol.prm7 water toplogy with an AMBER one behind the scenes.)

import BioSimSpace as BSS

# Load the two molecular systems.
s0 = BSS.IO.readMolecules(["MOL_sol.prm7", "MOL_sol.rst7"])
s1 = BSS.IO.readMolecules(["bpti-ff99SBi.prmtop", "bpti-ff99SBi.inpcrd"])

# Get two water molecules. (The final molecule is water.)
w0 = s0[-1]
w1 = s1[-1]

# Print the residues in each.
print(w0.getResidues())
[<BioSimSpace.Residue: name='WAT', molecule=588, index=0, nAtoms=4>]

print(w1.getResidues())
[<BioSimSpace.Residue: name='WAT', molecule=4626, index=0, nAtoms=4>]

# Print the atoms in both:
 for atom0, atom1 in zip(w0.getAtoms(), w1.getAtoms()):
    print(atom0, atom1)

<BioSimSpace.Atom: name='O', molecule=588 index=0>
<BioSimSpace.Atom: name='O', molecule=4626 index=0>

<BioSimSpace.Atom: name='H1', molecule=588 index=1>
<BioSimSpace.Atom: name='H1', molecule=4626 index=1>

<BioSimSpace.Atom: name='H2', molecule=588 index=2>
<BioSimSpace.Atom: name='H2', molecule=4626 index=2>

<BioSimSpace.Atom: name='EPW', molecule=588 index=3>
<BioSimSpace.Atom: name='EPW', molecule=4626 index=3>

# Print LJ terms:
for atom0, atom1 in zip(w0.getAtoms(), w1.getAtoms()):
    print(atom0._sire_object.property("LJ"))
    print(atom1._sire_object.property("LJ"))
    print()

LJ( sigma = 3.16435 A, epsilon = 0.16275 kcal mol-1 )
LJ( sigma = 3.16435 A, epsilon = 0.16275 kcal mol-1 )

LJ( sigma = 0 A, epsilon = 0 kcal mol-1 )
LJ( sigma = 0 A, epsilon = 0 kcal mol-1 )

LJ( sigma = 0 A, epsilon = 0 kcal mol-1 )
LJ( sigma = 0 A, epsilon = 0 kcal mol-1 )

LJ( sigma = 0 A, epsilon = 0 kcal mol-1 )
LJ( sigma = 0 A, epsilon = 0 kcal mol-1 )

# Print charges.
for atom0, atom1 in zip(w0.getAtoms(), w1.getAtoms()):
    print(atom0._sire_object.property("charge"))
    print(atom1._sire_object.property("charge"))
    print()

0 |e|
0 |e|

0.52 |e|
0.52422 |e|

0.52 |e|
0.52422 |e|

-1.04 |e|
-1.04844 |e|

# Print the mass.
for atom0, atom1 in zip(w0.getAtoms(), w1.getAtoms()):
    print(atom0._sire_object.property("mass"))
    print(atom1._sire_object.property("mass"))
    print()

16 g mol-1
16 g mol-1

1.008 g mol-1
1.008 g mol-1

1.008 g mol-1
1.008 g mol-1

0 g mol-1
0 g mol-1

# Print bonds in both:
for b0, b1 in zip(w0._sire_object.property("bond").potentials(),
                          w1._sire_object.property("bond").potentials()):
    print(b0)
    print(b1)
    print()

TwoAtomFunction( {CGIdx(0),Index(1)} <-> {CGIdx(0),Index(2)} : 553 [r - 1.5136]^2 )
TwoAtomFunction( {CGIdx(0),Index(1)} <-> {CGIdx(0),Index(2)} : 553 [r - 1.5136]^2 )

TwoAtomFunction( {CGIdx(0),Index(0)} <-> {CGIdx(0),Index(2)} : 553 [r - 0.9572]^2 )
TwoAtomFunction( {CGIdx(0),Index(0)} <-> {CGIdx(0),Index(2)} : 553 [r - 0.9572]^2 )

TwoAtomFunction( {CGIdx(0),Index(0)} <-> {CGIdx(0),Index(3)} : 553 [r - 0.125]^2 )
TwoAtomFunction( {CGIdx(0),Index(0)} <-> {CGIdx(0),Index(3)} : 553 [r - 0.125]^2 )

TwoAtomFunction( {CGIdx(0),Index(0)} <-> {CGIdx(0),Index(1)} : 553 [r - 0.9572]^2 )
TwoAtomFunction( {CGIdx(0),Index(0)} <-> {CGIdx(0),Index(1)} : 553 [r - 0.9572]^2 )

As you can see, other than some slight difference in the charge terms, they are identical! As such, I've no idea why AMBER is treating them differently. I don't see anything obvious that is different in the topology file. Both have the water molecules listed at the end. Perhaps there is some additional FLAG or POINTER that I'm unaware of.

Help from any AMBER experts would be much appreciated.

lohedges commented 3 years ago

Reading the problematic prm7 file with ParmEd then writing it back to AMBER prm format works, so there must be something incorrect in the topology file. I'll make a comparison to figure out what's going wrong.

lohedges commented 3 years ago

Here are the FLAG POINTERS records from both files:

Sire:

    2360       5    2354       6      12       6      18       6       0       0
    4161     588       6       6       6       5       2       1      15       0
       0       0       0       0       0       0       0       1      12       0
     587       0       0

ParmEd:

    2360       5    1767     593      12       6      18       6       0       0
    4161     588     593       6       6       5       2       1      15       0
       0       0       0       0       0       0       0       1      12       0
     587       0       0

The third and fourth pointers give the number of bonds containing hydrogen, NBONH, and those without, MBONA. The thirteenth pointer is NBONA, which specifies MBONA plus the number of constraint bonds. These are the difference between the records. I assume that ParmEd is doing something different for bonds involving virtual atoms, and this is what AMBER expects.

Reading the file produced by ParmEd back with Sire gives the same molecular properties as the original system, so we are clearly not interpreting the topology information correctly. I'll look at the ParmEd source code to try to figure out what's going wrong.

lohedges commented 3 years ago

The problem is that Sire is incorrectly labeling bonds between the O and EPW (dummy) atoms in the water molecule as a bond containing hydrogen, so the records are incorrect. Perhaps it is just assuming all water molecule bonds contain hydrogen, which is not a case.

I'll raise an issue on the Sire repo and try to come up with a fix.

jthorton commented 3 years ago

Wow this turned out to be a lot more involved than I first assumed, thanks @lohedges for looking into this and the detailed feedback.

lohedges commented 3 years ago

@jthorton No problem, this is definitely something that needed fixing. Thanks again for raising the issue and for the clear description of the problem and for providing supporting files. I've now fixed the issue in Sire (see the linked issue and commit above) and can now run your example with the existing BioSimSpace protocols. If you're working with the conda version of BioSimSpace, then you'll need to wait for the updated Sire package to finish building then run an update within your environment.

Let me know if you run into any further issues and I'll be happy to help. I haven't run any simulations using TIP4/5P (other than with files prepared outside of BioSimSpace) so there may be further issues that I'm unaware of.

Cheers.