michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

Equilibrate in free energy protocol #220

Closed kexul closed 3 years ago

kexul commented 3 years ago

Hi, I found that the equilibrate = False in somd.cfg generated by default free energy protocol, and there is no parameter in Protocol.FreeEnergy to alter it.

I'm new to somd and I was using Gromacs earlier, I used to believe that we need to do NVT and NPT equilibration before Production.

Is the system generated by BioSimSpace (i.e. somd.prm7 and somd.rst7) already equilibrated or we do not need to equilibrate the system when using somd?

lohedges commented 3 years ago

Hi @kexul. The equilibrate option for somd-freenrgy doesn't perform a standard molecular dynamics equilibration, rather performs an equlibrate-to-lambda stage where the perturbation is slowly applied from zero to the chosen lambda value prior to running the main part of the simulation. I think @jmichel80 would have a better idea, but my understanding is that this was found to be unreliable in practice and it was easier to just discard the initial portion of the simulation output when post-processing the data to reconstruct free energies, hence why it is set to False. It also doesn't support constraints, etc. which would be expected for a regular MD equilibration.

You are indeed correct that it is good practice to perform equilibration protocols prior to your main simulation. We make no assumptions that the system passed to SOMD is equlibrated and it is up to the user to make sure that it is. The way that BioSimSpace has been written means that you can use all protocols with a perturbable system. Under the hood we extract the lambda=0 state of the system, run the simulation protocol, then copy updated coordinates and box information back into the system when you call .getSystem() on a Process object. (Since we now support read/writing perturbable systems you could also setup the simulation within BioSiimSpace, run it externally, then reconstruct the merge system with the new coordnates.) If needed, you can also choose to use the lambda=1 system instead. Ideally we would also support intermediate values of lambda but the way in which the perturbation is applied isn't consistent between the engines.

The tutorial execution model that @jmichel80 pointed you to in an other issue thread does things slightly differently. Here it minimises and equilibrates the subcomponents of the merged system separately, e.g. the protein plus one ligand in solvent, then extracts the components and re-merges to form the final system. This was needed since we didn't support read/write of perturbable systems at the time of writing and makes life simpler when running sub-stages of the equlibration outside of BioSimSpace.

kexul commented 3 years ago

Thanks for the detailed information and example @lohedges , things are much clear now.

kexul commented 3 years ago

Hi @lohedges , I find the tutorial is quite complicated and I'm here looking for help again.

I would like to clarify the status quo with some code snippets, here is the code I've used earlier:

import BioSimSpace as _BSS

lig1 = ['CHEMBL92812.prmtop', 'CHEMBL92812.inpcrd']
lig2 = ['CHEMBL93461.prmtop', 'CHEMBL93461.inpcrd']
lig1 = _BSS.IO.readMolecules(lig1)[0]
lig2 = _BSS.IO.readMolecules(lig2)[0]
pert = _BSS.Align.merge(lig1, lig2)

pro = _BSS.IO.readMolecules('pro.*')

sys = pert + pro

solvated_sys = _BSS.Solvent.tip3p(sys, box=3*[8*_BSS.Units.Length.nanometer], work_dir='solvation')
free_energy_protocol = _BSS.Protocol.FreeEnergy(num_lam=3)
_BSS.FreeEnergy.Binding(solvated_sys, free_energy_protocol, work_dir='free_energy', engine='SOMD', setup_only=True, ignore_warnings=True)
# Then we could collect the files generated in lambda_* and distribution them to HPC, run simulation with somd-freenrg

We've agreed it's better to equilibrate the system before main simulation,

The way that BioSimSpace has been written means that you can use all protocols with a perturbable system. Under the hood we extract the lambda=0 state of the system, run the simulation protocol, then copy updated coordinates and box information back into the system when you call .getSystem() on a Process object.

according to your words, I wrote the following code:

import BioSimSpace as _BSS

lig1 = ['CHEMBL92812.prmtop', 'CHEMBL92812.inpcrd']
lig2 = ['CHEMBL93461.prmtop', 'CHEMBL93461.inpcrd']
lig1 = _BSS.IO.readMolecules(lig1)[0]
lig2 = _BSS.IO.readMolecules(lig2)[0]
pert = _BSS.Align.merge(lig1, lig2)

pro = _BSS.IO.readMolecules('pro.*')

sys = pert + pro

solvated_sys = _BSS.Solvent.tip3p(sys, box=3*[8*_BSS.Units.Length.nanometer], work_dir='solvation')

minimise_protocol = _BSS.Protocol.Minimisation()
minimised_proc = _BSS.Process.Amber(solvated_sys, minimise_protocol, work_dir='minimization')
minimised_proc.start()
minimised_proc.wait()
minimised_sys = minimised_proc.getSystem()

equilibration_protocol = _BSS.Protocol.Equilibration()
equilibrated_proc = _BSS.Process.Amber(minimised_sys , equilibration_protocol, work_dir='equilibration')
equilibrated_proc.start()
equilibrated_proc.wait()
equilibrated_sys = equilibrated_proc.getSystem()

free_energy_protocol = _BSS.Protocol.FreeEnergy(num_lam=3)
sys = _BSS.FreeEnergy.Binding(equilibrated_sys, free_energy_protocol, work_dir='free_energy', engine='SOMD', setup_only=True, ignore_warnings=True)

# Then we could collect the files generated in lambda_* and distribution them to HPC, run simulation with somd-freenrg

Would these work as you expected?

kexul commented 3 years ago

Here is the input files I've used and the code stopped in minimization step, sander said: Error in numextra_test. pert.zip

lohedges commented 3 years ago

Hi there. I think there must be a bug in the Sire.IO.AmberPrm parser that isn't correctly determining the number of virtual sites. This was a previous issue for TIP4/5P water topologies that I fixed, but I imagine that it's still not correctly account for non-water dummy atoms. When I get a chance I'll see if I can figure out what's going wrong.

In the mean time, could you try using a different MD engine, e.g. GROMACS. If you are just using sander then the equilibration would be painfully slow anyway.

lohedges commented 3 years ago

(Unrelated to your numextra issue). Just to note that in your script above you are using solvated_sys twice, rather than passed minimised_sys to the equlibration, so it would likely fail anyway.

lohedges commented 3 years ago

This issue is because we're using du as the AMBER atom type for dummy atoms, which isn't defined anywhere. Replacing this with a known name, such as EP (which is used for TIP4P water) works. However, this is probably undesirable.

Perhaps @jmichel80 might know the best solution here? This isn't a zero-mass dummy atom so it shouldn't cause problems with the AMBER integrator.

kexul commented 3 years ago

Wow, ultrafast support as always! Thanks so much!

(Unrelated to your numextra issue). Just to note that in your script above you are using solvated_sys twice, rather than passed minimised_sys to the equlibration, so it would likely fail anyway.

Thanks for pointing that out, I've corrected the typo.

In the mean time, could you try using a different MD engine, e.g. GROMACS. If you are just using sander then the equilibration would be painfully slow anyway.

Yes! Gromacs worked. Thank BioSimSpace so that we could switch between different engine painlessly. 🎉

kexul commented 3 years ago

I've tested other MD engine, OpenMM and SOMD did not work either.

Output of OpenMM:

/root/group_ceph/FEP/prepare/BioSimSpace/Process/_process.py:606: UserWarning: The system contains a perturbable molecule .We will assume that you intend to simulate the lambda = 0 state. If you want to simulate the lambda = 1 state, then pass {'is_lambda1' : True} in the 'property_map' argument.
  _warnings.warn("The system contains a perturbable molecule ."
Traceback (most recent call last):
  File "/root/group_ceph/FEP/prepare/test_case/git/run.py", line 16, in <module>
    minimised_proc = _BSS.Process.OpenMM(solvated_sys, minimise_protocol, work_dir='minimization')
  File "/root/group_ceph/FEP/prepare/BioSimSpace/Process/_openmm.py", line 172, in __init__
    self._setup()
  File "/root/group_ceph/FEP/prepare/BioSimSpace/Process/_openmm.py", line 223, in _setup
    raise IOError(msg) from None
OSError: Failed to write system to 'PRM7' format.

Output of SOMD in somd.out in minimization step:

Traceback (most recent call last):
  File "/data/miniconda3/envs/uii/share/Sire/scripts/somd.py", line 125, in <module>
    OpenMMMD.run(params)
  File "/data/miniconda3/envs/uii/lib/python3.7/site-packages/Sire/Tools/__init__.py", line 176, in inner
    retval = func()
  File "/data/miniconda3/envs/uii/lib/python3.7/site-packages/Sire/Tools/OpenMMMD.py", line 1504, in run
    system = integrator.minimiseEnergy(system, minimise_tol.val, minimise_max_iter.val)
RuntimeError: Particle coordinate is nan
lohedges commented 3 years ago

Thanks, I'll take a look. The OpenMM error is strange given that it writing the same AMBER files as Process.Amber and Process.Somd would.

lohedges commented 3 years ago

Okay, fixed the OpenMM issue. Now it reports the same nan error as SOMD, which is what I'd expect, since SOMD uses OpenMM behind the scenes. I've found that OpenMM can be a lot more sensitive to bad contacts, or poorly minimised / equilibrated structures. I imagine that you might be able to minimise with something else, e.g. GROMACS, then equilibrate with OpenMM afterwards.

jmichel80 commented 3 years ago

Hi @kexul would you be happy to email me directly ? I think it could be useful to have a separate discussion about your ongoing work with BioSimSpace.

kexul commented 3 years ago

I imagine that you might be able to minimise with something else, e.g. GROMACS, then equilibrate with OpenMM afterwards.

Thanks for the quick fix @lohedges . 🎉

Hi @kexul would you be happy to email me directly ? I think it could be useful to have a separate discussion about your ongoing work with BioSimSpace.

Of course @jmichel80, I'll email you directly then.