michellab / Sire

Sire Molecular Simulations Framework
http://siremol.org
GNU General Public License v3.0
95 stars 26 forks source link

SOMD failing to minimise and giving NaN #345

Closed mark-mackey-cresset closed 3 years ago

mark-mackey-cresset commented 3 years ago

Hi all,

With the attached files, running "sire_python /path/to/somd-freenrg.py -C Config.cfg" behaves rather oddly. In the output I get

==============================================================
Create the System...
Sending anonymous Sire usage statistics to http://siremol.org.
For more information, see http://siremol.org/analytics
To disable, set the environment variable 'SIRE_DONT_PHONEHOME' to 1
To see the information sent, set the environment variable 
SIRE_VERBOSE_PHONEHOME equal to 1. To silence this message, set
the environment variable SIRE_SILENT_PHONEHOME to 1.
==============================================================
Selecting dummy groups

Applying Hydrogen Mass repartition to input using a factor of 1.5 
Creating force fields... 
Setting up the simulation with random seed 991845
Setting up moves...
Created a MD move that uses OpenMM for all molecules on 0 
Generated random seed number 991845 
Saving restart
Setting up sim file. 
There are 14344 atoms in the group 
###===========================================================###

###=======================Minimisation========================###
Running minimisation.
Energy before the minimisation: 2652.15 kcal mol-1
Tolerance for minimisation: 1
Maximum number of minimisation iterations: 1000
Energy after the minimization: 2652.15 kcal mol-1
Energy minimization done.
###===========================================================###

###====================somd-freenrg run=======================###
Starting somd-freenrg run...
500 moves 10 cycles, 2.5 ps simulation time

Cycle =  1 

NaN or Inf has been generated along the simulation

The energy after the minimisation is identical to the energy before, which is a bit weird, and then the simulation crashes out as soon as it starts. I don't think there's anything wrong with the system itself as if I take the prm7/rst7 and just load them into OpenMM I can minimiza and then run a simulation on them just fine. Any pointers to the best way to debug this?

somd_NaN.tar.gz

lohedges commented 3 years ago

I can reproduce this locally. I tried running a minmisation with OpenMM (from within BioSimSpace) and, as stated, it apparently works. I then wrote the minimised system to file and used this as input to SOMD. However, this also blew up immediately.

I'll come back to this after the Easter break.

lohedges commented 3 years ago

Actually, having run a longer minimisation (10000 steps) with OpenMM and again using the minimised system as input for SOMD it now works!

### Running Single Topology Molecular Dynamics Free Energy on unknown ###
###================Setting up calculation=====================###
New run. Loading input and creating restart
lambda is 0.0
Create the System...
Selecting dummy groups
Applying Hydrogen Mass repartition to input using a factor of 1.5
Creating force fields...
Setting up the simulation with random seed 482445
Setting up moves...
Created a MD move that uses OpenMM for all molecules on 0
Generated random seed number 482445
Saving restart
Setting up sim file.
There are 14344 atoms in the group
###===========================================================###

###=======================Minimisation========================###
Running minimisation.
Energy before the minimisation: 1412.65 kcal mol-1
Tolerance for minimisation: 1
Maximum number of minimisation iterations: 1000
Energy after the minimization: 1420.13 kcal mol-1
Energy minimization done.
###===========================================================###

###====================somd-freenrg run=======================###
Starting somd-freenrg run...
500 moves 10 cycles, 10 ps simulation time

Cycle =  1

Backing up previous restart
Saving new restart

Cycle =  2

Backing up previous restart
Saving new restart

Cycle =  3

Backing up previous restart
Saving new restart

Cycle =  4

Backing up previous restart
Saving new restart

Cycle =  5

Backing up previous restart
Saving new restart

Cycle =  6

Backing up previous restart
Saving new restart

Cycle =  7

Backing up previous restart
Saving new restart

Cycle =  8

Backing up previous restart
Saving new restart

Cycle =  9

Backing up previous restart
Saving new restart

Cycle =  10

Backing up previous restart
Saving new restart
Simulation took 652 s
###===========================================================###

Clearing buffers...

Note that I am running on the CPU rather than GPU (due to the stupid hardcoded CUDA paths in the OpenMM version that I'm using) but the CPU failed in the same way for your original input system.

This points to an issue with the SOMD minimisation protocol. The OpenMM minimisation protocol in BioSimSpace is completely vanilla, so should certainly be reproducible with SOMD.

lohedges commented 3 years ago

I wondered whether it was an issue with the initial configuration being some in some kind of janky configuration that happened to be a local minima. This might mean that the SOMD minimisation doesn't actually do anything. BioSimSpace uses a Langevin integrator with OpenMM, so the stochastic component would help to escape any local minima. In contrast it looks like SOMD uses a leapfrog Verlet integrator by default. However, when I tried setting the integrator type option in the SOMD config file to Langevin I still see the same thing, so it can't be that simple.

jmichel80 commented 3 years ago

Will try to look into it. The energy minimiser in OpenMM resets coordinates to the starting ones if it consistently fails to lower the energy. I have seen this happen sometimes when we minimise with constraints on. SOMD disables constraints temporarily to enable an un constrained energy minimisation. This is probably the most significant difference with a vanilla call to the OpenMM energy minimiser. The MD integrator shouldn’t matter as no MD is done at this stage.

Best wishes,

Julien

Sent from my iPhone

On 1 Apr 2021, at 19:23, Lester Hedges @.***> wrote:

 I wondered whether it was an issue with the initial configuration being some in some kind of janky configuration that happened to be a local minima. This might mean that the SOMD minimisation doesn't actually do anything. BioSimSpace uses a Langevin integrator with OpenMM, so the stochastic component would help to escape any local minima. In contrast it looks like SOMD uses a leapfrog Verlet integrator by default. However, when I tried setting the integrator type option in the SOMD config file to Langevin I still see the same thing, so it can't be that simple.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

lohedges commented 3 years ago

I tested a range of integrators when setting up the protocols for BioSimSpace and you actually do get quite different results when performing a minimisation. It's probably not the root cause of the issue here, but something to be aware of nonetheless.

On Thu, 1 Apr 2021, 20:00 Julien Michel, @.***> wrote:

Will try to look into it. The energy minimiser in OpenMM resets coordinates to the starting ones if it consistently fails to lower the energy. I have seen this happen sometimes when we minimise with constraints on. SOMD disables constraints temporarily to enable an un constrained energy minimisation. This is probably the most significant difference with a vanilla call to the OpenMM energy minimiser. The MD integrator shouldn’t matter as no MD is done at this stage.

Best wishes,

Julien

Sent from my iPhone

On 1 Apr 2021, at 19:23, Lester Hedges @.***> wrote:

 I wondered whether it was an issue with the initial configuration being some in some kind of janky configuration that happened to be a local minima. This might mean that the SOMD minimisation doesn't actually do anything. BioSimSpace uses a Langevin integrator with OpenMM, so the stochastic component would help to escape any local minima. In contrast it looks like SOMD uses a leapfrog Verlet integrator by default. However, when I tried setting the integrator type option in the SOMD config file to Langevin I still see the same thing, so it can't be that simple.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/michellab/Sire/issues/345#issuecomment-812108479, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE6K3L5FPB2BKP5F5HEV33TGS7FRANCNFSM42HJQ2RA .

jmichel80 commented 3 years ago

The issue is likely because the input file is a Triclinic box. The first sign there is something odd is that there are no obvious steric clashes yet the single point energy is unsually high for an equilibrated solvated complex (should be negative many thousands of kcal/mol due to the coulomb-coulomb term).

(Pdb) system.energy()
2652.18 kcal mol-1
(Pdb) system.energies()
(..)
  E_{solvent:solvent}^{coulomb} == 0, E_{solvent_intraclj}^{CLJ} == -16.4275, E_{solvent_intraclj}^{LJ} == 227.896, E_{solvent_intraclj}^{coulomb} == -244.324
(...)
(Pdb) cart = Cartesian()
(Pdb) system.setProperty("space",cart)
(Pdb) system.energy()
-34839.6 kcal mol-1
(Pdb) system.energies()
(...)
  E_{solvent:solvent}^{coulomb} == -42168.6, E_{solvent_intraclj}^{CLJ} == -703.874, E_{solvent_intraclj}^{LJ} == -453.265, E_{solvent_intraclj}^{coulomb} == -250.609
(...)

Digging deeper in the energies with the triclinic box settings we have E_{solute_hard:solvent}^{CLJ} == 0 whereas disable periodicity gives E_{solute_hard:solvent}^{CLJ} == -181.757

This indicates that there intermolecular distances are not computed correctly by Sire.

Indeed there are no issues running the OpenMM energy minimiser without periodicity


(Pdb) system.energy()
-34839.6 kcal mol-1
(Pdb) integrator.setConstraintType("none")
(Pdb)  system = integrator.minimiseEnergy(system, minimise_tol.val, minimise_max_iter.val)
(Pdb) system.mustNowRecalculateFromScratch()
(Pdb) system.energy()
-48884.2 kcal mol-1

Tests run with Sire version: 2020.1.0

@lohedges this suggests a bug in the Triclinic box implementation from last Autumn, @mark-cresset if you could repeat tests with a rectangular box setup to confirm that would be great.

lohedges commented 3 years ago

Thanks @jmichel80, that's really helpful. I'll hopefully be able to take a closer look at some point next week, but I imagine that this issue stems from the fact that we don't yet fully support the TriclinicBox space for all of Sire's internal energy calculation routines. This is because some are performed on a grid, which is currently assumed to be orthorhombic. These will need to be adapted to work with triclinic systems.

The inital remit for the TrinclinicBox was to enable read/write support in BioSimSpace so that we could run simulations with the various MD engines. Since SOMD needs to calculate system energies too, then we'll need to look at updating the forcefield code as soon as possible. An initial workaround could be to fallback to calculations in a Cartesian space if the system has a TriclinicBox. If you point me to the specific parts of the code where energies are being computed, I'd be happy to implement and test this.

There are reasonably extensive tests for the TriclinicBox class here, including distances, bonded potential terms, and energies. The code is a complete drop-in replacement for PeriodicBox in the case of cartesian spaces. (Note that the energy comparisons only use InterFF and InterCLJFF.)

jmichel80 commented 3 years ago

hi @lohedges since OpenMM appears to handle that input I'm hopeful it's a simple problem with the way the TriclinicBox is setup during the OpenMM calculation in SOMD. To debug that I would compare first the single point energies between Sire (system.energy) and OpenMM (from SOMD via integrator.getPotentialEnergy() ) and check if they are identical. If that's the case I would compare more closely the way the box is initialise at e.g. https://github.com/michellab/Sire/blob/devel/corelib/src/libs/SireMove/openmmfrenergyst.cpp#L3027 with what vanilla OpenMM does.

Finally...the Triclinic box dimensions in the input posted (generated with BSS it seems) also have unusual angles. Not sure what sort of space group this corresponds.

56.9015961 56.9015961 56.9015923 56.2510109 109.4712219 109.4712219

lohedges commented 3 years ago

Yes, I'm not sure what those angles correspond to. They aren't generated by one of the default box options within BioSimSpace.Box, so I'm guessing they were custom values. Interestingly, taking the output of the OpenMM minimisaton and writing to file gives the following:

56.9015961  56.9015961  56.9015915 123.7489907  70.5287778 109.4712219

(These are extracted from the OpenMM DCD trajectory file and injected back into the BioSimSpace system when calling process.getSystem() following the minimisation.)

This is what you would expect for a truncated octahedron, and agrees with what BioSimSpace would generate in that case. (The angle ordering is different, but that is a convention that changes between MD engines, subject to some constraints.)

import BioSimSpace as BSS
box, angles = BSS.Box.truncatedOctahedron(SS.Units.Length.angstrom)

print(box)
[1.0000 A, 1.0000 A, 1.0000 A]

print(angles)
[123.7490 degrees, 109.4712 degrees, 70.5288 degrees]

@mark-cresset: If possible, could you share how the solvated system was generated. It would be great if you were able to post the line used in BioSimSpace, as well as uploading an archive of the solvation working directory, e.g. from running something like:

solvated = BSS.Solvent.tip3p(molecule=molecule, box=box, angles=angles, work_dir="work_dir")

Also, could you let me know what versions of Sire/BioSimSpace you are using so that I can try to reproduce things exactly.

One other issue that I forgot to mention is that I once had a simulation blow up owing to loss of precision in the conversion of the lattice vectors from GROMACS (what we use for solvation) to AMBER format. Obviously there is no issue in the case of a periodic system, since the angles are integer valued, but converting from the full lattice vector matrix used by GROMACS to the base-length and angles used by AMBER can lead to imprecision in distance calculations. This happened once to me and I simply regenerated another solvated system which worked fine.

I'll post an update on the energy comparison when I get a chance. Here's output for a single-step minimisation in OpenMM, using a Verlet intergrator with sufficiently small time step to make the results reproducible.

#"Step" "Time (ps)" "Potential Energy (kJ/mole)" "Kinetic Energy (kJ/mole)" "Total Energy (kJ/mole)" "Temperature (K)" "Box Volume (nm^3)"
1 2e-06 -228998.01228850277 1.1725259706363834e-06 -228998.01228733023 9.50989903017781e-09 141.82454668558486
lohedges commented 3 years ago

I just looked at the Input.pdb file. It seems that this is where the angles are coming from:

CRYST1   56.902   56.902   56.902  56.25 109.47 109.47 P 1           1
mark-mackey-cresset commented 3 years ago

Hi Lester and Julien,

The odd angles are being created by the prepare-FEP workflow. If you look at the attached data, we run

sire_python scripts/prepareFEP.py --input1 Input1.parm7 Input1.rst7 --input2 Input2.parm7 Input2.rst7 --mapping mapping.txt --output Output

Input1.rst7 and Input2.rst7 have box coordinates:

==> Input1.rst7 <== 35.9425735 35.9425735 35.9425735 109.4712219 109.4712219 109.4712219

==> Input2.rst7 <== 35.8750648 35.8750648 35.8750648 109.4712219 109.4712219 109.4712219

And Output.rst7 says:

35.9425735 35.9425735 35.9425720 56.2510122 109.4712227 109.4712219

I think these are actually equivalent as if I load Output. into VMD it looks identical to Input1. : the boxes are identical and the system fills space in exactly the same way, but the order of the tiling is different.

We’ve seen this in other workflows: I think cpptraj also converts a “109.4712219 109.4712219 109.4712219” triclinic system into a “56.2510122 109.4712227 109.4712219” one.

Regards, Mark

-- Mark Mackey Chief Scientific Officer Cresset New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8 0SS, UK tel: +44 (0)1223 858890 mobile: +44 (0)7595 099165 fax: +44 (0)1223 853667 email: @.**@.> web: www.cresset-group.comhttp://www.cresset-group.com/ skype: mark_cresset

From: Lester Hedges @.> Sent: 02 April 2021 15:21 To: michellab/Sire @.> Cc: Mark Mackey @.>; Mention @.> Subject: Re: [michellab/Sire] SOMD failing to minimise and giving NaN (#345)

I just looked at the Input.pdb file. It seems that this is where the lattice vectors are coming from:

CRYST1 56.902 56.902 56.902 56.25 109.47 109.47

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/michellab/Sire/issues/345#issuecomment-812551534, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AECY3YSRPKJ24A6OVSWPQ6LTGXHFXANCNFSM42HJQ2RA.

This email has been sent from Cresset BioMolecular Discovery Limited, registered in England and Wales, Company Number: 04151475. The information in this email and any attachments are confidential and may be privileged. It is intended solely for the addressee and access to this email by anyone else is unauthorised. If an addressing or transmission error has misdirected this email, please notify the author by replying to this email. If you are not the intended recipient you must not use, disclose, distribute, store or copy the information in any medium. Although this e-mail and any attachments are believed to be free from any virus or other defect which might affect any system into which they are opened or received, it is the responsibility of the recipient to check that they are virus-free and that they will in no way affect systems and data. No responsibility is accepted by Cresset BioMolecular Discovery Limited for any loss or damage arising in any way from their receipt, opening or use. Privacy noticehttps://www.cresset-group.com/privacy/

mark-mackey-cresset commented 3 years ago

Oh, forgot to mention, the original system is just being created using tLeap with “solvateoct mol TIP3PBOX 6.000”.

Regards, Mark

From: Mark Mackey Sent: 02 April 2021 23:29 To: michellab/Sire @.>; michellab/Sire @.> Cc: Mention @.***> Subject: RE: [michellab/Sire] SOMD failing to minimise and giving NaN (#345)

Hi Lester and Julien,

The odd angles are being created by the prepare-FEP workflow. If you look at the attached data, we run

sire_python scripts/prepareFEP.py --input1 Input1.parm7 Input1.rst7 --input2 Input2.parm7 Input2.rst7 --mapping mapping.txt --output Output

Input1.rst7 and Input2.rst7 have box coordinates:

==> Input1.rst7 <== 35.9425735 35.9425735 35.9425735 109.4712219 109.4712219 109.4712219

==> Input2.rst7 <== 35.8750648 35.8750648 35.8750648 109.4712219 109.4712219 109.4712219

And Output.rst7 says:

35.9425735 35.9425735 35.9425720 56.2510122 109.4712227 109.4712219

I think these are actually equivalent as if I load Output. into VMD it looks identical to Input1. : the boxes are identical and the system fills space in exactly the same way, but the order of the tiling is different.

We’ve seen this in other workflows: I think cpptraj also converts a “109.4712219 109.4712219 109.4712219” triclinic system into a “56.2510122 109.4712227 109.4712219” one.

Regards, Mark

-- Mark Mackey Chief Scientific Officer Cresset New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8 0SS, UK tel: +44 (0)1223 858890 mobile: +44 (0)7595 099165 fax: +44 (0)1223 853667 email: @.**@.> web: www.cresset-group.comhttp://www.cresset-group.com/ skype: mark_cresset

From: Lester Hedges @.**@.>> Sent: 02 April 2021 15:21 To: michellab/Sire @.**@.>> Cc: Mark Mackey @.**@.>>; Mention @.**@.>> Subject: Re: [michellab/Sire] SOMD failing to minimise and giving NaN (#345)

I just looked at the Input.pdb file. It seems that this is where the lattice vectors are coming from:

CRYST1 56.902 56.902 56.902 56.25 109.47 109.47

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/michellab/Sire/issues/345#issuecomment-812551534, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AECY3YSRPKJ24A6OVSWPQ6LTGXHFXANCNFSM42HJQ2RA.

This email has been sent from Cresset BioMolecular Discovery Limited, registered in England and Wales, Company Number: 04151475. The information in this email and any attachments are confidential and may be privileged. It is intended solely for the addressee and access to this email by anyone else is unauthorised. If an addressing or transmission error has misdirected this email, please notify the author by replying to this email. If you are not the intended recipient you must not use, disclose, distribute, store or copy the information in any medium. Although this e-mail and any attachments are believed to be free from any virus or other defect which might affect any system into which they are opened or received, it is the responsibility of the recipient to check that they are virus-free and that they will in no way affect systems and data. No responsibility is accepted by Cresset BioMolecular Discovery Limited for any loss or damage arising in any way from their receipt, opening or use. Privacy noticehttps://www.cresset-group.com/privacy/

lohedges commented 3 years ago

Thanks. We perform some transformations to the triclinic box vectors and angles on read so that they satisfy the constraints set by the various MD engines. However, we don't (yet) transform the molecular coordinates after this. I found that the MD engines have their own routines for sorting this on read. This is might explain the different behaviour when letting OpenMM load a system from file and sorting stuff, rather than setting things explicitly through the API.

If you could share the original input files that were passed to prepareFEP I can try transforming the coordinates myself.

I hadn't added this functionality since I didn't actually come across any inputs that didn't already satisfy the constraints (other than ones I made myself).

On Fri, 2 Apr 2021, 23:32 mark-cresset, @.***> wrote:

Oh, forgot to mention, the original system is just being created using tLeap with “solvateoct mol TIP3PBOX 6.000”.

Regards, Mark

From: Mark Mackey Sent: 02 April 2021 23:29 To: michellab/Sire @.>; michellab/Sire @.> Cc: Mention @.***> Subject: RE: [michellab/Sire] SOMD failing to minimise and giving NaN (#345)

Hi Lester and Julien,

The odd angles are being created by the prepare-FEP workflow. If you look at the attached data, we run

sire_python scripts/prepareFEP.py --input1 Input1.parm7 Input1.rst7 --input2 Input2.parm7 Input2.rst7 --mapping mapping.txt --output Output

Input1.rst7 and Input2.rst7 have box coordinates:

==> Input1.rst7 <== 35.9425735 35.9425735 35.9425735 109.4712219 109.4712219 109.4712219

==> Input2.rst7 <== 35.8750648 35.8750648 35.8750648 109.4712219 109.4712219 109.4712219

And Output.rst7 says:

35.9425735 35.9425735 35.9425720 56.2510122 109.4712227 109.4712219

I think these are actually equivalent as if I load Output. into VMD it looks identical to Input1. : the boxes are identical and the system fills space in exactly the same way, but the order of the tiling is different.

We’ve seen this in other workflows: I think cpptraj also converts a “109.4712219 109.4712219 109.4712219” triclinic system into a “56.2510122 109.4712227 109.4712219” one.

Regards, Mark

-- Mark Mackey Chief Scientific Officer Cresset New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire, SG8 0SS, UK tel: +44 (0)1223 858890 mobile: +44 (0)7595 099165 fax: +44 (0)1223 853667 email: @.**@.> web: www.cresset-group.com< http://www.cresset-group.com/> skype: mark_cresset

From: Lester Hedges @.**@.>> Sent: 02 April 2021 15:21 To: michellab/Sire @.**@.>> Cc: Mark Mackey @.**@.>>; Mention @.**@.>> Subject: Re: [michellab/Sire] SOMD failing to minimise and giving NaN (#345)

I just looked at the Input.pdb file. It seems that this is where the lattice vectors are coming from:

CRYST1 56.902 56.902 56.902 56.25 109.47 109.47

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub< https://github.com/michellab/Sire/issues/345#issuecomment-812551534>, or unsubscribe< https://github.com/notifications/unsubscribe-auth/AECY3YSRPKJ24A6OVSWPQ6LTGXHFXANCNFSM42HJQ2RA>.

This email has been sent from Cresset BioMolecular Discovery Limited, registered in England and Wales, Company Number: 04151475. The information in this email and any attachments are confidential and may be privileged. It is intended solely for the addressee and access to this email by anyone else is unauthorised. If an addressing or transmission error has misdirected this email, please notify the author by replying to this email. If you are not the intended recipient you must not use, disclose, distribute, store or copy the information in any medium. Although this e-mail and any attachments are believed to be free from any virus or other defect which might affect any system into which they are opened or received, it is the responsibility of the recipient to check that they are virus-free and that they will in no way affect systems and data. No responsibility is accepted by Cresset BioMolecular Discovery Limited for any loss or damage arising in any way from their receipt, opening or use. Privacy noticehttps://www.cresset-group.com/privacy/

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/michellab/Sire/issues/345#issuecomment-812741278, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE6K3PQIYVBZYHIO2BWOQLTGZAXTANCNFSM42HJQ2RA .

mark-mackey-cresset commented 3 years ago

Hi Lester,

The input files were attached to my previous email (if they were stripped off by an overzealous antivirus filter or something let me know and I’ll stick them up somewhere with a download link.

One thing to note is that the system that SOMD crashed on is part of a larger FEP network. All of the other molecules/links seem to be OK (and have identical PBC setups etc) , so I don’t think there’s anything inherently wrong with the toolchain in general or the support for triclinic systems. I would blame it on broken input data for this one case, except that I can’t see anything obviously broken about it. We seem to get a similar-looking failure for about 1 molecule in 50-100. In most (but not all) cases re-equilibrating seems to fix it, but it would be nice to know what’s going wrong so we can avoid that 2% failure rate!

Regards, Mark

lohedges commented 3 years ago

Thanks, Mark. I have the zip file containing the inputs to SOMD, but not the files that we used as inputs to prepareFEP. I don't think email replies attach files to the issue thread, so I imagine that you'll need to reply directly on GitHub.

Thanks for the extra info. It's good to know that this is a rare occurrence and that triclinic cells appear to be handled correctly in general. If re-equilibration fixes things then I imagine it might be a precision loss issue with our read-write of the triclinic cell data, particulary if there was a box transformation somewhere. I'll check for differences between the input and output data of prepareFEP to see if I can figure out what's gone wrong.

Cheers.

mark-mackey-cresset commented 3 years ago

prepareFEP.tar.gz

Here's the file: note that it's a different system, I'm afraid, but demonstrates the change in box coordinates. I don't think that the inputs to the files in the original bug report were saved, but I'll have a dig around and see if I can reproduce.

lohedges commented 3 years ago

I've tested the input and the triclinic box is being transformed by the application of a lattice reduction to satisfy the constraints of the various MD engines. The reduction that I have applied (following a guide for GROMACS) appears lose the sign of one of the box vector components in this case. (Specifically the y component of the third box vector.) I have reapplied the reduction following the equations in this thesis and the input vectors are now preserved. (Ironically, the code that I followed online was by the same author.) I'll push a commit later, following which could you test to see if it resolves your problem?

I'm also working on figuring out the best way to rotate the molecular coordinates in the event that the triclinic box is rotated. (It isn't rotated for this input, or any input that I've generated using AmberTools or GROMACS). I just need to figure out where best to apply this change, and the point about which to apply the rotation. (In case the box origin isn't always at (0, 0, 0).) I'll probably push this update at a later date.

lohedges commented 3 years ago

Just an update to say that I've retested with your original input and I can get it to work if I use the original box angles from your prepareFEP input files, i.e. where the Sire lattice reduction that flips one of the vector components hasn't been performed. However, looking at the angles from your input files I see:

  56.9015961  56.9015961  56.9015961 109.4712219 109.4712219 109.4712219

(Note that I also made all of the box lengths identical, since previously the third component was slightly smaller, i.e. 56.9015923. In contrast, your prepareFEP examples have the same magnitude for each vector.)

109.4712219 seems slightly off in comparison to other octahedral boxes, where acos(−1/3) == 109.4712206 is commonly used. If I use the original value (presumably generated by tLEaP's solvateoct command) then OpenMM fails with the following error:

RuntimeError: Periodic box vectors must be in reduced form.

After replacing the angles with:

  56.9015961  56.9015961  56.9015961 109.4712206 109.4712206 109.4712206

the simulation runs with no issues. (That said, I haven't checked that the output is sane.)

The OpenMM check is as follows (from their Python implementation):

   def setPeriodicBoxVectors(self, vectors):
        """Set the vectors defining the periodic box."""
        if vectors is not None:
            if not is_quantity(vectors[0][0]):
                vectors = vectors*nanometers
            if vectors[0][1] != 0*nanometers or vectors[0][2] != 0*nanometers:
                raise ValueError("First periodic box vector must be parallel to x.");
            if vectors[1][2] != 0*nanometers:
                raise ValueError("Second periodic box vector must be in the x-y plane.");
            if vectors[0][0] <= 0*nanometers or vectors[1][1] <= 0*nanometers or vectors[2][2] <= 0*nanometers or vectors[0][0] < 2*abs(vectors[1][0]) or vectors[0][0] < 2*abs(vectors[2][0]) or vectors[1][1] < 2*abs(vectors[2][1]):
                raise ValueError("Periodic box vectors must be in reduced form.");
        self._periodicBoxVectors = deepcopy(vectors)

It is the final conditional that is failing, i.e. vectors[1][1] < 2*abs(vectors[2][1]), where vectors[1][1] == 53.64733886459637 and 2*abs(vectors[2][1]) == 53.64734389180794

I can also get the simulation to run if I replace the box with the output of the OpenMM minimisation described earlier in this thread, i.e.:

56.9015961  56.9015961  56.9015915 123.7489907  70.5287778 109.4712219

Bizarrely, if I perform an OpenMM minimisation using the the input with a box of:

  56.9015961  56.9015961  56.9015961 109.4712206 109.4712206 109.4712206

Then from the minimised DCD file I get a box of:

  56.9015961  56.9015961  56.9015961  70.5287781  70.5287781 109.4712219

Copying this back into the original input and trying to run the SOMD simulation again raises:

RuntimeError: Periodic box vectors must be in reduced form.

This seems strange, since this was the box that OpenMM generated internally to run the minimisation! If I check the box vectors I find the same part of the conditional failing once again, i.e. i.e. vectors[1][1] < 2*abs(vectors[2][1]), where this time vectors[1][1] == 53.64733886459637 and 2*abs(vectors[2][1]) == 53.64734389180792, which is basically the same as the problem that I had above. (Note that OpenMM also reports one angle of 109.4712219.)

I wonder if all of these issues are simply the result of rounding errors during conversion of the triclinic box vectors to/from the various formats involved in running a simulation, e.g. for the minimisation OpenMM is reading parm/rst files but writing to DCD. The fact that OpenMM can doesn't raise the runtime error when loading the information from the file does point to a difference in the way that the triclinic box information is handled when going from file versus using setPeriodicBoxVectors through the C++ API, however.

I'm not quite sure where to go from here? Are the tLEaP and OpenMM angles correct? If so, how do we account for this? I guess we could check for angles close to acos(-1/3) and convert.

jmichel80 commented 3 years ago

Waoo well done for tracking this @lohedges might be worth to ping @peastman on OpenMM specifics before adding custom sanity checks to BioSimSpace or Sire.

lohedges commented 3 years ago

Yes, I agree. I note that the DCD file written by OpenMM is read into BioSimSpace using MDTraj or MDAnalysis. There are layers of conversions going on here so it's not surprising that something might have gone wrong, or that precision is lost, or formatting is inconsistent.

I'll try to take a look at what they do internally. If I recall, one of the issues is that there isn't even a standard for the order of box information in a DCD file to begin with.

On Wed, 7 Apr 2021, 17:17 Julien Michel, @.***> wrote:

Waoo well done for tracking this @lohedges https://github.com/lohedges might be worth to ping @peastman https://github.com/peastman on OpenMM specifics before adding custom sanity checks to BioSimSpace or Sire.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/michellab/Sire/issues/345#issuecomment-815043476, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE6K3JYNATJHPS5INJU6LTTHSAQHANCNFSM42HJQ2RA .

peastman commented 3 years ago

I don't have a lot of insight, I'm afraid. It does sound like a precision issue, probably in the conversions back and forth between vectors and angles? OpenMM has a function to convert any set of box vectors into reduced form:

https://github.com/openmm/openmm/blob/0607d41c18ee1b52efe872b7b951e097f670ad4e/wrappers/python/openmm/app/internal/unitcell.py#L83-L99

Does that fix the problem?

lohedges commented 3 years ago

Thanks. I'll compare your lattice reduction to our own implementation tomorrow. At a first glance it looks to be the same.

On Wed, 7 Apr 2021, 18:53 Peter Eastman, @.***> wrote:

I don't have a lot of insight, I'm afraid. It does sound like a precision issue, probably in the conversions back and forth between vectors and angles? OpenMM has a function to convert any set of box vectors into reduced form:

https://github.com/openmm/openmm/blob/0607d41c18ee1b52efe872b7b951e097f670ad4e/wrappers/python/openmm/app/internal/unitcell.py#L83-L99

Does that fix the problem?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/michellab/Sire/issues/345#issuecomment-815107546, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE6K3KT4O5K5K7R433P53TTHSL2HANCNFSM42HJQ2RA .

lohedges commented 3 years ago

Okay, I think I might have fixed this. It turns out that our original lattice reduction, which was based on code adapted by Tserk Wassenar from his thesis and posted here, does a slightly different reduction and seems to introduce some rounding errors. (I came across this before the thesis.) We now reproduce the equations from the thesis directly, which is what OpenMM does, and (unsurprisingly) our reduction now agrees with OpenMM.

Taking the inputs to prepareFEP (which is the same as the original system, except a different box size) I see a box of:

  35.9425735  35.9425735  35.9425735 109.4712219 109.4712219 109.4712219

If I now read this file and write back to rst7 format I get the following:

  35.9425735  35.9425735  35.9425713  70.5287806  70.5287806 109.4712219

This agrees with the output from OpenMM. (Note that the third box vector component should actually be slightly smaller than the first two.)

If I keep the box dimensions from your original post and simply replace the angles with the above, i.e.:

  56.9015961  56.9015961  56.9015923  70.5287806  70.5287806 109.4712219

then the simulation runs without issue.

Once I've pushed the changes (just need to update the Python wrappers locally) could you check to see if it fixes issues with any other problematic inputs that you've come across? My feeling is that this is simply a rounding issue caused by the slightly different reduction scheme, rather than anything more suspect with our triclinic box implementation.

mark-mackey-cresset commented 3 years ago

Brilliant, thanks Lester. We’ll patch in your changes here; we ought to find out reasonably soon whether it’s fixing the problem more generally (hard to tell up-front as it seems to be a bit random whether or not we have a problem with any given run).

I have to confess I wouldn’t have guessed that a 1 part in 10^8 rounding error would cause this!

Thanks, Mark

lohedges commented 3 years ago

Hi @mark-cresset, just wondering if there was any update on whether the change in lattice reduction fixed your problems. We're giving a workshop in a couple of weeks and it would be good to be able to advise the attendees whether the triclinic-box code is okay. (I think they'll be using cubic boxes for the workshop, but the intention is to let them loose on their own systems afterwards.)

Cheers.

mark-mackey-cresset commented 3 years ago

We've only just started doing intensive testing, but so far so good. I'll keep you posted if we see any further problems. It's a bit hard to be definite as this only cropped up occasionally...

lohedges commented 3 years ago

c.f: https://www.cresset-group.com/about/news/flare-fep-speed ;-)