michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

GROMACS to AMBER conversion with Ryckaert-Bellemans dihedral potentials #120

Closed SofiaBariami closed 3 years ago

SofiaBariami commented 5 years ago

I am working with a self assebmled system that consists of four ligands bridged together with two Pd(II) ions to form a nano-capsule. Both the ligands and the metals are parameterised with the ffld server of Schrodinger MacroModel and the parameters are incorporated in the OPLS-AA topology of GROMACS. Now, I need to run an absolute free energy calculation with Sire using this system and an encapsulated small molecule, so I needed to convert the GROMACS(.gro, .top) to the respective Amber ones (.prm7, .rst7). My first attempt was with BioSimSpace. While trying to convert two GROMACS files (.gro, .top) to the respective Amber ones (.prm7, .rst7), using this code:

import BioSimSpace as BSS

files = ["file.gro","file.top"]
system = BSS.IO.readMolecules(files) 
BSS.IO.saveMolecules('output', system, fileformat=['PRM7', 'RST7'])

I stumbled upon this error from _io.py: Failed to read molecules from: ['file.gro', 'file.top'] This very generic error message wasn't helpful at all, so I tried ParmEd, since the code provided at their website allows you to use a GROMACS topology folder of your choice:

from parmed.gromacs import GROMACS_TOPDIR
GROMACS_TOPDIR = "/home/sofia/Desktop/Sofia/cages/gromacs_to_amber/top/"

However, with the script that I used this time I got another error:

import parmed as pd
from parmed import gromacs, amber, unit as u
gmx_top = gromacs.GromacsTopologyFile('cage_quin1.top')
gmx_gro = gromacs.GromacsGroFile.parse('cage_quin1.gro')
gmx_top.box = gmx_gro.box # Needed because .prmtop contains box info
gmx_top.positions = gmx_gro.positions
amb_prm = amber.ChamberParm.from_structure(gmx_top)
amb_prm.write_parm("prmtop")
amb_inpcrd = amber.AmberAsciiRestart("inpcrd", mode="w")
amb_inpcrd.coordinates = gmx_top.coordinates
amb_inpcrd.box = gmx_top.box
amb_inpcrd.close()

TypeError: ChamberParm does not support all potential terms defined in the input Structure

I figured out that the problem was at the dihedral potential terms. OPLS-AA in GROMACS uses the Ryckaert-Bellemans function to define the dihedral potentials (denoted by function number 3 at the top file). These dihedrals' functional form is more complicated, using six coefficients to define the potential (http://manual.gromacs.org/documentation/2019/reference-manual/functions/bonded-interactions.html), causing Amber, that uses the analytical function to treat proper dihedrals, to fail. Back in 2016, that was still an issue that hadn't been fixed yet (https://github.com/ParmEd/ParmEd/issues/714).

The issue was resolved by using a less sophisticated script of ParmEd that loads the grotop files and save the prm, rst files right away.

gmx_top = pmd.load_file('file.top',xyz='file.gro')
gmx_top.save('file.prm7', format='amber')
gmx_top.save('file.rst7', format='rst7')
jmichel80 commented 5 years ago

@SofiaBariami can you upload the input files file.top and file.gro as well

How do you know that the parameters have been converted correctly from the gro topology to the amber prm7 file by ParmEd if the Ryckaert-Bellemans functional form is not supported ?

lohedges commented 5 years ago

Hi @SofiaBariami. You can also use a custom GROMACS topology directory with Sire/BIoSimSpace, you just need to add it to the property map that you pass to the IO reader, e.g.

system = BSS.IO.readMolecules(files, property_map={"GROMACS_PATH" : "path/to/gromacs/toplologies"})

If you want to get more detailed error messages you can try parsing a file directly using the underlying Sire IO parser. In this case, e.g.:

from Sire.IO import GroTop

grotop = GroTop("file.top", property_map={"GROMACS_PATH" : "path/to/gromacs/toplologies"})

In general, I gobble the long form error message since it isn't always helpful to the user, and they aren't going to delve into the Sire source code when using BioSimSpace. Ideally I'd like to categorise the messages in some way and provide a more meaningful user readable output. (Currently similar errors give very different messages with the various Sire parsers.)

Note that Sire expects potential terms, e.g. dihedrals, to take specific forms for AMBER and GROMACS so the parser might fail for specific force fields. (I can't remember exactly what the GroTop parser currently supports.) This also means that you can only convert between AMBER and GROMACS format for AMBER-style force fields.

I'll update the documentation with details regarding custom topology paths.

SofiaBariami commented 5 years ago

@jmichel80 I uploaded the files here. I am also including the top folder. To check the parameters I am planning to calculate the single point energies of the system both with GROMACS and Sire. Also , I could run a long MD simulation and calculate Pd-O distances in order to compare them with the results that I already have with the OPLS-AA parameters. gromacs_files.zip

@lohedges Thanks a lot! That is very useful. I'll try again with BSS to see if it works this time.

lohedges commented 5 years ago

This works fine using the files in the zip archive that you uploaded:

import BioSimSpace as BSS
system = BSS.IO.readMolecules(BSS.IO.glob("cage_quin1.*"), property_map={"GROMACS_PATH" : "./top"}) 
BSS.IO.saveMolecules("test", system, ["rst7", "prm7"])  
SofiaBariami commented 5 years ago

Yes, I also did it and also the files are the same with the ones generated with ParmEd.

lohedges commented 5 years ago

Great, glad it worked, and that you reproduce the same output as ParmEd. Sorry the documentation wasn't clearer, I'll aim to update that later today. Feel free to close the issue when you're satisfied with the solution.

jmichel80 commented 5 years ago

@SofiaBariami can you close this issue once you have shared an update on the whether the conversion gave valid output.

lohedges commented 5 years ago

Ah, good suggestion. It would be nice to know if we can correctly handle parameter conversion for this force field.

@jmichel80 I must have been typing my original reply when you posted your first message. Didn't spot it until the issue was reopened.

The documentation for BioSimSpace.IO.readMolecules has now been updated. See here.

SofiaBariami commented 5 years ago

Hi everyone. Sorry for taking me a week to reply, but I was trying different systems to see how the conversion works. I'll show here two examples. A) Self assembled capsule with four 3-pyridine cage ligands and 2 Pd(II) ions, encapsulated quinon, no solvent GROMACS Energies: (kJ/mol)

 Angle           Proper Dih.        Ryckaert-Bell.           LJ-14         Coulomb-14      LJ (SR)      Disper. corr.    Coulomb (SR)     Coul. recip.      Potential
5.63560e+01      0.00000e+00         -3.96144e+01        3.81757e+02     -1.83172e+03    3.91074e+31   -3.75851e+02     -1.00580e+03      2.76994e+03       3.91074e+31

Sire Energies (kcal/mol):

{ E_{ions:ions}^{CLJ} == 0, E_{ions:ions}^{LJ} == 0, E_{ions:ions}^{coulomb} == 0, 
E_{ions:molecules}^{CLJ} == 0,  E_{ions:molecules}^{LJ} == 0, 
E_{ions:molecules}^{coulomb} == 0, E_{molecules-intrabonded}^{1-4[LJ]} == 0, 
E_{molecules-intrabonded}^{1-4[coulomb]} == 0, E_{molecules-intrabonded}^{1-4} == 0, 
E_{molecules-intrabonded}^{angle} == 13.4695, E_{molecules-intrabonded}^{bend-bend} == 0, 
E_{molecules-intrabonded}^{bond} == 12.9509, E_{molecules-intrabonded}^{dihedral} == 1.02475e-14,
E_{molecules-intrabonded}^{improper} == 0, E_{molecules-intrabonded}^{internal} == 26.4204, 
E_{molecules-intrabonded}^{stretch-bend-torsion} == 0, E_{molecules-intrabonded}^{stretch-bend} == 0,
E_{molecules-intrabonded}^{stretch-stretch} == 0, E_{molecules-intrabonded}^{urey_bradley} == 0,
E_{molecules-intranonbonded}^{CLJ} == 5.77041e+32, E_{molecules-intranonbonded}^{LJ} == 5.77041e+32,
E_{molecules-intranonbonded}^{coulomb} == 411.831, E_{molecules:molecules}^{CLJ} == 0, 
E_{molecules:molecules}^{LJ} == 0, E_{molecules:molecules}^{coulomb} == 0, 
E_{restraint}^{restraint} == 0, E_{total} == 5.77041e+32 }

The energies are not identical, but they are of the similar order. What is disturbing is the huge total potential energy, but this is an issue of the system itself. To reduce the energy, I solvated it with chloroform and neutralise it with four BarF- ions, that leads us to the second system:

B) Self assembled capsule with four 3-pyridine cage ligands and 2 Pd(II) ions, encapsulated quinon, four BarF- ions, solvated in chloroform

GROMACS Energies: (kJ/mol)

 Angle           Proper Dih.   Ryckaert-Bell.   LJ-14       Coulomb-14       LJ (SR)      Disper. corr.    Coulomb (SR)    Coul. recip.  Potential
 1.94673e+04    3.32970e+01    1.63211e+02    9.78158e+02   -8.51521e+02     -6.45017e+04   -3.69359e+02   -1.23638e+04    3.64185e+03   -5.38026e+04

Sire Energies (kcal/mol):

{ E_{ions:ions}^{CLJ} == 0, E_{ions:ions}^{LJ} == 0, 
E_{ions:ions}^{coulomb} == 0, E_{ions:molecules}^{CLJ} == 0
E_{ions:molecules}^{LJ} == 0, E_{ions:molecules}^{coulomb} == 0,
E_{molecules-intrabonded}^{1-4[LJ]} == 0, E_{molecules-intrabonded}^{1-4[coulomb]} == 0
E_{molecules-intrabonded}^{1-4} == 0, E_{molecules-intrabonded}^{angle} == 4652.79, 
E_{molecules-intrabonded}^{bend-bend} == 0, E_{molecules-intrabonded}^{bond} == 59.9119
E_{molecules-intrabonded}^{dihedral} == 7.98354, E_{molecules-intrabonded}^{improper} == 0, 
E_{molecules-intrabonded}^{internal} == 4720.69, E_{molecules-intrabonded}^{stretch-bend-torsion} == 0
E_{molecules-intrabonded}^{stretch-bend} == 0, E_{molecules-intrabonded}^{stretch-stretch} == 0, 
E_{molecules-intrabonded}^{urey_bradley} == 0, E_{molecules-intranonbonded}^{CLJ} == -893.82
E_{molecules-intranonbonded}^{LJ} == 245.372, E_{molecules-intranonbonded}^{coulomb} == -1139.19, 
E_{molecules:molecules}^{CLJ} == -17821, E_{molecules:molecules}^{LJ} == -16822
E_{molecules:molecules}^{coulomb} == -999.056, E_{restraint}^{restraint} == 0, E_{total} == -13994.2 }

As we see, if we convert kJ to kcal, the conversion of the total energy for example, is ~1000kcal/mol more than the theoretical value of 12859 that we would expect on the best case scenario.

jmichel80 commented 5 years ago

Hi @SofiaBariami ,

focus first on the vacuum system where there are a number of issues. I can see the angle energy is the same which is encouraging.

First work out how to combine the Sire or Gromacs energy components so they are directly comparable to each other and express the figures in the same units.

There are some things done in one code and not the other that are not necessary to validate the conversion. For instance the dispersion-correction term in gromacs.

There is something really wrong with the input files since the LJ energies are huge. This indicates steric clashes. Fix this before trying to debug further the LJ energies because if you have huge steric clashes, tiny variations in e.g. coordinates or parameters between different file formats due to rounding off errors will cause noticeable changes in potential energy.

For the Coulombic it looks like you are computing PME energies with Gromacs. Not sure what you are using in Sire, I assume reaction-field. Switch to Coulombic no cutoff to make the functional forms identical.

For the dihedral terms it looks like the Ryckaert-Bell potentials are ignored during the BSS conversion. According to page 80-81 of http://manual.gromacs.org/documentation/2016/manual-2016 there is a formula to convert between OPLS style functional forms and Ryckaert-Bell functional forms. Maybe @lohedges can advise on the best place in BioSimSpace to add that functionality when writing a prm7 topology.

If you pay attention to all the above you should really get the same single point energies to within 0.01 kcal/mol

Once you can do that for the vacuum setup worry about the solvated box.

lohedges commented 5 years ago

Hi both,

Ideally the conversion should be done directly within the Sire GroTop and AmberPrm parsers, rather than BioSimSpace. This would be both faster and more flexible. Sire also has some helpful support data structures that can be used to convert the generic internal representation of the potential to specific terms for the different file formats, e.g. AmberDihedral and GromacsDihedral. These classes would need to be updated to deal with conversion between more exotic functional forms. As long as it's obvious how to detect, e.g. Ryckaert-Bell, from the original topology file then I don't see why it wouldn't be possible to handle the conversion.

I have some experience of working with the Amber and Gromacs data structures in Sire, so would be able to help/advise when I'm back next week.

Cheers.

SofiaBariami commented 5 years ago

Hello again, After defining a box, tweaking electrostatic settings in Gromacs and SOMD to make energies comparable, and grouping GROMACS non-bonded terms together, we have an agreement at the energies of the system in vacuum.

GROMACS Energies (kJ/mol)
 Bond     Angle   Proper Dih. Ryckaert-Bell.  LJ-14  Coulomb-14    LJ (SR)   Coulomb (SR)   Coul. recip.    Potential
183.646  92.3101   11.2334       15.5651     330.621  -1808.11     184.126    -1794.49      48.2981        -2736.80 

Sire Energies (kj/mol):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
angles:                    92.32037840000001
bonds:                     183.6746712
dihedral:                  11.277511760000001
internal:                  287.2721848
intranonbonded - CLJ:    - 3559.220016
intranonbonded - LJ:       517.259552
intranonbonded - coulomb:- 4076.479568
total:                   - 3271.9465760000003

SPE

The biggest difference in energies is observed at the component that has to do with the LJ interactions, that is ~4.9 kj/mol = 1.2 kcal/mol, as well as at the energy of the coulomb interactions, with a difference of ~0.6 kcal/mol. I'll keep playing with the mdp options to see if I can get the energy difference within the desired 0.01 kcal/mol threshold, but I think that we can say that the parameters have been converted correctly from the gro topology to the amber prm7 file.

jmichel80 commented 5 years ago

Hi Sofia,

The electrostatic energies deviate more than that. In Sire CLJ denote coulomb+LJ terms. To compare electrostatic energies just use coulomb so - 4076.479568 vs -3554.3019 kj/mol. From the look of it you are still using PME for Gromacs.

Post details about the keywords you use to setup gromacs and somd calculations (show the relevant sections of the mdp and cfg files) and why you think the electrostatic energies between Sire and Gromacs can now be compared. Comparison should be easier if you can setup a non periodic system in Gromacs so you don't have to worry about boxes.

For the small variations with the other terms can @lohedges comment on whether this sounds reasonable upon converting a gro top to an amber prm7 file? I would expect closer agreement unless this can all be explained by rounding off errors upon writing parameters.

Best wishes,

Julien

lohedges commented 5 years ago

Hi both,

There are some unit tests here and here that check the conversion between AMBER and GROMACS format files systems parameterised with AMBER style force fields. Here the agreement in individual force field terms is very good, typically matching to at least three decimal places. Here's part of the stdout from one of the test scripts:

E_{cljff}^{CLJ_{default}}:  -3532.2325848016712  versus  -3532.2325848016712
E_{cljff}^{LJ_{default}}:  641.9466217772684  versus  641.9466217772683
E_{cljff}^{coulomb_{default}}:  -4174.17920657894  versus  -4174.17920657894
E_{intraclj}^{CLJ_{default}}:  -56766.24711387854  versus  -56766.24711387854
E_{intraclj}^{LJ_{default}}:  -0.045650656038560555  versus  -0.045650656038560555
E_{intraclj}^{coulomb_{default}}:  -56766.201463222504  versus  -56766.201463222504
E_{intraff}^{1-4[LJ]}:  -4.372636217382435  versus  -4.372636466304856
E_{intraff}^{1-4[coulomb]}:  -8786.814468124045  versus  -8786.814382603872
E_{intraff}^{1-4}:  -8791.187104341427  versus  -8791.187019070177
E_{intraff}^{angle}:  0.0  versus  0.0
E_{intraff}^{bend-bend}:  0.0  versus  0.0
E_{intraff}^{bond}:  57.172870138128104  versus  57.172886070218034
E_{intraff}^{dihedral}:  311.60081834675276  versus  311.60078863137613
E_{intraff}^{improper}:  0.0  versus  0.0
E_{intraff}^{internal}:  -8422.413415856547  versus  -8422.413344368584
E_{intraff}^{stretch-bend-torsion}:  0.0  versus  0.0
E_{intraff}^{stretch-bend}:  0.0  versus  0.0
E_{intraff}^{stretch-stretch}:  0.0  versus  0.0
E_{intraff}^{urey_bradley}:  0.0  versus  0.0
E_{total}:  -68720.89311453675  versus  -68720.8930430488
lohedges commented 5 years ago

It might be worth using those scripts as the basis of your tests since they are fairly comprehensive, e.g. just update the names of the input files in them.

SofiaBariami commented 5 years ago

Thanks @lohedges . The test helped a lot. These are the results:

Energies:
E_{cljff}^{CLJ_{default}}:  0.0  versus  0.0
E_{cljff}^{LJ_{default}}:  0.0  versus  0.0
E_{cljff}^{coulomb_{default}}:  0.0  versus  0.0
E_{intraclj}^{CLJ_{default}}:  -470.1240501661571  versus  -470.1240501661571
E_{intraclj}^{LJ_{default}}:  54.24805562502297  versus  54.24805562502297
E_{intraclj}^{coulomb_{default}}:  -524.37210579118  versus  -524.37210579118
E_{intraff}^{1-4[LJ]}:  81.3462092868441  versus  81.34633097248468
E_{intraff}^{1-4[coulomb]}:  -432.1495009338833  versus  -432.14950806821474
E_{intraff}^{1-4}:  -350.8032916470392  versus  -350.80317709573006
E_{intraff}^{angle}:  22.065055672389327  versus  22.065956935357228
E_{intraff}^{bend-bend}:  0.0  versus  0.0
E_{intraff}^{bond}:  43.89925328960814  versus  43.89877602151897
E_{intraff}^{dihedral}:  6.4059238324062004  versus  2.6959532561963684
E_{intraff}^{improper}:  0.0  versus  0.0
E_{intraff}^{internal}:  -278.43305885263555  versus  -282.1424908826575
E_{intraff}^{stretch-bend-torsion}:  0.0  versus  0.0
E_{intraff}^{stretch-bend}:  0.0  versus  0.0
E_{intraff}^{stretch-stretch}:  0.0  versus  0.0
E_{intraff}^{urey_bradley}:  0.0  versus  0.0
E_{total}:  -748.5571090187926  versus  -752.2665410488146

The difference is at the dihedrals, probably because of the Ryckaert-Bell functional forms.

lohedges commented 5 years ago

Looking at the Sire GroTop parser, it appears that the dihedrals have already been converted to Fourier series form. (You can see this by printing the dihedral potentials of the loaded molecule too.) Perhaps something has gone wrong in the conversion. When I get a chance I'll take a look to see if I can see if I can figure out the problem.

SofiaBariami commented 3 years ago

Hi @lohedges, do we have any updates at this conversion issue? For a supra-molecular system similar to the one that we were discussing above parameterised with oplsaa, I'm still getting different energies at the dihedrals' component:

E_{cljff}^{coulomb_{default}}:  -4285.729874817867  versus  -4285.729874817867
E_{intraclj}^{CLJ_{default}}:  -625.2446171181026  versus  -625.2446171181026
E_{intraclj}^{LJ_{default}}:  -24.58456319891593  versus  -24.58456319891593
E_{intraclj}^{coulomb_{default}}:  -600.6600539191867  versus  -600.6600539191867
E_{intraff}^{1-4[LJ]}:  257.8395504590764  versus  257.8395512628246
E_{intraff}^{1-4[coulomb]}:  297.32472276315536  versus  297.32472276315536
E_{intraff}^{1-4}:  555.1642732222317  versus  555.1642740259799
E_{intraff}^{angle}:  7559.450773754545  versus  7559.450565986063
E_{intraff}^{bend-bend}:  0.0  versus  0.0
E_{intraff}^{bond}:  88.12330615066351  versus  88.12330615068086
E_{intraff}^{dihedral}:  78.51547305041994  versus  8.953272101907439
E_{intraff}^{improper}:  0.0  versus  0.0
E_{intraff}^{internal}:  8281.25382617786  versus  8211.691418264632
E_{intraff}^{stretch-bend-torsion}:  0.0  versus  0.0
E_{intraff}^{stretch-bend}:  0.0  versus  0.0
E_{intraff}^{stretch-stretch}:  0.0  versus  0.0
E_{intraff}^{urey_bradley}:  0.0  versus  0.0

I also used ubiquitin parameterised both with amber and oplsaa. Amber works fine, but BSS cannot find the dihedrals from the oplsaa FF.

Energies:

E{cljff}^{CLJ{default}}: 0.0 versus 0.0 E{cljff}^{LJ{default}}: 0.0 versus 0.0 E{cljff}^{coulomb{default}}: 0.0 versus 0.0 E{intraclj}^{CLJ{default}}: -2715.672278802128 versus -2715.6722782443703 E{intraclj}^{LJ{default}}: -130.08375685732543 versus -130.08375685732543 E{intraclj}^{coulomb{default}}: -2585.5885219448023 versus -2585.5885213870447 E{intraff}^{1-4[LJ]}: 414.8819774663495 versus 414.8822975546691 E{intraff}^{1-4[coulomb]}: 3336.9409911152948 versus 3336.9411634683966 E{intraff}^{1-4}: 3751.822968581644 versus 3751.8234610230656 E{intraff}^{angle}: 269.9350189516092 versus 269.93519743027974 E{intraff}^{bend-bend}: 0.0 versus 0.0 E{intraff}^{bond}: 1476.639324359713 versus 1476.6384965360305 E{intraff}^{dihedral}: 725.1503088020656 versus 725.1503021316862 E{intraff}^{improper}: 0.0 versus 0.0 E{intraff}^{internal}: 6223.5476206950325 versus 6223.5474571210625 E{intraff}^{stretch-bend-torsion}: 0.0 versus 0.0 E{intraff}^{stretch-bend}: 0.0 versus 0.0 E{intraff}^{stretch-stretch}: 0.0 versus 0.0 E_{intraff}^{ureybradley}: 0.0 versus 0.0 E{total}: 3507.8753418929045 versus 3507.875178876692

-oplsaa:

Exception 'SireIO::parse_error' thrown by the thread 'master'. Could not construct a molecule system from the information stored in this Gromacs topology file. Errors include: Error constructing the molecule associated with template 'Protein_chain_A' : Cannot find the dihedral parameters for the dihedral between atoms GroAtom( name() = HA, number() = 442 )-GroAtom( name() = CA, number() = 441 )-GroAtom( name() = CB, number() = 443 )-GroAtom( name() = HB1, number() = 444 ) (atom types HC-CT_2-CT-HC, function type 3). . . . Cannot find the dihedral parameters for the dihedral between atoms GroAtom( name() = HA, number() = 259 )-GroAtom( name() = CA, number() = 258 )-GroAtom( name() = CB, number() = 260 )-GroAtom( name() = HB1, number() = 261 ) (atom types HC-CT_2-CT-HC, function type 3). Cannot find the dihedral parameters for the dihedral between atoms GroAtom( name() = HA, number() = 816 )-GroAtom( name() = CA, number() = 815 )-GroAtom( name() = CB, number() = 817 )-GroAtom( name() = HB1, number() = 818 ) (atom types HC-CT_2-CT-HC, function type 3). Thrown from FILE: /home/sofia/Desktop/Software/sire_jm_paper/Sire/corelib/src/libs/SireIO/grotop.cpp, LINE: 7280, FUNCTION: virtual SireSystem::System SireIO::GroTop::startSystem(const SireBase::PropertyMap&) const

lohedges commented 3 years ago

Hi @SofiaBariami. Sorry, no progress on this I'm afraid. I'll tag @chryswoods since he wrote the GroTop parser and might have a better idea of its limitations, or where we need to make edits to provide the support that you require. I know we currently only support AMBER style force fields with GROMACS, but the dihedral terms are stored internally as a Fourier series, so supporting Ryckaert-Bellemans should be possible.

jmichel80 commented 3 years ago

We should at the very least update BioSimSpace to flag the conversion was not successful. It is dangerous currently as we end up with seemingly valid input that actually produces largely useless MD trajectories. @SofiaBariami could you look into what is needed to (mathematically) convert between functional forms? It would help think about workarounds to help you progress with your use case.

lohedges commented 3 years ago

Yes, I agree. Although without knowing what is going wrong it's hard to know why, or detect when, the conversion was unsuccessful. At present no warnings or errors are produced from Sire.

jmichel80 commented 3 years ago

Perhaps by adding single point energy consistency checks before and after file conversion ?

lohedges commented 3 years ago

Good point, that should be easy enough to do. I'd still like to understand what's going wrong here so will dig a little deeper. I'll take a look at the dihedral terms for the AMBER and GROMACS systems to see where they differ.

lohedges commented 3 years ago

For starters, having loaded the system and done the conversion I find that there are more dihedral terms in the AMBER molecule than the GROMACS one. There are no impropers in the original molecule, so it's not like the terms are being combined.

import BioSimSpace as BSS

gro_system = BSS.IO.readMolecules(BSS.IO.glob("cage_quin1.*"), property_map={"GROMACS_PATH" : "./top"})
gro_mol  = gro_system[0]

# Convert to AMBER format and read back in.
 BSS.IO.saveMolecules("amber", gro_system, ["rst7", "prm7"])
amb_system = BSS.IO.readMolecules(BSS.IO.glob("amber.*"))
amb_mol = amb_system[0]

# Print the number of dihedrals in each molecule. (Bonds and angles agree.)
print(gro_mol._sire_object.property("dihedral"))
FourAtomFunctions( nFunctions() == 326 )

print(amb_mol._sire_object.property("dihedral"))
FourAtomFunctions( nFunctions() == 370 )

I'm not sure where the extra 44 dihedrals are coming from during the conversion to AMBER format, but it certainly looks like the problem.

More confusingly, when I save this single molecule GROMACS system to file I get a different result on read depending on whether I pass the entire system, or a single molecule to saveMolecules. I've tried a whole bunch of other GROMACS systems and I don't see this behaviour.

# Continuing from above.

# Now save by passing a molecule, which is converted back to a system internally.
BSS.IO.saveMolecules("amber2", gro_mol, ["rst7", "prm7"])
amb_system2 = BSS.IO.readMolecules(BSS.IO.glob("amber2.*"))
amb_mo2l = amb_system2[0]

# How many dihedrals do we have now?
print(amb_mol2._sire_object.property("dihedral"))
FourAtomFunctions( nFunctions() == 85 )

Apparently we only have 85 dihedrals now. The number of bonds and angles also disagree with the original system.

Here's a diff between the two sets of files:

diff amber.rst7 amber2.rst7
1c1
< Gravel Rubs Often Many Awfully Cauterized Sores
---
> BioSimSpace System
80d79
<   16.2340000  18.1760000  18.1760000  90.0000000  90.0000000  90.0000000

diff amber.prm7 amber2.prm7
1c1
< %VERSION  VERSION_STAMP = V0001.000  DATE = 10/28/20  14:47:50
---
> %VERSION  VERSION_STAMP = V0001.000  DATE = 10/28/20  14:47:58
4c4
< Gravel Rubs Often Many Awfully Cauterized Sores
---
> BioSimSpace System
9c9
<        0       0       0       0       0       0       0       1      33       0
---
>        0       0       0       0       0       0       0       0      33       0
763,771d762
< %FLAG SOLVENT_POINTERS
< %FORMAT(10I8)
<        0       1       1
< %FLAG ATOMS_PER_MOLECULE
< %FORMAT(10I8)
<      154
< %FLAG BOX_DIMENSIONS
< %FORMAT(5E16.8)
<   9.00000000E+01  1.62340000E+01  1.81760000E+01  1.81760000E+01

The issue is once again the missing ATOMS_PER_MOLECULE flag, meaning that the second amber system is split on read. I'm really not sure why this isn't written when a single molecule is passed to saveMolecules, since this is converted to a system and contains the same information. I'll open another issue to fix this (in Sire) since this means that we are getting inconsistent read/write behaviour with BioSimSpace.

lohedges commented 3 years ago

Okay, ATOMS_PER_MOLECULE is only written to file if the system has a space property. I'm not sure why this is the case. Easy enough to move it outside the conditional block and fix our issue.

lohedges commented 3 years ago

Same issue with SOLVENT_POINTERS. Do these need to be tied to having a periodic box? We can read the file fine without the box information, as can AMBER.

jmichel80 commented 3 years ago

That could be a workaround for @SofiaBariami if the periodic box details are added again later. @SofiaBariami are you sure the above issue is the same for the inputs you have been recently putting together out of ligpargen generated Gro/Top files?

lohedges commented 3 years ago

Okay, I've fixed the ATOMS_PER_MOLECULE issue in Sire. However, I still don't understand why we end up with more dihedrals following the conversion to AMBER format. Looking into this now.

lohedges commented 3 years ago

On conversion, the AMBER parameters are generated with Sire.MM.AmberParams:

# Continuing from above.
from Sire.MM import AmberParams

amber_params = AmberParams(gro_mol._sire_object)
amber_params.dihedralFunctions()
FourAtomFunctions( nFunctions() == 370 )

Look at amberparams.cpp onwards, we can see that extra null dihedrals can be set under certain conditions. Uncommenting the debug statement, recompiling, and regenerating the parameters gives:

ADDING NULL DIHEDRAL FOR "Dihedral( AtomIdx(145), AtomIdx(143), AtomIdx(142), AtomIdx(152) )"
ADDING NULL DIHEDRAL FOR "Dihedral( AtomIdx(147), AtomIdx(144), AtomIdx(146), AtomIdx(153) )"
ADDING NULL DIHEDRAL FOR "Dihedral( AtomIdx(150), AtomIdx(148), AtomIdx(146), AtomIdx(153) )"
...

Counting the output (truncated above) gives the extra 44 dihedral functions...

What happens if I save the amber_system to GROMACS format, read back in.

BSS.IO.saveMolecules("gromacs", amber_system, ["gro87, "grotop"])
gro_system2 = BSS.IO.readMolecules(BSS.IO.glob("gromacs.*"), property_map={"GROMACS_PATH" : "./top"})

# We now have only 70 dihedral functions!
print(gro_system2[0]._sire_object.property("dihedral"))
FourAtomFunctions( nFunctions() == 70 )

On reconversion to GROMACS we have now lost dihedral information! The number of bonds and angles are consistent across all systems. (On re-conversion to AMBER we gain a bunch more null dihedrals taking us back to 344 total, and so on.) There's definitely something funky going on with the dihedral conversion.

lohedges commented 3 years ago

On further investigation, simply saving the originally loaded GROMACS system back to disk in the same format and re-reading loses dihedral information, i.e.:

import BioSimSpace as BSS

# Load the original system and check the number of dihedrals.
gro_system = BSS.IO.readMolecules(BSS.IO.glob("cage_quin1.*"), property_map={"GROMACS_PATH" : "./top"})
print(gro_system[0]._sire_object.property("dihedral"))
FourAtomFunctions( nFunctions() == 326 )

# Convert back to GROMACS format, load back in, then check the number of dihedrals again.
BSS.IO.saveMolecules("gromacs", gro_system, ["grotop", "gro87"])
gro_system2 = BSS.IO.readMolecules(BSS.IO.glob("gromacs.*"), property_map={"GROMACS_PATH" : "./top"})
print(gro_system2[0]._sire_object.property("dihedral"))
FourAtomFunctions( nFunctions() == 70 )

This implies that the issue isn't with the conversion to AMBER format, rather with the original GROMACS system itself.

lohedges commented 3 years ago

Right, making progress with this. Converting the expressions for the original dihedral potentials back to Sire.MM.GromacsDihedral objects shows clear differences:

from Sire.MM import GromacsDihedral

for p in gro_system[0]._sire_object.property("dihedral").potentials():
    print("%s --> %s" % (p.function(),
        GromacsDihedral(p.function(), Symbol("phi")).toExpression(Symbol("phi"))))

-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-1.874 cos(phi - 3.14159)^2 + 1.874 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
1.1 [cos(2 phi - 3.14159) + 1] --> 1.1 cos(2 phi - 3.14159) + 1.1
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
1.18351 cos(phi - 3.14159) - 2.919 cos(phi - 3.14159)^2 + 1.73549 --> 1.18351 cos(phi - 3.14159) + 1.18351
-1.874 cos(phi - 3.14159)^2 + 1.874 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-2 cos(phi - 3.14159)^2 + 2 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
1.1 [cos(2 phi - 3.14159) + 1] --> 1.1 cos(2 phi - 3.14159) + 1.1
10.5 [cos(2 phi - 3.14159) + 1] --> 10.5 cos(2 phi - 3.14159) + 10.5
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-7.25 cos(phi - 3.14159)^2 + 7.25 --> 0
-14 cos(phi - 3.14159)^2 + 14 --> 0
...

The exact same thing is seen for conversion to Sire.MM.AmberDihedral too. It looks like many expressions are not converting correctly, being set to zero in the process. This seems strange since GromacsDihedral is used on read and write, so if it successfully constructed an expression on read (from info in the topology) it should be able to convert back to the same thing on write.

lohedges commented 3 years ago

Getting to the bottom of this...

The problematic code is in the constructor of GromacsDihedral in gromacsparams.cpp. On read, the dihedral potentials are correctly identified as being Ryckaert-Bellemans format (func_type = 3) and are set correctly. On write, the expression is cast as an AmberDihedral and the func_type set to either 1 or 9 before constructing the GromacsDihedral object. I've tried manually setting the func_type to 3 to see what happens, but I still see the same output, which implies that the cast the AmberDihedral isn't giving the correct result, irrespective of the incorrect func_type. (This also explains why I see the same thing above using AmberDihedral or GromacsDihedral.)

@chryswoods: Do you have any comments or suggestions? Clearly the GroTop parser is currently read-write inconsistent. If we only expect to be able to read certain dihedral functional forms, then perhaps we need to add a property to the system/molecule to flag that it can't safely be written back to file or converted to another format.

lohedges commented 3 years ago

Also, according to GromacsDihedral::isCosine(), the Ryckaert-Bellemans func_type should return false (see here). However, since the cast to AmberDihedral is successful it is being picked up as a cosine dihedral and the error at the bottom of the GromacsDihedral from expression constructor is not being triggered:

 throw SireError::incomplete_code( QObject::tr("Sire cannot yet interpret dihedrals "
       "that are not in a standard cosine format! (%1)").arg(dihedral.toString()), CODELOC );
SofiaBariami commented 3 years ago

Hi everybody, thank you for looking into that. I run some tests just with the guest molecule (not the one from the input files that are uploaded here) and found a way around the issue using parmed for the files' conversion. BSS is definitely not handling the dihedrals correctly (Regarding the conversion, check page 80 of the gromacs manual). For example, see here a solvated system parameterised with oplsaa using Macromodel:

from parmed.gromacs import GROMACS_TOPDIR
GROMACS_TOPDIR = "/home/sofia/Desktop/Sofia/cages/LigParGen_topol/top" 
import parmed as pmd 

#get the parmed files
gmx_top = pmd.load_file('topol.top',xyz='npt.gro') 
gmx_top.save('pmd.prm7', format='amber') 
gmx_top.save('pmd.rst7', format='rst7’)

#get the BSS files: 
import BioSimSpace as BSS
BSS.setVerbose(True)
files = ["npt.gro","topol.top"]
system = BSS.IO.readMolecules(files, property_map={"GROMACS_PATH" : "/home/sofia/Desktop/Sofia/cages/LigParGen_topol/top"})
BSS.IO.saveMolecules("bss", system, ["rst7", “prm7"])

In [2]: bss_system = BSS.IO.readMolecules(BSS.IO.glob("bss.*"))                                                

In [3]: pmd_system = BSS.IO.readMolecules(BSS.IO.glob("pmd.*"))                                                

In [4]: bss_mol = bss_system[0]                                                                                

In [5]: pmd_mol = pmd_system[0]                                                                               

In [7]: pmd_mol._sire_object.property("dihedral")                                                              
Out[7]: FourAtomFunctions( nFunctions() == 126 )

In [8]: bss_mol._sire_object.property("dihedral")                                                              
Out[8]: FourAtomFunctions( nFunctions() == 1234 )

With the LigParGen parameterisation I am using this script to convert the xml/pdb files to prm/rst:

In [1]:import BioSimSpace as BSS                                                                              
In [4]:amber_system = BSS.IO.readMolecules(BSS.IO.glob("g4q.*"))      
In [5]:amber_mol = amber_system[0]                                                                            

In [6]: amber_mol._sire_object.property("dihedral")                                                            
Out[6]: FourAtomFunctions( nFunctions() == 104 )

Through LigParGen we can also get gromacs files (gro/itp), but BSS cannot read them successfully, because the specific itp parameters are not included in the gromacs topology folder:

Exception 'SireError::io_error' thrown by the thread 'master'.
There is not enough information in this parser (Gro87( title() = LIGPARGEN GENERATED GRO FILE, nAtoms() = 36, nResidues() = 1, nFrames() = 1, hasCoordinates() = 1, hasVelocities() = 0 )) to start the creation of a new System. You need to use a more detailed input file.

I also run some short MD simulations with the ligand bound to the cage and files from bss (left) and parmed (right). We see that the correct geometry is maintained using the parmed files.

Screenshot 2020-10-29 at 10 02 06

Finally, running an FEP with the parmed files also seems fine. I'm getting a binding free energy of -10.64 kcal/mol, that is a huge improvement compared to the 127.48 kcal/mol that I was getting with the files converted with BSS.

To wrap things up, for now, I'll use parmed for the files' conversion, instead of BSS until the issue is resolved. Also, I'd be happy to help with the dihedrals conversion or with anything else regarding this issue, so please let me know if you need anything from me.

lohedges commented 3 years ago

Right, I've added code to the Sire.IO.GroTop parser so that we can now introspect Ryckaert-Bellemans dihedral functions from a Sire.CAS.Expression. This means that we are now read-write consistent for the original files that you provided, e.g.:

import BioSimSpace as BSS

gro_system = BSS.IO.readMolecules(BSS.IO.glob("cage_quin1.*"), property_map={"GROMACS_PATH" : "./top"})
BSS.IO.saveMolecules("gromacs", gro_system, ["gro87", "grotop"], property_map={"GROMACS_PATH" : "./top"})
gro_system2 = BSS.IO.readMolecules(BSS.IO.glob("gromacs.*"), property_map={"GROMACS_PATH" : "./top"})

gro_system[0]._sire_object.property("dihedral")
FourAtomFunctions( nFunctions() == 326 )

gro_system2[0]._sire_object.property("dihedral")
FourAtomFunctions( nFunctions() == 326 )

More specifically...

from Sire.CAS import Symbol
from Sire.IO import GroTop
from Sire.MM import GromacsDihedral

# Load the topology.
gro_top = GroTop("cage_quin1.top", {"GROMACS_PATH" : "./top"})

# Extract a specific dihedral function.
dih = gro_top.dihedralPotentials()['c2v_C12;c2v_C9;c2v_C7;c2v_C5;3'].toExpression(Symbol("phi"))
print(dih)
-7.25 cos(phi - 3.14159)^2 + 7.25

# Now re-convert the expression to a GromacsDihedral.
dih2 = GromacsDihedral(dih, Symbol("phi"))

print(dih2)
GromacsDihedral( functionType() = Ryckaert-Bellemans dihedral, parameters() = [ 30.334, 0, -30.334, 0, 0, 0 ] )

print(dih2.toExpression(Symbol("phi")))
-7.25 cos(phi - 3.14159)^2 + 7.25

(Before it was incorrectly being parsed as a null AmberDihedral.)

This still doesn't solve the issue of incorrect conversion to AMBER format. If this isn't possible, then we should add checks to Sire.MM.AmberDihedral to make sure the attempted conversion raises an error.

jmichel80 commented 3 years ago

Through LigParGen we can also get gromacs files (gro/itp), but BSS cannot read them successfully, because the specific itp parameters are not included in the gromacs topology folder:

Can't you path the path to the ipt files as as keyword argument to BSS.IO.readMolecules ? I fail to understand why it shouldn't be possible for BSS to read such inputs. if that's really the case upload examples for @lohedges to take a look.

Finally, running an FEP with the parmed files also seems fine. I'm getting a binding free energy of -10.64 kcal/mol, that is a huge improvement compared to the 127.48 kcal/mol that I was getting with the files converted with BSS.

Much better indeed ! You are definitely on the right track.

To wrap things up, for now, I'll use parmed for the files' conversion, instead of BSS until the issue is resolved.

@SofiaBariami there is a risk that the parmed conversion is incorrect, and the results wrong (just not crazy wrong). Can you check for consistency of single point diheral energies between the original grop/top and the final parm/rst7 files. Does parmed do something clever to convert Gromacs dihedrals to Amber dihedrals ? Why the issues @lohedges is documenting above would not apply to parmed?

lohedges commented 3 years ago

I think I've come up with a solution. Teaching this afternoon so should be done on Monday. Either way we'll have an AMBER-conversion, or it will throw an error if not possible. (Rather than converting to nonsense.)

lohedges commented 3 years ago

The conversion to AMBER format works fine (following the formulae in the GROMACS manual, and converting energy units). I'll add this to the Sire.MM.AmberDihedral code, along with some tests. Once it's pushed (probably Monday), could @SofiaBariami test with the problematic files to make sure that things work as expected.

(The easiest thing would be to convert all dihedral potentials to the Fourier representation on read, but it is nice to preserve the Ryckaert-Bellemans representation for GROMACS so that we write out files that look identical to the ones read in.)

SofiaBariami commented 3 years ago

Thank you @lohedges! Can you please let me know when the new version is ready to run the tests?

SofiaBariami commented 3 years ago

@SofiaBariami there is a risk that the parmed conversion is incorrect, and the results wrong (just not crazy wrong). Can you check for consistency of single point diheral energies between the original grop/top and the final parm/rst7 files.

So dihedral energies look fine with the parmed conversion.

Energies:
E_{cljff}^{CLJ_{default}}:  nan  versus  nan
E_{cljff}^{LJ_{default}}:  nan  versus  nan
E_{cljff}^{coulomb_{default}}:  -4280.458056332074  versus  -4280.458056332081
E_{intraclj}^{CLJ_{default}}:  -27.945713048021616  versus  -27.945713048021616
E_{intraclj}^{LJ_{default}}:  -2.534925237290736  versus  -2.534925237290736
E_{intraclj}^{coulomb_{default}}:  -25.41078781073088  versus  -25.41078781073088
E_{intraff}^{1-4[LJ]}:  31.303943757018793  versus  31.303943819681848
E_{intraff}^{1-4[coulomb]}:  60.559117689843205  versus  60.559117689843255
E_{intraff}^{1-4}:  91.86306144686199  versus  91.8630615095251
E_{intraff}^{angle}:  7377.223079475931  versus  7377.222863424108
E_{intraff}^{bend-bend}:  0.0  versus  0.0
E_{intraff}^{bond}:  2741.1237125586667  versus  2741.123712558653
E_{intraff}^{dihedral}:  6.957608930712654  versus  6.957608926823065
E_{intraff}^{improper}:  0.0  versus  0.0
E_{intraff}^{internal}:  10217.167462412172  versus  10217.167246419109
E_{intraff}^{stretch-bend-torsion}:  0.0  versus  0.0
E_{intraff}^{stretch-bend}:  0.0  versus  0.0
E_{intraff}^{stretch-stretch}:  0.0  versus  0.0
E_{intraff}^{urey_bradley}:  0.0  versus  0.0
E_{total}:  nan  versus  nan

Does parmed do something clever to convert Gromacs dihedrals to Amber dihedrals ? Why the issues @lohedges is documenting above would not apply to parmed?

Parmed apparently solved the issue as we see here.

And if I may quote the InterMol people from here:

"We can fit everything into two types of dihedrals - dihedral_trig, and improper harmonic. Dihedral trig is of the form fc0 + sum_i=1^6 fci (cos(nx-phi) Proper dihedrals can be stored easily in this form, since they have only 1 n. Improper dihedrals can as well (flag as improper). RB can be stored as well, assuming phi = 0 or 180. Fourier can also be stored. A full dihedral trig can be decomposed into multiple proper dihedrals. Will need to handle multiple dihedrals little differently in that we will need to add multiple 9 dihedrals together into a single dihedral_trig, as long as they have the same phi angle (seems to be always the case)."

lohedges commented 3 years ago

This is now working. I've added a unit test (referenced above) and pushed changes to the devel branch of Sire. An updated conda package should be available in a couple of hours. If you could test and let me know if you come across any issues.

Cheers.

lohedges commented 3 years ago

Also, thanks for the InterMol and Parmed links. Those will come in useful if we have more issues in future, e.g. for terms other than dihedrals. We have an advantage in that we already store things internally as an computer algebra expression, so as long as we know how to detect and convert between things we should be okay. (Rather than storing a different object for each type of potential.)

lohedges commented 3 years ago

Closing since I believe this is now fixed. If you're still experiencing issues then please re-open and I'll investigate further.