michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

Files conversion error #152

Closed SofiaBariami closed 4 years ago

SofiaBariami commented 4 years ago

Hi, I am running on Linux the latest devel version of BioSimSpace and I am trying to convert GROMACS into Amber Files. The gromacs topology folder that I'm using is a custom one, so I need to define GROMACS_PATH. This is the code that I'm using:

~/biosimspace_devel.app/bin/ipython
import BioSimSpace as BSS
files = [“equil.gro”,“topol.top”]
system = BSS.IO.readMolecules(files, property_map={“GROMACS_PATH” : “/path/to/folder/“})
BSS.IO.saveMolecules(“equil”, system, [“rst7", “prm7”])

and the error is:

~/biosimspace_devel.app/lib/python3.7/site-packages/BioSimSpace-2019.1.0+211.ga01cf1d-py3.7.egg/BioSimSpace/IO/_io.py in readMolecules(files, property_map)
raise IOError(msg) from None
else:
raise IOError(“Failed to read molecules from: %s” % files) from None
return _System(system)
OSError: Failed to read molecules from: [‘equi2.gro’, ‘topol.top’]

Using parmed I also got an error:

from parmed.gromacs import GROMACS_TOPDIR
GROMACS_TOPDIR = “/path/to/top/”

import parmed as pmd
gmx_top = pmd.load_file(‘topol.top’,xyz=‘minim.gro’)
gmx_top.save(‘G4.prm7’, format=‘amber’)
gmx_top.save(‘G4.rst7’, format=‘rst7’)

GromacsWarning: pairs funct != 1; unknown functional. Cannot instantiate an AmberParm from unknown functional

I am attaching the topology and coordinates of the system. Any help would be more than welcome.

lohedges commented 4 years ago

Hi @SofiaBariami. Remember you can use BSS.setVerbose(True) to get the full Sire error trace. Unless your files have the wrong name, there might be a typo in the above examples. This is what I get:

import BioSimSpace as BSS
BSS.setVerbose(True)
files = ["equi2.gro", "topol.top"]
system = BSS.IO.readMolecules(files, property_map={"GROMACS_PATH" : "amber99sbmut.ff"})

Error trace:

UserWarning: Exception 'SireError::io_error' thrown by the thread 'master'.
There is not enough information in this parser (Gro87( title() = Protein in water, nAtoms() = 65744, nResidues() = 21099, nFrames() = 1, hasCoordinates() = 1, hasVelocities() = 1 )) to start the creation of a new System. You need to use a more detailed input file.

Given that ParmEd also fails, I'm not sure if this is a bug with BioSimSpace/Sire.

lohedges commented 4 years ago

Out of interest, does GROMACS load the input files correctly, i.e. can you run a simulation using them?

SofiaBariami commented 4 years ago

Hi Lester, for my BSS 2019.1.0 I get a module 'BioSimSpace' has no attribute 'setVerbose' error, that's why I can't set it to on. Yes, the GROMACS simulations run fine. So do you think it's a parameterisation error?

lohedges commented 4 years ago

Yes, it wasn't part of the 2019.1.0 release. 2019.3.0 is the latest. I'd suggest upgrading to that, or using the development version. There have been lots of performance improvements and bug-fixes since 2019.1.0.

lohedges commented 4 years ago

Yes, the GROMACS simulations run fine. So do you think it's a parameterisation error

I'm not sure without know more about what you're doing. It might be some specific GROMACS terms that we currently don't support. To start with we focused on AMBER-style force fields (to aid with AMBER/GROMACS conversion) but we might not cover everything. I imagine @chryswoods would have a better idea of any limitations of the current GROMACS parser.

SofiaBariami commented 4 years ago

there are some sugars parameterised with GLYCAM, so this might cause the issue

jmichel80 commented 4 years ago

I don't have an answer yet, but it would be good to get to the root of the matter, and think how we can by default give more explicit error messages. This particular issue is slowing down our COVID research efforts and it would be great to find a workaround, even if that has to be a temporary kludge. for the time being. Since the input is a gromacs top parameterised with a panel of Amber forcefields, the expectation is that a valid input could be produced to run with pmemd and the likes.


Dr. Julien Michel, Senior Lecturer Room 263, School of Chemistry University of Edinburgh David Brewster road Edinburgh, EH9 3FJ United Kingdom phone: +44 (0)131 650 4797 http://www.julienmichel.net/

On Fri, May 15, 2020 at 2:32 PM Sofia Bariami notifications@github.com<mailto:notifications@github.com> wrote:

there are some sugars parameterised with GLYCAM, so this might cause the issue

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/152#issuecomment-629238012, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACZN3ZF2VF42X6V67S43AP3RRU77LANCNFSM4NBR5IHA.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

lohedges commented 4 years ago

I'd suggest using setVerbose(True) as a start. As you know, the main issue is that the Sire error messages aren't always super consistent, so a similar error in terms of IO might give a rather different message depending on the parser. Also, interpreting and fixing the messages (if it is a Sire problem rather than the BioSimSpace layer) takes pretty detailed knowledge of Sire, so isn't always that helpful to the user.

My plan was to try and categorise the various error messages, then catch them in BioSimSpace and apply a general error overview as well as the detailed error trace. However, this requires me triggering all of the various errors, as well as understanding what the output means, which is easier said than done.

SofiaBariami commented 4 years ago

Hi again! Just to close this issue, the conversion worked after including some parameters at the gromacs topology folder. However when comparing the single point energies between the molecule using gromacs and amber they do not agree. The forcefield that the molecule is parameterised is amber99SB with some modifications. I thought that the energies disagreement could be due to the modified forcefield, so I tried ubiquitin, parameterised with amber99SB and amber03 and again converted the files and compared it with the SOMD energies. There were differences there as well:

Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Bond                        535.032         --          0          0  (kJ/mol)
Angle                       1129.41         --          0          0  (kJ/mol)
Proper Dih.                 3003.09         --          0          0  (kJ/mol)
Improper Dih.               30.9367         --          0          0  (kJ/mol)
LJ-14                       1735.87         --          0          0  (kJ/mol)
Coulomb-14                  13961.2         --          0          0  (kJ/mol)
LJ (SR)                     46.9251         --          0          0  (kJ/mol)
Disper. corr.              -85.8335         --          0          0  (kJ/mol)
Coulomb (SR)               -24637.9         --          0          0  (kJ/mol)
Coul. recip.                2258.59         --          0          0  (kJ/mol)
Potential                  -2022.72         --          0          0  (kJ/mol)
Kinetic En.                 8454.34         --          0          0  (kJ/mol)
Total Energy                6431.62         --          0          0  (kJ/mol)

E{molecules-intrabonded}^{angle} == 269.935, E{molecules-intrabonded}^{bond} == 1476.64 E{molecules-intrabonded}^{dihedral} == 721.136,
E
{molecules-intrabonded}^{internal} == 2467.71, E{molecules-intranonbonded}^{CLJ} == 419.198 E{molecules-intranonbonded}^{LJ} == 394.881, E{molecules-intranonbonded}^{coulomb} == 24.3171,
E
{total} == 2886.91

I was expecting the energies to agree since the gromacs files are parameterised with amber forcefields, but we can see that even the bonded energies do not match. Does anyone have any comments on this? Also how exactly `BSS.saveMolecules()` work? The code that I used is here:

~/biosimspace_devel.app/bin/ipython import BioSimSpace as BSS files = ["ubq_amber.gro","topol_amber.top"] system = BSS.IO.readMolecules(files}) BSS.IO.saveMolecules("ubq_amber99SB", system, ["rst7", "prm7"])


 and all the files that I used are here: https://github.com/SofiaBariami/sharing_files/tree/master/ubq

Thank you very much. 
lohedges commented 4 years ago

I'm not sure about the energy disagreement (will take a look when I get a chance) but what exactly are you wondering about the saveMolecules function? Documentation is here. In your example you are saving the system loaded from files to rst7 and prm7 format.

jmichel80 commented 4 years ago

Hi,

I do not expect the non-bonded energies to agree as the Gromacs run used PME, whereas Sire/SOMD did not. The bonded terms should be the same though. @lohedges it would be great if you can take a look. Sofia is working with a a range of systems of varying complexity that may well reveal some edge cases.

Best wishes,

Julien


Dr. Julien Michel, Senior Lecturer Room 263, School of Chemistry University of Edinburgh David Brewster road Edinburgh, EH9 3FJ United Kingdom phone: +44 (0)131 650 4797 http://www.julienmichel.net/

On Mon, Jun 1, 2020 at 2:52 PM Lester Hedges notifications@github.com<mailto:notifications@github.com> wrote:

I'm not sure about the energy disagreement (will take a look when I get a chance) but what exactly are you wondering about the saveMolecules function? Documentation is herehttps://biosimspace.org/api/generated/BioSimSpace.IO.saveMolecules.html#BioSimSpace.IO.saveMolecules. In your example you are saving the system loaded from files to rst7 and prm7 format.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/152#issuecomment-636872852, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACZN3ZFO6YT4OTKQSN4RD7TRUOXABANCNFSM4NBR5IHA.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

lohedges commented 4 years ago

Reading your message again, I'm a bit confused as to what you are showing. You seem to imply that you see different energies with AMBER and GROMACS for the same force field. However, the two sets of energies are labeled with different force fields, amber99SB and amber03. Were these both computed with the same MD engine?

I downloaded your files and tried running the calc_energies.py script in the AMBER directory. This fails with the following (ridiculously informative) error:

Traceback (most recent call last):
  File "calc_energies.py", line 324, in <module>
    (molecules, space) = amber.readCrdTop("unb_amber99SB.rst7","unb_amber99SB.prm7")
UserWarning: Exception 'SireError::file_error' thrown by the thread 'master'.
No error occurred while reading "unb_amber99SB.prm7".
Thrown from FILE: /home/lester/Code/Sire/corelib/src/libs/SireIO/amber.cpp, LINE: 1655, FUNCTION: boost::tuples::tuple<SireMol::MoleculeGroup, SireBase::PropPtr<SireVol::Space> > SireIO::Amber::readCrdTop(const QString&, const QString&, QString) const
__Backtrace__
(  0) /home/lester/sire.app/lib/libSireError.so.2019 ([0x7f8395247c1e] ++0x2e)
  -- SireError::getBackTrace()

(  1) /home/lester/sire.app/lib/libSireError.so.2019 ([0x7f8395244a0e] ++0x7e)
  -- SireError::exception::exception(QString, QString)

/home/lester/sire.app/lib/libSireIO.so.2019(+0xa492f) [0x7f8378fe192f]
(  3) /home/lester/sire.app/lib/libSireIO.so.2019 ([0x7f8378fdb06c] ++0x817c)
  -- SireIO::Amber::readCrdTop(QString const&, QString const&, QString) const

/home/lester/sire.app/lib/python3.7/site-packages/Sire/IO/_IO.so(+0xdb40f) [0x7f837937140f]
/home/lester/sire.app/lib/python3.7/site-packages/Sire/IO/_IO.so(+0xdb53c) [0x7f837937153c]
(  6) /home/lester/sire.app/lib/libboost_python37.so.1.70.0 ([0x7f83973a04cd] ++0x29d)
  -- boost::python::objects::function::call(_object*, _object*) const

/home/lester/sire.app/lib/libboost_python37.so.1.70.0(+0x1f639) [0x7f83973a0639]
(  8) /home/lester/sire.app/lib/libboost_python37.so.1.70.0 ([0x7f83973a730b] ++0x6b)
  -- boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const

/home/lester/sire.app/lib/python3.7/site-packages/Sire/Error/_Error.so(+0x2a9e) [0x7f8397987a9e]
( 10) /home/lester/sire.app/lib/libboost_python37.so.1.70.0 ([0x7f83973a72da] ++0x3a)
  -- boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const

/home/lester/sire.app/lib/python3.7/site-packages/Sire/Error/_Error.so(+0x2a5c) [0x7f8397987a5c]
( 12) /home/lester/sire.app/lib/libboost_python37.so.1.70.0 ([0x7f83973a708f] ++0x3f)
  -- boost::python::handle_exception_impl(boost::function0<void>)

/home/lester/sire.app/lib/libboost_python37.so.1.70.0(+0x1c6fa) [0x7f839739d6fa]
( 14) /home/lester/sire.app/bin/python ([0x5624ca4ddd2b] ++0x49b)
  -- _PyObject_FastCallKeywords

( 15) /home/lester/sire.app/bin/python ([0x5624ca5397ae] ++0x537e)
  -- _PyEval_EvalFrameDefault

( 16) /home/lester/sire.app/bin/python ([0x5624ca47a4f9] ++0x2f9)
  -- _PyEval_EvalCodeWithName

( 17) /home/lester/sire.app/bin/python ([0x5624ca47b3c4] ++0x44)
  -- PyEval_EvalCodeEx

( 18) /home/lester/sire.app/bin/python ([0x5624ca47b3ec] ++0x1c)
  -- PyEval_EvalCode

/home/lester/sire.app/bin/python(+0x22f874) [0x5624ca593874]
( 20) /home/lester/sire.app/bin/python ([0x5624ca59db81] ++0xa1)
  -- PyRun_FileExFlags

( 21) /home/lester/sire.app/bin/python ([0x5624ca59dd73] ++0x1c3)
  -- PyRun_SimpleFileExFlags

/home/lester/sire.app/bin/python(+0x23ae5f) [0x5624ca59ee5f]
( 23) /home/lester/sire.app/bin/python ([0x5624ca59ef7c] ++0x3c)
  -- _Py_UnixMain

( 24) /usr/lib/libc.so.6 ([0x7f8397ef8002] ++0xf2)
  -- __libc_start_main

/home/lester/sire.app/bin/python(+0x1e0122) [0x5624ca544122]
__EndTrace__
Exception 'SireError::file_error' thrown by the thread 'master'.
No error occurred while reading "unb_amber99SB.prm7".
Thrown from FILE: /home/lester/Code/Sire/corelib/src/libs/SireIO/amber.cpp, LINE: 1655, FUNCTION: boost::tuples::tuple<SireMol::MoleculeGroup, SireBase::PropPtr<SireVol::Space> > SireIO::Amber::readCrdTop(const QString&, const QString&, QString) const

No error occurred, apparently?

lohedges commented 4 years ago

For reference, BioSimSpace can read the files just fine (using the new AMBER parsers).

import BioSimSpace as BSS
system = BSS.IO.readMolecules(BSS.IO.glob("ubq_*"))
print(system)
<BioSimSpace.System: nMolecules=1>
jmichel80 commented 4 years ago

We don't actually use SOMD in this project so @SofiaBariami can you compare a single point energy calculated between gromacs and between sander ?

@lohedges we are still using the old AMBER parsers in SOMD unfortunately, we haven't found the time and resources yet to port to the new parsers. There is some work to do to adapt the SIreMol::amberparameters class if I recall well. It would be great to deprecate the old parsers but that is for another thread I guess.

SofiaBariami commented 4 years ago

I've tested two systems. One tripeptide (TYR-ALA-TYR) and ubiquitin. Both of them are in vacuum. The conversion from gromacs to amber files was done both with BioSimSpace and ParmEd. The energies of the molecules that are generated with these two methods agree (vary small differences at the elecrostatics ~0.1 kcal/mol). What I noticed is a disagreement at the energies of the bonded interactions between gromacs and amber. At smaller systems the difference is ~1.47 kcal/mol. However, when we are scaling up to a protein for example, the energies of the bonded interactions differ a lot (~1348.76).

--> Sander: BSS files:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       5.7601E+02     6.2684E+01     1.5959E+03     O        1173
 BOND    =     1476.6393  ANGLE   =      269.9350  DIHED      =      721.1363
 VDWAALS =     -139.8939  EEL     =    -5276.6718  HBOND      =        0.0000
 1-4 VDW =      414.8820  1-4 EEL =     3109.9821  RESTRAINT  =        0.0000

ParmEd files:

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       5.7588E+02     6.2684E+01     1.5959E+03     O        1173
 BOND    =     1476.6393  ANGLE   =      269.9350  DIHED      =      721.1363
 VDWAALS =     -139.8939  EEL     =    -5276.6718  HBOND      =        0.0000
 1-4 VDW =      414.8820  1-4 EEL =     3109.8577  RESTRAINT  =        0.0000
lohedges commented 4 years ago

Hi @SofiaBariami. Thanks for the update, that's interesting. It's good to know that BioSimSpace and ParmEd are pretty much in agreement for the AMBER case.

Is it possible to try the conversion the other way to see what you get, i.e. take the AMBER files that you converted to and re-convert them to GROMACS format. (Alternatively, use some AMBER files from elsewhere and do the conversion.) It would be interesting to see if you recover the original bonded energy in GROMACS, or whether the conversion from GROMACS to AMBER has changed something. All of the BioSImSpace/Sire parsers should be symmetric.

lohedges commented 4 years ago

Sorry, since you get the same AMBER result with BioSimSpace and ParmEd then it seems that the GROMACS to AMBER conversion must be okay. (Unless there is an issue with BioSimSpace and PARMED.)

Do you have the input files and config files used to perform the calculations with GROMACS and SANDER?

SofiaBariami commented 4 years ago

Thanks for looking into this Lester. Here are the files both for amber and gromacs calculations. For the record, the inverse conversion to gromacs gives the same energies as the initial files.

lohedges commented 4 years ago

Hi @SofiaBariami, I've run single-point energy calculations locally using BioSimSpace and get near perfect agreement for all terms between AMBER and GROMACS. To do this I had to set a custom config for GROMACS to allow zero minimisation steps, and also add -ntomp 1 -ntmpi 1 to the command-line flags to stop domain decomposition, which doesn't work with vacuum simulations. (I'll try to add some checks to do this automatically.) To reproduce what I did, starting from you ubiqutin directory:

import BioSImSpace as BSS

# Load the molecular system. Here I use the GROMACS files, since I assume
# that the conversion is working okay.
system = BSS.IO.readMolecules(["gromacs/gromacs.gro", "gromacs/gromacs.top"])

# Set up a minimisation protocol with one step.
protocol = BSS.Protocol.Minimisation(steps=1)

# Create the GROMACS and AMBER processes.
process_gmx = BSS.Process.Gromacs(system, protocol, work_dir="bss_gmx")
process_amber = BSS.Process.Amber(system, protocol, work_dir="bss_amber")

# Modify the config for GROMACS to allow zero steps.
config = process_gmx.getConfig()
config[1] = "nsteps = 0"
process_gmx.setConfig(config)

# Make sure GROMACS only uses a single thread.
process_gmx.addArg("-ntomp", 1)
process_gmx.addArg("-ntmpi", 1)

# Run the GROMACS process from within BioSimSpace.
process_gmx.start()
process_gmx.wait()

# Run the AMBER process from within BioSimSpace.
process_amber.start()
process_amber.wait()

Now compare the energies:

bss_gmx/gromacs.log

   Energies (kJ/mol)
           Bond          Angle    Proper Dih.          LJ-14     Coulomb-14
    6.17826e+03    1.12941e+03    3.01723e+03    1.73587e+03    1.30126e+04
        LJ (SR)   Coulomb (SR)      Potential Pressure (bar)
   -6.15216e+02   -2.30281e+04    1.43010e+03    0.00000e+00

bss_amber/amber.nrg

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       3.4189E+02     6.2679E+01     1.5939E+03     O        1173

 BOND    =     1476.6393  ANGLE   =      269.9350  DIHED      =      721.1363
 VDWAALS =     -147.0429  EEL     =    -5503.6427  HBOND      =        0.0000
 1-4 VDW =      414.8820  1-4 EEL =     3109.9821  RESTRAINT  =        0.0000

Comparing the GROMACS energies to yours, the only term that is different is BOND, where yours is 5.35032e+02 kJ/mol. Converting mine to kcal/mol gives...

(6.17826e+03 * BSS.Units.Energy.kj_per_mol).kcal_per_mol()
1476.6396 kcal/mol

This is in near exact agreement with the AMBER result.

I've not looked too closely, but I can only assume that there is an inconsistency in the way that you are performing the calculation between AMBER and GROMACS, which will probably be uncovered by comparing the configuration files.

I'm now confident that the conversions are work correctly, which was the original reason for this issue. Please feel free to close when you are satisfied.

lohedges commented 4 years ago

FYI: I've added a quick change to catch vacuum simulations in GROMACS and run in single-threaded mode. Now you can compare results directly from within BioSimSpace as follows:

from pytest import approx

import BioSImSpace as BSS

# Load the molecular system. Here I use the GROMACS files, since I assume
# that the conversion is working okay.
system = BSS.IO.readMolecules(["gromacs/gromacs.gro", "gromacs/gromacs.top"])

# Set up a minimisation protocol with one step.
protocol = BSS.Protocol.Minimisation(steps=1)

# Create the GROMACS and AMBER processes.
process_gmx = BSS.Process.Gromacs(system, protocol, work_dir="bss_gmx")
process_amber = BSS.Process.Amber(system, protocol, work_dir="bss_amber")

# Modify the config for GROMACS to allow zero steps.
config = process_gmx.getConfig()
config[1] = "nsteps = 0"
process_gmx.setConfig(config)

# Run the GROMACS process.
process_gmx.start()
process_gmx.wait()

# Run the AMBER process.
process_amber.start()
process_amber.wait()

# Get the bond energies. (In kcal / mol.)
bond_gmx = process_gmx.getBondEnergy().kcal_per_mol()
bond_amber = process_amber.getBondEnergy()

# Print the bond energies.
print(f"GROMACS: {bond_gmx}, AMBER: {bond_amber}")
GROMACS: 1476.6396 kcal/mol, AMBER: 1476.6393 kcal/mol

# Assert that the energies are equal (within tolerance).
assert bond_gmx.magnitude() == approx(bond_amber.magnitude())
lohedges commented 4 years ago

I've also added a unit test which you can run locally, e.g. for my development environment:

PYTHONPATH=/home/lester/Code/BioSimSpace/python ~/sire.app/bin/pytest -vvv test/Process/test_single_point_energy.py
jmichel80 commented 4 years ago

brilliant @lohedges ! @SofiaBariamihttps://github.com/SofiaBariami can you confirm if this also fixes all issues related to the Spike protein ?


Dr. Julien Michel, Senior Lecturer Room 263, School of Chemistry University of Edinburgh David Brewster road Edinburgh, EH9 3FJ United Kingdom phone: +44 (0)131 650 4797 http://www.julienmichel.net/

On Mon, Jun 8, 2020 at 1:04 PM Lester Hedges notifications@github.com<mailto:notifications@github.com> wrote:

I've also added a unit testhttps://github.com/michellab/BioSimSpace/blob/devel/test/Process/test_single_point_energy.py which you can run locally, e.g. for my development environment:

PYTHONPATH=/home/lester/Code/BioSimSpace/python ~/sire.app/bin/pytest -vvv test/Process/test_single_point_energy.py

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/152#issuecomment-640561102, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACZN3ZAAESAOLATE226WOD3RVTHUBANCNFSM4NBR5IHA.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

SofiaBariami commented 4 years ago

Thank you very much @lohedges! Now the energies coming both from the amber and the gromacs files agree for the protein as well:

lohedges commented 4 years ago

Great, thanks for confirming that all is okay. I'll close this now.