michellab / BioSimSpace_examples

GNU General Public License v2.0
0 stars 0 forks source link

FEPplus-BACE: Protein parameterisation error and getting output back #6

Open jmichel80 opened 6 years ago

jmichel80 commented 6 years ago

I have uploaded the following script to test a protein pdb parameterisation cd BioSimSpace_examples/FEPplus/bace/protein

julien@ubuntu:/mnt/vmshared/biosimspace/BioSimSpace_examples/FEPplus/bace/protein$ cat test_input.py import BioSimSpace as BSS protein = BSS.IO.readMolecules("protein_ori.pdb").getMolecules()[0] process = BSS.Parameters.parameterise(protein, "ff14SB") process.getOutput("param")

Parameterise fails, but process.getOutput does not return an output.

  2 protein = BSS.IO.readMolecules("protein_ori.pdb").getMolecules()[0]
  3 process = BSS.Parameters.parameterise(protein, "ff14SB")

----> 4 process.getOutput("param")

~/sire.app/lib/python3.5/site-packages/BioSimSpace-2018.1.1+360.g41af51d.dirty-py3.5.egg/BioSimSpace/Parameters/_process.py in getOutput(self, filename) 221 222 # Try to get the molecule. --> 223 self.getMolecule() 224 225 if self._zipfile is None or filename is not None:

~/sire.app/lib/python3.5/site-packages/BioSimSpace-2018.1.1+360.g41af51d.dirty-py3.5.egg/BioSimSpace/Parameters/_process.py in getMolecule(self) 181 self._is_error = True 182 raise _ParameterisationError("Parameterisation failed! " --> 183 + "Run 'getOutput() to see intermediate files.") 184 else: 185 # Fix the charges so that the total is integer values.

ParameterisationError: Parameterisation failed! Run 'getOutput() to see intermediate files.

Manually running again process.getOutput() does generate an output

In [2]: process.getOutput('param') Out[2]: 'param.zip'

Inspection of leap.log in param.zip shows the reason leap failed is that the input contained several water molecules

cat leap.log (..) FATAL: Atom .R<HOH 465>.A<H1 2> does not have a type. FATAL: Atom .R<HOH 465>.A<H2 3> does not have a type. FATAL: Atom .R<HOH 466>.A<O 1> does not have a type. FATAL: Atom .R<HOH 466>.A<H1 2> does not have a type. FATAL: Atom .R<HOH 466>.A<H2 3> does not have a type. FATAL: Atom .R<HOH 467>.A<O 1> does not have a type. FATAL: Atom .R<HOH 467>.A<H1 2> does not have a type. FATAL: Atom .R<HOH 467>.A<H2 3> does not have a type.

/home/julien/software/amber18/bin/teLeap: Fatal Error! Failed to generate parameters

Exiting LEaP: Errors = 1; Warnings = 24; Notes = 0.

The input contains 6105 atoms

In [10]: protein Out[10]:

This is the entire contents of protein_ori.pdb

However protein_ori.pdb contains a protein, a residue called TLA, and a bunch of crystallographic water molecules.

The structure of the pdb file is:

ATOM 6029 HG21 ILE A 447 0.587 -17.542 7.516 1.00 0.00 H
ATOM 6030 HG22 ILE A 447 -0.732 -18.255 8.474 1.00 0.00 H
ATOM 6031 HG23 ILE A 447 -0.833 -16.559 7.943 1.00 0.00 H
ATOM 6032 HD11 ILE A 447 3.331 -15.054 8.313 1.00 0.00 H
ATOM 6033 HD12 ILE A 447 3.365 -16.738 8.889 1.00 0.00 H
ATOM 6034 HD13 ILE A 447 2.425 -16.312 7.439 1.00 0.00 H
TER 6035 ILE A 447 HETATM 6036 O1 TLA A 502 33.408 8.321 -8.242 1.00 42.59 O
HETATM 6037 O11 TLA A 502 34.880 8.942 -6.680 1.00 42.71 O1- HETATM 6038 C1 TLA A 502 33.668 8.691 -7.108 1.00 43.26 C
HETATM 6039 C2 TLA A 502 32.588 8.809 -6.036 1.00 43.72 C
HETATM 6040 O2 TLA A 502 31.305 8.554 -6.580 1.00 44.12 O
HETATM 6041 C3 TLA A 502 32.622 10.179 -5.349 1.00 46.85 C
HETATM 6042 O3 TLA A 502 32.448 11.219 -6.298 1.00 49.85 O
HETATM 6043 C4 TLA A 502 31.569 10.256 -4.245 1.00 45.53 C
HETATM 6044 O4 TLA A 502 30.725 11.146 -4.218 1.00 47.19 O
HETATM 6045 O41 TLA A 502 31.676 9.314 -3.334 1.00 37.80 O1- HETATM 6046 H2 TLA A 502 32.819 8.136 -5.211 1.00 0.00 H
HETATM 6047 HO TLA A 502 30.643 8.634 -5.889 1.00 0.00 H
HETATM 6048 H3 TLA A 502 33.556 10.285 -4.796 1.00 0.00 H
HETATM 6049 HO TLA A 502 31.563 11.171 -6.667 1.00 0.00 H
HETATM 6050 O HOH A 679 30.451 7.892 0.667 1.00 18.92 O
HETATM 6051 H1 HOH A 679 30.379 7.216 -0.067 1.00 0.00 H
HETATM 6052 H2 HOH A 679 30.009 7.537 1.491 1.00 0.00 H
HETATM 6053 O HOH A 721 29.381 6.846 -6.246 1.00 22.99 O
HETATM 6054 H1 HOH A 721 28.841 7.687 -6.268 1.00 0.00 H
HETATM 6055 H2 HOH A 721 28.965 6.166 -6.849 1.00 0.00 H
HETATM 6056 O HOH A 736 -7.096 -5.619 6.533 1.00 21.38 O
HETATM 6057 H1 HOH A 736 -6.747 -6.334 5.927 1.00 0.00 H
(...)

Is BSS.IO.readMolecules.getMolecules()[0] working as intended? The documentation below suggests that getMolecules() should return a list of molecules, and since we slice the array we should only work with the first molecule present in the pdb file after parsing.

In [14]: print (BSS.IO.getMolecules.doc) Return a list containing all of the molecules in the specified group.

       Keyword arguments
       -----------------

       group : str
           The name of the molecule group.

       Returns
       -------

       molecule : [ BioSimSpace._SireWrappers.Molecule ]
           The list of molecules in the group.

However the protein object contains clearly all atoms in the input pdb.

In [15]: protein Out[15]:

We could consider this case as an example of a user providing an incorrect input, since it contains a number of non-protein residues (crystallographic additive + water molecules).

jmichel80 commented 6 years ago

Update: I have uploaded to the repository

Additionally

residues 150 and 421 renamed from HIS to HID residue 423 renamed from HIS to HIP *ATOM 6025 HXT in ILE (C-terminal) renamed OXT and element changed from H to O because Amber doesn't know how parameterise a C-terminal aldehyde !

this generates a valid leap output

However the test script generates a Sire exception now

In [10]: protein = BSS.IO.readMolecules("protein_only.pdb").getMolecules()[0] ...:

In [11]: process = BSS.Parameters.parameterise(protein, "ff14SB")

In [12]: process.getOutput("param") Exception in thread Thread-251: Traceback (most recent call last): File "/home/julien/sire.app/lib/python3.5/threading.py", line 914, in _bootstrap_inner self.run() File "/home/julien/sire.app/lib/python3.5/threading.py", line 862, in run self._target(*self._args, **self._kwargs) File "/home/julien/sire.app/lib/python3.5/site-packages/BioSimSpace-2018.1.1+360.g41af51d.dirty-py3.5.egg/BioSimSpace/Parameters/Protocol/_protocol.py", line 220, in run new_mol._makeCompatibleWith(par_mol, map=self._map, overwrite=True, verbose=False) File "/home/julien/sire.app/lib/python3.5/site-packages/BioSimSpace-2018.1.1+360.g41af51d.dirty-py3.5.egg/BioSimSpace/_SireWrappers/_molecule.py", line 439, in _makeCompatibleWith edit_mol.setProperty(_map[prop], mol1.property(prop).makeCompatibleWith(mol0, matches)) StopIteration: Exception 'SireError::invalid_index' thrown by the thread 'master'. No item at index 6034. Index range is from -6034 to 6033. Thrown from FILE: /home/julien/software/devel/Sire/corelib/src/libs/SireID/index.cpp, LINE: 120, FUNCTION: void SireID::IndexBase::throwInvalidIndex(qint32) const Backtrace ( 0) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Qt/../../../../../lib/./libSireError.so.2018 ([0x7f03e650daf1] ++0x41) -- SireError::getBackTrace()

( 1) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Qt/../../../../../lib/./libSireError.so.2018 ([0x7f03e650a755] ++0xa5) -- SireError::exception::exception(QString, QString)

( 2) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Base/../../../../../lib/libSireID.so.2018 ([0x7f03da8a9c21] ++0x201) -- SireID::IndexBase::throwInvalidIndex(int) const

( 3) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Base/../../../../../lib/libSireID.so.2018 ([0x7f03da8aa773] ++0x13) -- SireID::IndexBase::map(int) const

( 4) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Mol/../../../../../lib/libSireMM.so.2018 ([0x7f03d6d21240] ++0x90) -- SireMM::FourAtomFunctions::set(SireMol::AtomIdx, SireMol::AtomIdx, SireMol::AtomIdx, SireMol::AtomIdx, SireCAS::Expression const&)

( 5) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Mol/../../../../../lib/libSireMM.so.2018 ([0x7f03d6d25910] ++0x300) -- SireMM::FourAtomFunctions::_pvt_makeCompatibleWith(SireMol::MoleculeInfoData const&, QHash<SireMol::AtomIdx, SireMol::AtomIdx> const&) const

( 6) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Mol/../../../../../lib/libSireMol.so.2018 ([0x7f03d65bfe61] ++0x21) -- SireMol::MolViewProperty::makeCompatibleWith(SireMol::MoleculeInfoData const&, QHash<SireMol::AtomIdx, SireMol::AtomIdx> const&) const

( 7) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Mol/../../../../../lib/libSireMol.so.2018 ([0x7f03d65bfef5] ++0x25) -- SireMol::MolViewProperty::makeCompatibleWith(SireMol::MoleculeView const&, QHash<SireMol::AtomIdx, SireMol::AtomIdx> const&) const

/home/julien/sire.app/lib/python3.5/site-packages/Sire/Mol/_Mol.so(+0x9f5427) [0x7f03d7bf1427] ( 9) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Qt/../../../../../lib/libboost_python.so ([0x7f03e75df235] ++0x305) -- boost::python::objects::function::call(_object, _object) const

/home/julien/sire.app/lib/python3.5/site-packages/Sire/Qt/../../../../../lib/libboost_python.so(+0x233a8) [0x7f03e75df3a8] ( 11) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Qt/../../../../../lib/libboost_python.so ([0x7f03e75d565b] ++0x6b) -- boost::python::detail::exception_handler::operator()(boost::function0 const&) const

/home/julien/sire.app/lib/python3.5/site-packages/Sire/Error/_Error.so(+0x2534) [0x7f03db69d534] ( 13) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Qt/../../../../../lib/libboost_python.so ([0x7f03e75d562a] ++0x3a) -- boost::python::detail::exception_handler::operator()(boost::function0 const&) const

/home/julien/sire.app/lib/python3.5/site-packages/Sire/Error/_Error.so(+0x24f2) [0x7f03db69d4f2] ( 15) /home/julien/sire.app/lib/python3.5/site-packages/Sire/Qt/../../../../../lib/libboost_python.so ([0x7f03e75d53ff] ++0x3f) -- boost::python::handle_exception_impl(boost::function0)

/home/julien/sire.app/lib/python3.5/site-packages/Sire/Qt/../../../../../lib/libboost_python.so(+0x21379) [0x7f03e75dd379] ( 17) /home/julien/sire.app/bin/python ([0x5562de5cd89a] ++0x3a) -- PyObject_Call

( 18) /home/julien/sire.app/bin/python ([0x5562de674041] ++0x4c71) -- PyEval_EvalFrameEx

( 19) /home/julien/sire.app/bin/python ([0x5562de674490] ++0x50c0) -- PyEval_EvalFrameEx

( 20) /home/julien/sire.app/bin/python ([0x5562de67a03d] ++0x20d) -- PyEval_EvalCodeEx

/home/julien/sire.app/bin/python(+0x1ae63f) [0x5562de67b63f] ( 22) /home/julien/sire.app/bin/python ([0x5562de5cd89a] ++0x3a) -- PyObject_Call

( 23) /home/julien/sire.app/bin/python ([0x5562de671bd8] ++0x2808) -- PyEval_EvalFrameEx

( 24) /home/julien/sire.app/bin/python ([0x5562de66fe90] ++0xac0) -- PyEval_EvalFrameEx

( 25) /home/julien/sire.app/bin/python ([0x5562de66fe90] ++0xac0) -- PyEval_EvalFrameEx

( 26) /home/julien/sire.app/bin/python ([0x5562de67a03d] ++0x20d) -- PyEval_EvalCodeEx

/home/julien/sire.app/bin/python(+0x1ae464) [0x5562de67b464] ( 28) /home/julien/sire.app/bin/python ([0x5562de5cd89a] ++0x3a) -- PyObject_Call

/home/julien/sire.app/bin/python(+0x148e4f) [0x5562de615e4f] ( 30) /home/julien/sire.app/bin/python ([0x5562de5cd89a] ++0x3a) -- PyObject_Call

( 31) /home/julien/sire.app/bin/python ([0x5562de5cd991] ++0x31) -- PyEval_CallObjectWithKeywords

/home/julien/sire.app/bin/python(+0x1fc196) [0x5562de6c9196] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f03f431d6db] ( 34) /lib/x86_64-linux-gnu/libc.so.6 ([0x7f03f404688f] ++0x3f) -- clone

EndTrace Exception 'SireError::invalid_index' thrown by the thread 'master'. No item at index 6034. Index range is from -6034 to 6033. Thrown from FILE: /home/julien/software/devel/Sire/corelib/src/libs/SireID/index.cpp, LINE: 120, FUNCTION: void SireID::IndexBase::throwInvalidIndex(qint32) const

lohedges commented 6 years ago

For the first part: The reason why the system isn't broken into it's constituent molecules is because there are no TER records in the PDB file. At the moment we don't use the CONNECT records to break up the system (since it is often missing, or incomplete). The current solution is to insert appropriate TER records between molecules, or, as you have done, use a PDB file with a single molecule.

For the second bit: It looks like an incompatibility issue when mapping the properties from the parameterised molecule back into the the original one. It's possible that tLEaP has monkeyed with the atom order. I'll try to debug this further and post an update.

lohedges commented 6 years ago

The problem is that tLEaP has added additional atoms to the molecule, so the layout no longer matches that of the original molecule. It's possible that the parameterisation was okay, and we just need to copy across the appropriate properties for the atoms that match those in the original molecule. At the moment things are failing because the makeCompatibleWith property methods in Sire won't work if the molecular layout is inconsistent.

To see what's going wrong, load IPython and run:

import BioSimSpace as BSS

# Load the protein and grab the first molecule.
protein = BSS.IO.readMolecules("protein_only.pdb").getMolecules()[0]

# Display the molecule's info.
protein
<BioSimSpace.Molecule: nAtoms=6034, nResidues=390>

# Create a parameterisation process running in "tmp".
process = BSS.Parameters.parameterise(protein, "ff14SB", work_dir="tmp")

# Try to get the parameterised molecule (this will raise an Exception).
par_protein = process.getMolecule()

# Now read the parameterised molecule from file.
par_protein = BSS.IO.readMolecules(["tmp/leap.top", "tmp/leap.crd"]).getMolecules()[0]

# Display the new molecule's info.
par_protein
<BioSimSpace.Molecule: nAtoms=6041, nResidues=390>

As you can see, tLEaP has added 7 atoms!

lohedges commented 6 years ago

To see the atoms that have been added by tLEaP, just copy the following below the code above:

from Sire.Mol import AtomIdx

# Extract the coordinates from the original and parameterised molecule.
c0 = protein._sire_molecule.property("coordinates").toVector()
c1 = par_protein._sire_molecule.property("coordinates").toVector()

# Find the coordinates from c1 that aren't in c0 and print the corresponding atom.
for idx, c in enumerate(c1):
    if c not in c0:
        print(par_protein._sire_molecule.atom(AtomIdx(idx)))

This gives:

Atom( H3 : 4 )
Atom( HG : 2444 )
Atom( HG : 3405 )
Atom( HG : 4225 )
Atom( HG : 5019 )
Atom( HG : 5633 )
Atom( HG : 5977 )

It looks like tLEaP has added some extra hydrogens. As well as this, the atom ordering has changed (this can be seen by writing the par_protein to PDB and comparing with the original) but this is handled correctly by BioSimSpace, i.e. it matches the correct atoms because the ResIdx and coordinates of the atoms are unchanged.