michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

Failed to read molecules from AMBER files #137

Open SofiaBariami opened 4 years ago

SofiaBariami commented 4 years ago

Hello, I am trying to use combine.py to a system of a protein and a ligand, but I am having this error:

OSError: Failed to read molecules from: ['QUBE_pro.prm7', 'QUBE_pro.rst7']

I got the prm, rst files by using the followin script, that reads pdb/xml files and prints prm/rst ones:

from simtk.openmm.app import * 
from simtk.openmm import * 
from simtk.unit import * 
from sys import stdout 
import parmed as pmd
from parmed.openmm import XmlFile

pdbfile = PDBFile('QUBE_pro.pdb')
ff = ForceField("QUBE_pro.xml")
system = ff.createSystem( pdbfile.topology)
#Swap in positions
integrator = openmm.VerletIntegrator(1.0 )
context = openmm.Context(system, integrator)
context.setPositions(pdbfile.positions)
# Create parmed Structure
structure = parmed.openmm.topsystem.load_topology( pdbfile.topology, system, pdbfile.positions)
# Write AMBER parameter/crd
structure.save('system.prmtop', overwrite=True)
structure.save('system.crd', format='rst7', overwrite=True)

I couldn't upload the files on GitHub, so you can find them through onedrive with the following link: https://uoe-my.sharepoint.com/:f:/g/personal/s1786388_ed_ac_uk/EubzxglqUZVFvlioJKhJD3UBYhy0428TuMRRBiqcVsAcjQ?e=E8ysnb

Have you seen this problem before, or do you have any ideas on how to solve it? Thanks a lot!

lohedges commented 4 years ago

Hi @SofiaBariami, I was online and happened to have a spare few minutes so took a quick look at this. For reference, this is how you can see the full Sire error trace in BioSimSpace:

import BioSimSpace as BSS

# Turn on verbose error messages.
BSS.setVerbose(True)

# Try to read the molecules.
s = BSS.IO.readMolecules(["Qube_pro.rst7", "Qube_pro.prm7"])

This gives the following error:


UserWarning: Exception 'SireIO::parse_error' thrown by the thread 'master'.
Atoms in residue 1 are not listed as part of the expected molecule 1. Should be atoms 1 to 6107, but molecule has [ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515 ].
Thrown from FILE: /Users/runner/runners/2.160.0/work/1/s/corelib/src/libs/SireIO/amberprm.cpp, LINE: 3783, FUNCTION: SireMol::MolStructureEditor SireIO::AmberPrm::getMolStructure(int, const SireBase::PropertyName &) const
__Backtrace__
(  0) libSireError.2019.2.1.dylib (0x000000010a0f515b +59)
  -- SireError::getBackTrace()

It looks like there is an inconsistency with the information in the files. Can Parmed read the files back in successfully?

SofiaBariami commented 4 years ago

Hi Lester, thanks for looking at it. By taking a closer look to the protein files, I saw that all the residues of the protein are called "QUP", so probably this is what is wrong. Both the pdb and the prm files are like that. Is there a way to overcome this problem using the property map or I should find the file with the correct residues?

SofiaBariami commented 4 years ago

For the record, parmed is able to load back the prm7 files as a system made of a single residue

In [1]: import parmed as pmd                                                                                                                                            
In [2]: pmd.load_file("QUBE_pro.prm7", "QUBE_pro.rst7")                                                                                                                 
Out[2]: <AmberParm 6107 atoms; 1 residues; 6192 bonds; parametrized>
In [3]:  
jmichel80 commented 4 years ago

A quick update. The issue was that the input pdb actually contains several molecules, so the Sire parser presumably bails out because the connectivity appears incomplete. parmed doesn't care about these sanity checks and proceeds. @SofiaBariami is testing a workaround.

lohedges commented 4 years ago

I didn't try reading the PDB file that Sofia uploaded, the issue was with the prm7/rst7 files generated by Parmed.

As mentioned previously, we don't rely on CONECT records since they are rarely present and often incomplete. If you want to represent a multi molecule system with a PDB file then place single TER records between the molecules. If these are present Sire will split the system into separate molecules. If not, you'll just end up with a single molecule.

Cheers

On Tue, 3 Dec 2019, 13:07 Julien Michel, notifications@github.com wrote:

A quick update. The issue was that the input pdb actually contains several molecules, so the Sire parser presumably bails out because the connectivity appears incomplete. parmed doesn't care about these sanity checks and proceeds. @SofiaBariami https://github.com/SofiaBariami is testing a workaround.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/michellab/BioSimSpace/issues/137?email_source=notifications&email_token=AAE6K3L6HKS2KGVR3L5XRR3QWZK2TA5CNFSM4JUCHQA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFZJWLY#issuecomment-561158959, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE6K3IAEAECNYRWLQDZF3DQWZK2TANCNFSM4JUCHQAQ .

jmichel80 commented 4 years ago

Good point, @SofiaBariami another workaround may be to split the pdb file with TER statements, that may give you a prm7 file that combine.py can read.

SofiaBariami commented 4 years ago

I did that but I get this Value error: ValueError: No template found for residue 1 (QUP). This might mean your input topology is missing some atoms or bonds, or possibly that you are using the wrong force field This is probably because it is assumed that QUP residue has 6107 atoms, but there are only 515 before the first TER

SofiaBariami commented 4 years ago

Hello, an update on this issue:

I tried to debug the simtk code that is found at forcefield.py, and I found that the program doesn't read all the bond entries of the second fragment, even if the atom names and types are all read successfully. More specifically, the output that I am getting is the following:

...
atomname =  H5024
atomtype =  QUBE_5023
atomname =  H5025
atomtype =  QUBE_5024
atomname =  H5026
atomtype =  QUBE_5025
from, to = 519 516
from, to = 519 520
from, to = 516 515
from, to = 516 517
from, to = 516 518
.
.
.
from, to = 4483 4485
from, to = 4477 4476
from, to = 4489 4488
from, to = 4490 4488
from, to = 4491 4493

It stops at the bond between atoms 4491 and 4493. The next bond-pair is <Bond from="4513" to="4495"/> . I checked the pdb file (with VMD as well), and the specific connection exists. Also, at the XML file, we can find parameters about this bond: <Bond class1="C4513" class2="C4495" k="166657.088" length=".1523"/>, so there isn't a typo. Any ideas on that? Here you can find the pdb and xml files. Thanks a lot!