michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

ff14SB parameterisation failure #161

Open JenkeScheen opened 3 years ago

JenkeScheen commented 3 years ago

Potentially related to #158

I've been trying to parameterise a protein ( bace_prep_nowats.pdb.tar.gz ) without much luck using the following lines:

protein = BSS.IO.readMolecules("./bace_prep_nowats.pdb")[0]
protein_par = BSS.Parameters.ff14SB(protein, work_dir="output")
protein_par.getMolecule()

throws a parameterisationError on .getMolecule(): ParameterisationError: Parameterisation failed! Last error: 'The passed molecule is incompatible with the original! self.nAtoms() = 6039, other.nAtoms() = 6040'

Although there's a fair chance this is an issue with the input file, I thought I would raise an issue as solving this isn't as straightforward as it could be. Furthermore, this input file works fine when parameterising using fesetup. File was obtained by removing waters from bace_prep.pdb.tar.gz which is prepped by flare (V2 I believe).

tleap.out:

-I: Adding /home/jscheen/amber18/dat/leap/prep to search path. -I: Adding /home/jscheen/amber18/dat/leap/lib to search path. -I: Adding /home/jscheen/amber18/dat/leap/parm to search path. -I: Adding /home/jscheen/amber18/dat/leap/cmd to search path. -f: Source leap.txt.

Welcome to LEaP! (no leaprc in search path) Sourcing: ./leap.txt ----- Source: /home/jscheen/amber18/dat/leap/cmd/leaprc.protein.ff14SB ----- Source of /home/jscheen/amber18/dat/leap/cmd/leaprc.protein.ff14SB done Log file: ./leap.log Loading parameters: /home/jscheen/amber18/dat/leap/parm/parm10.dat Reading title: PARM99 + frcmod.ff99SB + frcmod.parmbsc0 + OL3 for RNA Loading parameters: /home/jscheen/amber18/dat/leap/parm/frcmod.ff14SB Reading force field modification type file (frcmod) Reading title: ff14SB protein backbone and sidechain parameters Loading library: /home/jscheen/amber18/dat/leap/lib/amino12.lib Loading library: /home/jscheen/amber18/dat/leap/lib/aminoct12.lib Loading library: /home/jscheen/amber18/dat/leap/lib/aminont12.lib Loading PDB file: ./leap.pdb Added missing heavy atom: .R<CILE 390>.A<OXT 20> total atoms in file: 6034 Leap added 1 missing atom according to residue templates: 1 Heavy Checking Unit.

/home/jscheen/amber18/bin/teLeap: Warning! The unperturbed charge of the unit (-10.000000) is not zero.

/home/jscheen/amber18/bin/teLeap: Note. Ignoring the warning from Unit Checking.

Building topology. Building atom parameters. Building bond parameters. Building angle parameters. Building proper torsion parameters. Building improper torsion parameters. total 1264 improper torsions applied Building H-Bond parameters. Incorporating Non-Bonded adjustments. Not Marking per-residue atom chain types. Marking per-residue atom chain types. (Residues lacking connect0/connect1 - these don't have chain types marked:

res total affected

CILE    1
NGLY    1

) (no restraints) Quit

Exiting LEaP: Errors = 0; Warnings = 1; Notes = 1.

lohedges commented 3 years ago

H there. Your issue comes at a good time, and is indeed related to pdb4amber and the way that tLEaP edits the molecule during parameterisation. From the output you can see that tLEaP has reported a missing heavy atom and has added an OXT to the molecule. To see this, you can run:

import BioSimSpace as BSS

# Load the molecule.
molecule = BSS.IO.readPDB("bace_prep_nowats.pdb")[0]

# Try to parameterise it.
par_mol =  BSS.Parameters.ff14SB(molecule, work_dir="tmp").getMolecule()

# Load the system generated by tLEaP and get the first molecule.
tleap_mol = BSS.IO.readMolecules(["tmp/leap.top", "tmp/leap.crd"])[0]

# Print the number of atoms in each molecule.
print(molecule.nAtoms(), tleap_mol.nAtoms())
6039 6040

# Print the last atom in each molecule.
print(molecule.getAtoms()[-1])
print(tleap_mol.getAtoms()[-1])
<BioSimSpace.Atom: name='HA', molecule=2 index=6038>
<BioSimSpace.Atom: name='OXT', molecule=6 index=6039>

The reason that things work with FESetup is that it doesn't care what is returned by tLEaP during parameterisation, i.e. it just loads the new system and uses that. In BioSimSpace we take a different approach and assume that the original file loaded by the user is the reference topology that is used throughout. After all subsequent operations we make sure that the output is consistent with this reference, i.e. we map back coordinates, forcefield parameters, etc., into the original system. This is failing since the output of tLEaP has one more atom than the original.

With the updates described in #158, we have now added support for the pdb4amber tool. This fixes up a lot of common issues with PDB files meaning that they can be correctly processed by the AmberTools suite. To run your PDB through this tool you can use:

import BioSimSpace as BSS
molecule = BSS.IO.readPDB("bace_prep_nowats.pdb", pdb4amber=True)[0]

Note that this pre-processing is applied on the initial read so that the user is immediately aware of any modifications to the file, i.e. they can load with-and-without the option and check the difference. We could pre-process behind the scenes during parameterisation, but this would change things behind the user's back.

However, doing this doesn't fix your issue...

print(molecule.getAtoms()[-1])
<BioSimSpace.Atom: name='HA', molecule=2 index=6038>

This is because we currently don't support the --add-missing-atoms flag with pdb4amber, since it is experimental. If appropriate, it might be possible to handle extra flags with some kind of optargs option to readPDB. @jmichel80 and @chryswoods, what do you think about this? Annoyingly the arguments have quite different formatting, or need to be combined, so this isn't the easiest thing to do. Perhaps we could just support a subset of them. (I haven't added any until now, since the defaults worked for our other collaborators use cases.)

jmichel80 commented 3 years ago

hi @JenkeScheen can you export a pdb with all atoms added from Flare and use this onward ? We are not quite ready to deal with various non standard ways third party tools add missing atoms. Check what is the latest parameterisation workflow used in Flare and ask Mark how best to replicate this for your project.

lohedges commented 3 years ago

This could also be a case of adding a clearer error message that states some possible solutions, e.g. adding the missing atoms yourself, or running pdb4amber manually. This might be a useful placeholder until we figure out how to best incorporate extra functionality.