Closed mattaq31 closed 2 years ago
Looking at the supplementary files in jp6b03296_si_002.zip I think this is because you are reading AMBER format topologies that are intended to be used for RBFE simulations using SOMD. As such, these will contain massed dummy atoms that are part of the alchemical transformation. Atoms of this type are not allowed by GROMACS for regular (non-FEP) simulations, hence the error.
If you want to reproduce the simulations using GROMACS, then you will need the original input topologies for the ligands and protein and the mapping that was used for the perturbation, i.e. the files that were used to create the input for SOMD, rather than the input to SOMD itself. With this you will be able to construct GROMACS format FEP topologies that should work as expected.
(The link that you provided referenced supporting information rather than supplementary files and I didn't see anything with a file path that matches the one used in your code snippet within the archive.)
Thanks for the clear issue and additional context information. Much appreciated.
Although it is odd that the warnings are for protein atoms. We did this work several years ago, I would advise setting up the systems from scratch if you want to replicate the findings of the study. You could extract the protein & ligand coordinates so you have a full all-atom model and use BioSimSpace to parameterise, solvate, equilibrate and generate FEP inputs.
Yes, I agree that it's confusing. This could be due to loading all the files in the directory, i.e. the PDB, CRD, and top, so information from the PDB file may have overwritten values from the TOP file once the system was set up. Sire simply works out which files can be used to generate the molecular system (lead parse) then adds any extra information from the other parsers (followers) as long as the topology is consistent.
Thanks both for your replies.
I see, then it might be best to simply begin from the protein and ligand positions as you've said then solvate from scratch.
Re. file path, apologies for not being more specific, the set of files I was referring to was:
URL: https://pubs.acs.org/doi/suppl/10.1021/acs.jpcb.6b03296/suppl_file/jp6b03296_si_002.zip
Internal path: Systems/3Series/3K~3B/Complex/core
Thanks again!
All the above makes sense. Maybe using this Thrombin dataset might be easier: https://github.com/openforcefield/protein-ligand-benchmark/tree/master/data/2019-09-23_thrombin
I think that dataset is different from the one we studied with Gaetano, but it wouldn't be too hard to add more ligands if you want to study that particular series. Note that there is a flexible loop with a Tryptophan residue near the binding site that was annoying to model at least when we looked at it.
Hi all, not sure if I'm doing something wrong, but I tried to parametrise the thrombin files using BioSimSpace, with no luck:
BSS.IO.readMolecules
BSS.Parameters.ff99SB(system[0]).getMolecule()
- I use system[0]
as I don't think I can parametrise the whole complex at once (the parametriser seems to only take in one molecule at a time)
'BioSimSpace._Exceptions._exceptions.ParameterisationError: Parameterisation failed! Last error: 'tLEaP failed!'
water_model="tip3p"
which I got from here, but this did not seem to change anything.pdb2gmx
but I also get a different error: Residue 'LIG' not found in residue topology database
Thanks for your help and apologies for the confusion! I've uploaded the thrombin + ligand file here: base_protein_ligand.zip
Hi there,
Thanks for reporting. In order to extract the parameterisation errors you will need to execute the parameterisation in a specified working directory. All intermediate files will be written here, along with a README containing the list of commands that were run in order. For example:
import BioSimSpace as BSS
# Load your combined protein + ligand PDB file.
system = BSS.IO.readMolecules("base_protein_ligand.pdb")
# Extract the constituents for convenience.
protein = system[0]
ligand = system[1]
# Try to parameterise the protein.
protein = BSS.Parameters.ff99SB(protein, work_dir="protein_param").getMolecule()
This errors with:
ParameterisationError: Parameterisation failed! Last error: 'tLEaP failed!'
We can now look in the protein_param
directory:
ls protein_param
leap.err leap.log leap.out leap.pdb leap.top leap.txt README.txt
The README.txt
has the command that was run:
cat protein_param/README.txt
# tLEaP was run with the following command:
/home/lester/sire.app/bin/tleap -f leap.txt
Looking at the tleap
log file:
cat protein_param/leap.log
...
/home/lester/sire.app/bin/teLeap: Warning!
The unperturbed charge of the unit (6.000000) is not zero.
FATAL: Atom .R<LIG 1>.A<C1 1> does not have a type.
FATAL: Atom .R<LIG 1>.A<N1 2> does not have a type.
FATAL: Atom .R<LIG 1>.A<O1 3> does not have a type.
FATAL: Atom .R<LIG 1>.A<C2 4> does not have a type.
FATAL: Atom .R<LIG 1>.A<N2 5> does not have a type.
FATAL: Atom .R<LIG 1>.A<O2 6> does not have a type.
FATAL: Atom .R<LIG 1>.A<C3 7> does not have a type.
FATAL: Atom .R<LIG 1>.A<C4 8> does not have a type.
FATAL: Atom .R<LIG 1>.A<C5 9> does not have a type.
FATAL: Atom .R<LIG 1>.A<C6 10> does not have a type.
FATAL: Atom .R<LIG 1>.A<C7 11> does not have a type.
FATAL: Atom .R<LIG 1>.A<C8 12> does not have a type.
FATAL: Atom .R<LIG 1>.A<C9 13> does not have a type.
FATAL: Atom .R<LIG 1>.A<C10 14> does not have a type.
FATAL: Atom .R<LIG 1>.A<C11 15> does not have a type.
FATAL: Atom .R<LIG 1>.A<C12 16> does not have a type.
FATAL: Atom .R<LIG 1>.A<C13 17> does not have a type.
FATAL: Atom .R<LIG 1>.A<C14 18> does not have a type.
FATAL: Atom .R<LIG 1>.A<C15 19> does not have a type.
FATAL: Atom .R<LIG 1>.A<C16 20> does not have a type.
FATAL: Atom .R<LIG 1>.A<C17 21> does not have a type.
FATAL: Atom .R<LIG 1>.A<C18 22> does not have a type.
FATAL: Atom .R<LIG 1>.A<C19 23> does not have a type.
FATAL: Atom .R<LIG 1>.A<C20 24> does not have a type.
FATAL: Atom .R<LIG 1>.A<C21 25> does not have a type.
FATAL: Atom .R<LIG 1>.A<CL1 26> does not have a type.
FATAL: Atom .R<LIG 1>.A<H1 27> does not have a type.
FATAL: Atom .R<LIG 1>.A<H2 28> does not have a type.
FATAL: Atom .R<LIG 1>.A<H3 29> does not have a type.
FATAL: Atom .R<LIG 1>.A<H4 30> does not have a type.
FATAL: Atom .R<LIG 1>.A<H5 31> does not have a type.
FATAL: Atom .R<LIG 1>.A<H6 32> does not have a type.
FATAL: Atom .R<LIG 1>.A<H7 33> does not have a type.
FATAL: Atom .R<LIG 1>.A<H8 34> does not have a type.
FATAL: Atom .R<LIG 1>.A<H9 35> does not have a type.
FATAL: Atom .R<LIG 1>.A<H10 36> does not have a type.
FATAL: Atom .R<LIG 1>.A<H11 37> does not have a type.
FATAL: Atom .R<LIG 1>.A<H12 38> does not have a type.
FATAL: Atom .R<LIG 1>.A<H13 39> does not have a type.
FATAL: Atom .R<LIG 1>.A<H14 40> does not have a type.
FATAL: Atom .R<LIG 1>.A<H15 41> does not have a type.
FATAL: Atom .R<LIG 1>.A<H16 42> does not have a type.
FATAL: Atom .R<LIG 1>.A<H17 43> does not have a type.
FATAL: Atom .R<LIG 1>.A<H18 44> does not have a type.
FATAL: Atom .R<LIG 1>.A<H19 45> does not have a type.
FATAL: Atom .R<LIG 1>.A<H20 46> does not have a type.
FATAL: Atom .R<LIG 1>.A<H21 47> does not have a type.
FATAL: Atom .R<LIG 1>.A<H22 48> does not have a type.
FATAL: Atom .R<LIG 1>.A<H23 49> does not have a type.
FATAL: Atom .R<ILE 2>.A<H1 20> does not have a type.
FATAL: Atom .R<ILE 2>.A<H2 21> does not have a type.
FATAL: Atom .R<ILE 2>.A<H3 22> does not have a type.
/home/lester/sire.app/bin/teLeap: Fatal Error!
Failed to generate parameters
Exiting LEaP: Errors = 1; Warnings = 4; Notes = 0.
This is because your PDB file contains atom names that don't match the required tleap
templates. You will need to pre-process the file or rename them manually.
Similar errors occur when trying to parameterise the ligand.
We have crude support for pre-processing PDB files with the pd4amber
tool prior to reading, but this still requires somewhat standard formatting. If you split your PDB into separate protein and ligand files you could try doing:
protein = BSS.IO.readPDB("protein.pdb", pdb4amber=True)
This might clean up some of the janky formatting. Alternatively, you could run the tool yourself from the command-line so you can access all of its options.
The reason we don't try too hard to clean files ourselves is that it requires a lot of system specific knowledge and it is very hard to generalise. We don't want to make invalid assumptions, so assume that the user best knows what their files should represent.
I hope this helps.
Thanks @lohedges for the detailed reply!
That debugging step with 'workdir' worked great, I now have a much better view at what's going on. I'll try the pdb4amber tool that you've described and see how I get on.
There are a couple of things I am still not 100% on:
protein = system[0]
) actually splits off a chunk of the protein, along with the ligand itself (as attached screenshot). How would you coerce BSS to be able to split the system properly?Thanks again for your help!
No problem. It's one of things that's hard to design perfectly with BioSImSpace, i.e. keeping things as simple as possible while still allowing users to access the full set of intermediate files and commands for debugging. It's also hard to give good generic high-level error messages since they are so varied and often require a lot of technical understanding to parse.
Typically, PDB files should only be used for single molecule topologies, i.e. as in the original spec. However, many people combine molecules with a loose standard being a single standalone TER
separator between molecules. (This is often used for waters, for example.) This is how we split multi-molecule PDB files in BioSimSpace. Note that TER
records normally are used to indicate the end of molecular chains, but these have slightly different formatting, i.e. not just TER
by itself. BioSImSpace will still treat these records correctly. If you format your PDB file in this way, then BioSImSpace should split it as you intend. I generally recommend keeping separate PDB files for each molecule that you are interested in. It's usually more flexible and it's easier to share with coleagues, etc., since that's what the PDB is designed for.
If the parametrization works well, how would you recombine the protein and ligand together again to create a single system for solvation by BSS?
BioSimSpace has powerful functionality for combining molecules in a Pythonic way. For example, once you have parameterised protein
and ligand
molecules you can just do:
molecules = protein + ligand
This will give you a Molecules
container, which can be converted to a System
object by using the .toSystem()
method, i.e.
complex = molecules.toSystem()
# Alternatively, in one go.
complex = (protein + ligand).toSystem()
(Most functions in BioSimSpace can handle Molecules
in place of a System
, with the conversion taking place internally.)
For reference, PDB is one of the oldest and most abused file formats, so many input files are poorly formatted or have lots of quirks and inconsistencies. Despite it's relative simplicity, it's quite hard to parse reliably because of this. (Well, parsing is okay, but actually inferring what the user intended isn't.)
Hi @lohedges just wanted to thank you again for your help here - your explanations have been very helpful for me to understand what's going on in BSS under the hood.
Given that the problems I've had are more related to issues with the input files rather than BSS itself, can this Github issue now be closed?
No problem, I'm glad the information was of help. As you have found, a lot of stuff is going on behind the scenes. Once you get used to things you can normally dig down to get a better insight as to what might have gone wrong.
Yes, please feel free to close if you are satisfied that things are resolved.
Cheers.
Hi,
I'm attempting to use BioSimSpace to simulate an amber system which I obtained from this paper (supplementary files -> ligand 3K, complex) using GROMACS. To do this, I ran the following set of commands:
When I attempt to create the process (last line above), the system errors out with the following output:
I have also tried other ligands in the paper, which resulted in similar errors as above. Outputting the simulation files directly using
BSS.IO.saveMolecules("thrombin_system", system, ["Gro87", "GroTop"])
and subsequently attempting to rungrompp
from command line also results in the same errors.Is this a bug or is there something wrong with my procedure/setup? Thanks for your help!
Context