Closed kexul closed 3 years ago
Hi, the code works for me in version 1.2.0, and I also have 8 GB RAM. That being said, it does take a long time and a lot of memory due to the structural differences between the reference crystal ligand and the ligands of interest. This means that the maximum common substructure algorithm needs a lot of computational power to come up with structures that might be obvious to us, or even worse, find another solution to the problem that we are not interested in. That's why I would highly recommend engineering a custom reference ligand (with e.g. Avogadro) which is more similar to your ligands of interest and the code shouldn't struggle in this case.
Otherwise, I suspect that I could optimise the memory consumption and/or the speed of the code in the case of a relatively small common substructure but that part of the code is very complicated and I don't know when I will get the time to do that, and I am also conscious about introducing new bugs to that part of the code.
Hi @msuruzhon , how about use a core-constrained docking here. In Merck KGaa's paper , they switched from flexible ligand alignment to core-constrained docking in most cases. I've found a nice implement by rdkit and rdock.
Here is my alignment result generate by protocaller 1.2.0. Should it be considered as a good alignment?
By the way , this is the result of older version of protocaller.
Here are the .gro files. morph.zip
This alignment is an artifact of the current MCS algorithm not handling the (known) case where the stereochemical label of an atom changes between molecules, resulting in a diminished MCS, and therefore bad alignment. It seems that this case should be handled properly, but it will take some time for me to implement it. Until then, a practical solution for that is to create a perturbation "manually" by discarding stereochemical mapping, e.g.:
from ProtoCaller.Ensemble import Perturbation pert = Perturbation(ligands_proto[0], ligands_proto[1]) pert.alignToReference(protein_proto.ligand_ref) pert.alignToEachOther(mcs_parameters=dict(keep_stereo=False)) morphs = [pert]
and then pass the morphs to the Ensemble
constructor. This should hopefully work for now until I implement a fix at some point.
As for docking, this is certainly an interesting functionality, but it's unfortunately not something I can devote much attention to. If you want to experiment with it, you can certainly extract the parametrised complex_template
object from the parametrised protein and save it as a pdb, which you can then use for docking. Afterwards, you can use the resulting ligand coordinates to create the Ligand
object, and use the ligand as its own reference, so that no alignment is actually done. I hope this makes sense somewhat. If you manage to figure out how to do a nice workflow, you are welcome to create a pull request for me to review in the future.
I'm glad to implement a docking based workflow, I'll create a pull request as soon as possible.
By the way, the code to disable stereo mapping raise an error here:
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
INFO:root:Downloading ligand files from the Protein Data Bank...
INFO:root:Parametrising ligand ligand0...
WARNING:root:Cannot parametrise unprotonated ligand. Protonating first with default parameters...
INFO:root:Running OpenBabel...
INFO:root:Running OpenBabel...
INFO:root:Running antechamber...
INFO:root:Running parmchk2...
INFO:root:Running tleap...
INFO:root:Parametrising ligand ligand1...
WARNING:root:Cannot parametrise unprotonated ligand. Protonating first with default parameters...
INFO:root:Running OpenBabel...
INFO:root:Running OpenBabel...
INFO:root:Running antechamber...
INFO:root:Running parmchk2...
INFO:root:Running tleap...
INFO:root:Running PDB2PQR...
INFO:root:Running OpenBabel...
INFO:root:Parametrising original crystal system...
INFO:root:Running tleap...
INFO:root:Running tleap...
INFO:root:Creating morph ligand0~ligand1...
INFO:root:Morph ligand0~ligand1 is already aligned to a reference
INFO:root:Morph ligand0~ligand1 is already aligned to a reference
Traceback (most recent call last):
File "prepare.py", line 49, in <module>
prepare_system(ligands)
File "prepare.py", line 43, in prepare_system
system.prepareComplexes()
File "/root/miniconda3/envs/uii/lib/python3.7/site-packages/ProtoCaller/Ensemble/__init__.py", line 203, in prepareComplexes
morph_BSS, mcs = morph.alignAndCreateMorph(self.protein.ligand_ref)
TypeError: cannot unpack non-iterable NoneType object
Here is my code:
import logging
import os
logging.basicConfig(level=logging.INFO)
from ProtoCaller.Utils.fileio import Dir
from ProtoCaller.Ensemble import Ligand, Protein, Ensemble, Perturbation
def prepare_system(ligands) -> None:
"""
pdb_file: cocrystal file name
ligands: list of ligands file name
"""
with Dir('SYSTEM', overwrite=True):
# create a protein from its PDB code and the residue number of the ligand
# we are going to use for mapping
#lig = Ligand('/proto/ref.pdb', minimise=False, protonated=True)
protein_proto = Protein('3eyg', ligand_ref='1')
ligands_proto = []
for idx, item in enumerate(ligands):
li = Ligand(item, protonated=False, minimise=False, name=f"ligand{idx}", workdir="Ligands")
ligands_proto.append(li)
# create the morphs from the ligands
pert = Perturbation(ligands_proto[0], ligands_proto[1])
pert.alignToReference(protein_proto.ligand_ref)
pert.alignToEachOther(mcs_parameters=dict(keep_stereo=False))
morphs = [pert]
# create a system from the protein and the morphs and set up some default
# settings regarding system preparation
system = Ensemble("GROMACS", protein=protein_proto, morphs=morphs,
box_length_complex=7, ligand_ff="gaff2",
workdir=protein_proto.workdir.path)
# only keep the reference ligand and keep all crystallographic waters
system.protein.filter(ligands=None, waters="all")
# protonate the protein using PDB2PQR and add missing atoms / residues if
# needed
system.protein.prepare()
# prepare the complex and solvated leg starting structures and save them as
# GROMACS input
system.prepareComplexes()
if __name__ == '__main__':
ligands = ['/proto/mola.pdb',
'/proto/molb.pdb']
prepare_system(ligands)
This error is due to some artifacts in the code back when I was refactoring that class. I have created some new commits which you can download from the essexlab/label/dev
conda
channel under the developmental 1.2.1
version. That being said, I have found another issue with the alignment, so the best bet until i fix it for this case is to turn off the recursive bond breaking algorithm (which is supposed to grow the MCS structure to a bigger physical one) during the first part of the alignment, namely:
from ProtoCaller.Ensemble import Perturbation pert = Perturbation(ligands_proto[0], ligands_proto[1]) pert.alignToReference(protein_proto.ligand_ref, mcs_parameters=dict(break_recursively=False)) pert.alignToEachOther(mcs_parameters=dict(keep_stereo=False)) morphs = [pert]
This molecule seems to present a lot of problems for the software, so this makes it a very useful test case when the bugs are finally addressed. I will probably add it to the unit tests.
Hi, I updated my protocaller and used the code you provided, the alignment seems perfect now! Thank you so much!
But, it seems like, some abnormal bonds showed in the ligand in morph.gro
and complex_final.gro
.
For example: atom 46
and 13
; atom 46
and 14
.
The following mdrun
failed with LINCS warning complaining about atom distance.
Here is my output files. 3eyg.zip
Is it reasonable? Is there any clue to diagnose the morph.gro
and complex_final.gro
? Since the alignment file was temporary file that were removed in the output folder in the normal pipeline.
This is strange, because I get the following alignment when I run it:
These are the output files + the input script: Files.zip
Maybe you can find the differences between your script and mine? Otherwise this doesn't make any sense.
Also, when you call the alignment method manually, it should create aligned PDB and INPCRD files in the folder you are calling it from, so you can visualise them.
Hi, @msuruzhon , I've checked the difference between our results. The alignment result of ligand1
, i.e. molb
was similar.
But, in my result, atom 30
was close to atom 9
and atom 42
, their distances were 0.9 angstrom. In your result, the same residual was close to the heterocycle, but the closest distance, i.e. the distance between atom 31
and atom 10
was 1.4 angstrom.
Here is my result of alignment. alignment.zip
Most of our code are identical, except you parametise the ligands before createing the system. Here is my code used to generate the system:
import logging
import os
logging.basicConfig(level=logging.INFO)
from ProtoCaller.Utils.fileio import Dir
from ProtoCaller.Ensemble import Ligand, Protein, Ensemble, Perturbation
def prepare_system(ligands) -> None:
"""
pdb_file: cocrystal file name
ligands: list of ligands file name
"""
with Dir('SYSTEM', overwrite=True):
# create a protein from its PDB code and the residue number of the ligand
# we are going to use for mapping
#lig = Ligand('/proto/ref.pdb', minimise=False, protonated=True)
protein_proto = Protein('3eyg', ligand_ref='1')
ligands_proto = []
for idx, item in enumerate(ligands):
li = Ligand(item, protonated=False, minimise=False, name=f"ligand{idx}", workdir="Ligands")
ligands_proto.append(li)
# create the morphs from the ligands
pert = Perturbation(ligands_proto[0], ligands_proto[1])
pert.alignToReference(protein_proto.ligand_ref, mcs_parameters=dict(break_recursively=False))
pert.alignToEachOther(mcs_parameters=dict(keep_stereo=False))
morphs = [pert]
# create a system from the protein and the morphs and set up some default
# settings regarding system preparation
system = Ensemble("GROMACS", protein=protein_proto, morphs=morphs,
box_length_complex=7, ligand_ff="gaff2",
workdir=protein_proto.workdir.path)
# only keep the reference ligand and keep all crystallographic waters
system.protein.filter(ligands=None, waters="all")
# protonate the protein using PDB2PQR and add missing atoms / residues if
# needed
system.protein.prepare()
# prepare the complex and solvated leg starting structures and save them as
# GROMACS input
system.prepareComplexes()
if __name__ == '__main__':
ligands = ['/proto/mola.pdb',
'/proto/molb.pdb']
prepare_system(ligands)
I think the difference may due to the randomness of rdkit minimization outside of the MCS part.
It does seem to be the case that the minimisation algorithm introduces randomness - I also get different results from different runs. I guess that's another argument for designing custom reference ligands, as alignment can't always be trusted.
Another bad case where default parameters will not work correctly. test_align.zip
Here is the minimum code to reproduce, it will consume all of memory in my machine.
from ProtoCaller.Ensemble import Ligand, Perturbation
lig1 = Ligand('mol810_protonated.sdf', protonated=True, minimise=False,
parametrised_files=['mol810.prmtop', 'mol810.inpcrd'])
lig2 = Ligand('mol812_protonated.sdf', protonated=True, minimise=False,
parametrised_files=['mol812.prmtop', 'mol812.inpcrd'])
pert = Perturbation(lig1, lig2)
pert.alignToEachOther()
When using mcs_parameters=dict(break_recursively=False)
, it will work correctly.
This won't be straightforward to fix, and I am considering turning off the recursive bond breaking algorithm by default until it is viable.
Hi @msuruzhon , recently, I tried the rmsdAlign
function of BioSimSpace
, it produced some good results and may be a good fallback plan when ProtoCaller couldn't do well in some cases.
It's easy to implement it thanks to the modular design of ProtoCaller
, here is the code snippet I've used:
class BSSPerturbation(Perturbation):
def __init__(self, ligand1, ligand2, name=None):
Perturbation.__init__(self, ligand1, ligand2, name)
def alignAndCreateMorph(self, ref):
_runexternal.runExternal(f'obabel {ref.protonated_filename} -O ref.pdb')
ref_bss = _BSS.IO.readMolecules('ref.pdb').getMolecule(0)
lig1_bss = _BSS.IO.readMolecules(self.ligand1.parametrised_files).getMolecule(0)
lig2_bss = _BSS.IO.readMolecules(self.ligand2.parametrised_files).getMolecule(0)
lig1_aligned = _BSS.Align.rmsdAlign(lig1_bss, ref_bss)
lig2_aligned = _BSS.Align.rmsdAlign(lig2_bss, lig1_aligned)
merged = _BSS.Align.merge(lig1_aligned, lig2_aligned)
return merged, None
Hi, I installed the new 1.2.0 version of protocaller, It seems like there were some bug caused abnormal memory usage. The following code, which can run normally with old version, consumpt all of memory in my machine (8G), then killed by the system.
example_code.zip