fix: parameters of prepare ligand and receptor

josan82 commented 5 years ago

When using prepare_each = False, the coordinates are updated in method _update_pdbqt_coordinates on the assumption that the atoms appear in the same order in the Chimera and PDBQT molecules. That was not true, because AD4LigandPreparation and AD4ReceptorPreparation were modifying the structure of the molecule to perform some repairs (hydrogens) and clean-ups. Now these functions are called with a parameter configuration that avoids structural modifications of the original molecules.
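To make the order assumption concrete, here is a minimal sketch of a position-based coordinate update. It is not GaudiMM's actual _update_pdbqt_coordinates; the function name and the plain-text parsing are illustrative assumptions. It writes the i-th coordinate triple into the x, y, z fields (columns 31-54 of ATOM/HETATM records, as in the PDB format) of the i-th atom, so an atom added or reordered by the preparation step silently pairs coordinates with the wrong atom.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def update_pdbqt_coordinates(pdbqt_lines: List[str], coords: Sequence[Coord]) -> List[str]:
    """Hypothetical helper: write coords into ATOM/HETATM records by position.

    The i-th atom record receives the i-th coordinate triple, so the atom
    order in the PDBQT must match the order of `coords` exactly.
    """
    atom_rows = [i for i, line in enumerate(pdbqt_lines)
                 if line.startswith(("ATOM", "HETATM"))]
    if len(atom_rows) != len(coords):
        # Only a count mismatch is detectable here; a reordering is not.
        raise ValueError("PDBQT has %d atoms but %d coordinates were given"
                         % (len(atom_rows), len(coords)))
    updated = list(pdbqt_lines)
    for row, (x, y, z) in zip(atom_rows, coords):
        line = updated[row]
        # Columns 31-54 (0-based slice 30:54) hold x, y, z as %8.3f each.
        updated[row] = line[:30] + "%8.3f%8.3f%8.3f" % (x, y, z) + line[54:]
    return updated
```

Non-atom records (REMARK, ROOT, BRANCH, TORSDOF, ...) pass through unchanged, which is why the sketch only touches lines starting with ATOM or HETATM.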

jaimergp commented 5 years ago

Are there any other kwargs to consider in the preparation routines? Maybe we can use this PR to expose some of those keywords to the user. Can you link to the ADT docs so I can check? Thanks!

Otherwise, this looks good to me. I have to fix the Travis builds, and once we can test that, we are good to go.

josan82 commented 5 years ago

AD4ReceptorPreparation: http://mgltools.scripps.edu/api/AutoDockTools/AutoDockTools.MoleculePreparation.AD4ReceptorPreparation-class.html
AD4LigandPreparation: http://mgltools.scripps.edu/api/AutoDockTools/AutoDockTools.Docking-pysrc.html/AutoDockTools.MoleculePreparation.AD4LigandPreparation-class.html

I think that with these kwargs the molecules will always keep the same structure as the PDB/mol2 files provided by the user (no repairs or clean-ups).

Letting the user change these parameters could lead to unwanted effects in combination with prepare_each=False. With prepare_each=True there is no problem, but with prepare_each=False the current implementation of _update_pdbqt_coordinates expects the same atoms, in the same order, so that it can update the coordinates without generating the PDBQT from scratch.

jaimergp commented 5 years ago

Are those repairs and clean-ups required for the scoring function to work? I don't know whether disabling them permanently will cause problems in some cases, like missing hydrogens and so on.

I think we should take a closer look at _update_coordinates. The current implementation is fragile and makes too many assumptions. The ADT package should contain functions that write the PDBQT correctly from some kind of object, so maybe we should cache that object, update the coordinates inside it, and then pass it to that hypothetical ADT writer? Let me know what you think.
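One way to read this proposal, sketched under assumptions: parse the prepared PDBQT once, cache the parsed records, and update coordinates by atom identity instead of by position. The class name PDBQTTemplate and the (residue name, residue number, atom name) key are hypothetical choices for illustration, not an existing ADT or GaudiMM API, and the sketch does not use any ADT writer; it simply re-emits the cached text.

```python
from typing import Dict, List, Tuple

AtomKey = Tuple[str, str, str]  # (residue name, residue number, atom name)


class PDBQTTemplate:
    """Hypothetical cache of a prepared PDBQT whose coordinates can be updated."""

    def __init__(self, text: str) -> None:
        self.lines: List[str] = text.splitlines()
        self.index: Dict[AtomKey, int] = {}
        for i, line in enumerate(self.lines):
            if line.startswith(("ATOM", "HETATM")):
                # PDB columns: atom name 13-16, residue name 18-20, residue number 23-26.
                key = (line[17:20].strip(), line[22:26].strip(), line[12:16].strip())
                if key in self.index:
                    raise ValueError("duplicate atom key %r" % (key,))
                self.index[key] = i

    def set_coordinates(self, coords: Dict[AtomKey, Tuple[float, float, float]]) -> List[AtomKey]:
        """Update matching atoms and return the keys of PDBQT atoms left untouched."""
        unknown = [key for key in coords if key not in self.index]
        if unknown:
            raise KeyError("atoms not found in PDBQT: %r" % (unknown,))
        for key, (x, y, z) in coords.items():
            i = self.index[key]
            line = self.lines[i]
            self.lines[i] = line[:30] + "%8.3f%8.3f%8.3f" % (x, y, z) + line[54:]
        # For example, hydrogens added during preparation that the caller does not track.
        return [key for key in self.index if key not in coords]

    def to_string(self) -> str:
        return "\n".join(self.lines) + "\n"
```

Matching by name rather than by position means an extra atom added by the preparation step is reported instead of shifting every subsequent coordinate, although those extra atoms would still need sensible coordinates of their own.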

josan82 commented 5 years ago

If we set these parameters so that the original molecules are not changed, of course this can lead to bad scores if the molecules are not correctly prepared. The clean-ups can also change the score somewhat (not dramatically in the cases I've tested). I think that's why ADT lets you parametrize these clean-ups: it allows some customization of the scoring depending on the nature of the system, and the repairs can help with some badly formed input files.

Like you, my first thought was to adapt _update_coordinates, but I don't see an easy way to do it. I don't see an object in ADT that you could cache and whose coordinates you could modify. But, truth be told, I spent almost all of yesterday afternoon trying to figure out the reason for the bad scores in my tests, so my head was not very clear.

As you say, the best way would be to adapt _update_coordinates and allow all the parametrization that ADT offers, but at least this (temporary) change ensures that what you put in the input PDB/mol2 files is what gets scored by Vina.

josan82 commented 5 years ago

A typical workflow without modifying _update_coordinates would be:

1. The user prepares the PDB/mol2 files (fixing them with Vina if they want).
2. The GaudiMM Vina objective makes no structural changes to the molecules provided by the user (parameters set as in this PR).
3. GaudiMM Vina checks whether the number of atoms in the PDBQTs generated by AD4ReceptorPreparation/AD4LigandPreparation matches the number of atoms in the original PDB/mol2. If not, it stops the calculation and alerts the user.
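Step 3 could look roughly like the following sketch. The function names are hypothetical, and n_input_atoms is assumed to come from the molecule the user loaded (for example, its atom count in Chimera); note that a count check catches added or removed atoms but not a reordering.

```python
from typing import Iterable


def count_pdbqt_atoms(pdbqt_lines: Iterable[str]) -> int:
    """Count ATOM/HETATM records in PDBQT text."""
    return sum(1 for line in pdbqt_lines if line.startswith(("ATOM", "HETATM")))


def check_same_atom_count(n_input_atoms: int, pdbqt_text: str, label: str = "molecule") -> None:
    """Hypothetical guard: stop if preparation changed the number of atoms."""
    n_pdbqt = count_pdbqt_atoms(pdbqt_text.splitlines())
    if n_pdbqt != n_input_atoms:
        raise RuntimeError(
            "%s: the input file has %d atoms but the generated PDBQT has %d. "
            "Please check the input structure; it is not repaired automatically."
            % (label, n_input_atoms, n_pdbqt))
```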

jaimergp commented 5 years ago

Can you rerun the tests in Travis please?

# edit your last commit, giving it a new time stamp and hash
# (you can just leave the message as it is)
git commit --amend
# push to github, overwriting your branch
git push -f

jaimergp commented 5 years ago

Tests are failing for gaudi.objectives.vina because the ligand (extracted from 5ER1) has wrong atom types that are not fixed in this PR (we are deliberately skipping those operations). A separate test case must be provided for that (properly configured protein/ligand, I'd say).

JeanDidier commented 5 years ago

Note that in the practical sessions I teach in my MSc course, Vina often breaks down because of inappropriate atom typing. This can easily be corrected by editing the PDBQT file, though.
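A minimal sketch of such a manual correction done programmatically, assuming the standard PDBQT layout written by AutoDockTools, where the AutoDock atom type sits left-justified in columns 78-79 of each ATOM/HETATM record. Selecting atoms by serial number is an illustrative choice, the function name is hypothetical, and which type is actually correct for a given atom is a chemical decision left to the user.

```python
from typing import Dict, List


def retype_pdbqt_atoms(pdbqt_lines: List[str], new_types: Dict[int, str]) -> List[str]:
    """Return a copy of pdbqt_lines with selected AutoDock atom types replaced.

    `new_types` maps atom serial numbers (columns 7-11) to a new AutoDock
    type, written left-justified in columns 78-79.
    """
    out = []
    for line in pdbqt_lines:
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])
            if serial in new_types:
                new_type = new_types[serial]
                if not 1 <= len(new_type) <= 2:
                    raise ValueError("AutoDock types are 1-2 characters: %r" % new_type)
                body = line.rstrip("\n")
                ending = line[len(body):]  # keep any trailing newline
                padded = body.ljust(79)    # tolerate stripped trailing spaces
                line = padded[:77] + "%-2s" % new_type + padded[79:] + ending
        out.append(line)
    return out
```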

JeanDI

Prof. Dr. Jean-Didier Maréchal
Associate Professor

Insilichem
Departament de Química
Universitat Autònoma de Barcelona
Edifici C.n.
08193 Cerdanyola (Barcelona)
Tel: +34.935814936
e-mail: JeanDidier.Marechal@uab.es
personal webpage: http://gent.uab.cat/jdidier
insilichem webpage: http://www.inslichem.com

(Email reply to Jaime Rodríguez-Guerra's comment of Tue, 18 Dec 2018 at 14:40.)


jaimergp commented 5 years ago

To fix these errors, we have two strategies:

A) Provide a boolean flag, repair, to enable/disable automatic repairs of the structures. It would be disabled by default (the current behaviour of this PR), and users could enable it at their own risk. That is, adding atoms could mask errors when passing coordinates down to the PDBQT files. This is not desirable in my opinion.
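Option A could be as small as a helper that picks the keyword arguments passed to the ADT preparation classes. This is a sketch under an explicit assumption: that AD4LigandPreparation and AD4ReceptorPreparation accept `repairs` and `cleanup` string keywords and that an empty string disables them. The helper name is hypothetical and the exact values used in this PR are not shown in the thread.

```python
from typing import Dict


def preparation_kwargs(repair: bool = False) -> Dict[str, str]:
    """Hypothetical helper for option A: kwargs for the ADT preparation classes.

    ASSUMPTION: the ADT classes accept `repairs` and `cleanup` string
    keywords, and an empty string disables the corresponding step.
    """
    if repair:
        # Opt-in: fall back to ADT's own defaults, which may add or remove
        # atoms, so the atom-order assumption no longer holds.
        return {}
    # Default: leave the user's structure untouched.
    return {"repairs": "", "cleanup": ""}
```

Returning an empty dict when repair is enabled keeps ADT's defaults in charge rather than hard-coding repair modes here; the cost is that any coordinate update would then need a guard against changed atom counts.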

B) Change the current tests so they use an already amended structure, ready for use with Vina. These structures should have been prepared with the AutoDockTools scripts (Prepare*.py), and the resulting PDBQT files used directly in the Molecule genes. We should make sure that using PDBQT does not cause errors in other parts of the code (it shouldn't, but you never know...), so we had better provide some tests for that too.

I'd go with option B, so the task list is:

[ ] Fix tests so they pass. Use fixed structures as input.
[ ] Make sure PDBQT input causes no harm in other parts of the code.
[ ] Document this decision in docs/ (the class docstring is enough).
[ ] Catch some common errors that can be caused by missing parameters, like ValueError: Could not find atomic number for Lp Lp, and provide an informative error with a link to the corresponding part of the docs.
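The last task could be approached with a thin wrapper around the preparation call, sketched below. VinaPreparationError, prepare_with_hint and DOCS_HINT are hypothetical names, and the docs link is left as a placeholder because the thread does not give one; only the ValueError message text comes from the task description.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")

# Hypothetical placeholder: the real docs link is not given in this thread.
DOCS_HINT = "See the Vina objective section of the GaudiMM docs."


class VinaPreparationError(RuntimeError):
    """Raised when ADT cannot prepare a molecule for the Vina objective."""


def prepare_with_hint(prepare: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    """Call a preparation routine, turning a known cryptic error into a clearer one."""
    try:
        return prepare(*args, **kwargs)
    except ValueError as exc:
        if "Could not find atomic number" in str(exc):
            raise VinaPreparationError(
                "ADT could not assign an element to an atom (%s). The input may "
                "contain atom types or names that were not prepared for Vina. %s"
                % (exc, DOCS_HINT)) from exc
        raise  # unrelated ValueErrors propagate unchanged
```

Matching on the message text is brittle, but it keeps unrelated ValueErrors untouched while chaining the original exception for debugging.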

jaimergp commented 5 years ago

Any updates or ETA?

jaimergp commented 4 years ago

Is this PR superseded by any of the recent ones?

insilichem / gaudi

fix: parameters of prepare ligand and receptor #8