maabuu / posebusters

Plausibility checks for generated molecule poses.
https://posebusters.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Unable to reproduce the Vina results. #31

Closed rytakahas closed 5 months ago

rytakahas commented 6 months ago

Hi, many thanks for making all the datasets and code available.

Since Vina is one of the most widely used docking programs, I first wanted to reproduce all of its results myself using the benchmark datasets. The ligands have already been sanitized, so I only need to protonate the proteins. But then I ran into two questions:

  1. I am following the Vina (version 1.2.5) protocol described in S1. ADFR's prepare_receptor does not accept a receptor that contains cofactors. How did you keep the cofactors in the receptors? So far, the only way I know to produce a PDBQT file with the cofactors in the receptor is Open Babel, e.g.,

    obabel -ipdb complex.pdb -opdbqt -xrh -O complex.pdbqt

    In the paper, Vina docked with the cofactors present whenever the complex has them, right?

  2. Using the PDBQT file converted by obabel above, I ran Vina following the S1 description:

    vina --receptor receptor(+cofactors if there are).pdbqt --ligand ligand_start_conf.pdbqt --config ligand_conf.txt --seed 123 --num_modes 40 --exhaustiveness 32

    ligand_conf.txt: A bounding box with side length 25 Å was created around the centroid of the crystal ligand.
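The box description above can be turned into a config file along the following lines. This is only a sketch, not the script used for the paper: the function name `vina_box_config` and the toy coordinates are hypothetical, and it assumes the centroid is the unweighted mean of the atom positions passed in (whether hydrogens are included is left to the caller). The `center_*` and `size_*` keys are the standard Vina config options.

```python
from statistics import fmean


def vina_box_config(ligand_coords, side_length=25.0):
    """Return Vina config text for a cubic box centred on the ligand centroid.

    ligand_coords: iterable of (x, y, z) tuples in Å (hypothetical input format).
    """
    coords = list(ligand_coords)
    if not coords:
        raise ValueError("need at least one atom coordinate")
    xs, ys, zs = zip(*coords)
    # Unweighted centroid of the supplied atoms (an assumption of this sketch).
    center = (fmean(xs), fmean(ys), fmean(zs))
    lines = [f"center_{axis} = {value:.3f}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {side_length:.1f}" for axis in "xyz"]
    return "\n".join(lines) + "\n"


# Toy coordinates for illustration only, not taken from the benchmark.
toy_coords = [(10.0, 2.0, -4.0), (12.0, 4.0, -2.0), (14.0, 6.0, 0.0)]
print(vina_box_config(toy_coords))
```

Writing the returned string to ligand_conf.txt gives a file that can be passed to `--config`.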

Then, when I analyzed the Vina results for RMSD within 2 Å with the PoseBusters(config="redock") API, I got much worse results. The paper reports a success rate above 50% (close to 60%), but I only got about 25%.

This is not directly related to the PoseBusters code; it is rather about the results in the paper. But I am a bit puzzled about what I did wrong. If you could give me some insight, I would really appreciate it.

Many thanks,

maabuu commented 6 months ago

Thank you for raising this issue.

For the ligand preparation, try Meeko's MoleculePreparation class instead of ADFR's prepare_receptor function to prepare the ligand PDBQT file.

For the protein, we do:

  1. reduce complex.pdb > complex_reduced.pdb
  2. obabel complex_reduced.pdb -xr -O complex_reduced_prepped.pdb -p 7.4.
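For batch runs, the two commands above could be wrapped as follows. This is a minimal sketch, assuming `reduce` and `obabel` are on the PATH; the helper names `protein_prep_commands` and `run_protein_prep` are hypothetical, and the arguments are copied from the two steps without any extra flags.

```python
import shutil
import subprocess
from pathlib import Path


def protein_prep_commands(complex_pdb):
    """Build the reduce and obabel argument lists for one input PDB file."""
    src = Path(complex_pdb)
    reduced = src.with_name(src.stem + "_reduced.pdb")
    prepped = src.with_name(src.stem + "_reduced_prepped.pdb")
    reduce_cmd = ["reduce", str(src)]  # stdout is redirected to `reduced`
    obabel_cmd = ["obabel", str(reduced), "-xr", "-O", str(prepped), "-p", "7.4"]
    return reduce_cmd, reduced, obabel_cmd, prepped


def run_protein_prep(complex_pdb):
    """Run both steps and return the path of the prepared file."""
    reduce_cmd, reduced, obabel_cmd, prepped = protein_prep_commands(complex_pdb)
    for tool in ("reduce", "obabel"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    with open(reduced, "w") as out:
        # The exit status of reduce is not checked here; inspect the output file.
        subprocess.run(reduce_cmd, stdout=out, check=False)
    subprocess.run(obabel_cmd, check=True)
    return prepped
```

Calling `run_protein_prep("complex.pdb")` would write complex_reduced.pdb and complex_reduced_prepped.pdb next to the input.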

Also make sure you compare against the final set of 308 structures. In the 428 structures there are structures with crystal contacts (e.g. 5S8I_2LY) which Vina and Gold do not return good results for. The discussion in https://github.com/maabuu/posebusters/issues/26 explains more about the crystal contacts.

rytakahas commented 6 months ago

Many thanks for the quick reply. I followed your protocol and reran the CCD list; however, my results still underperform.

I am sorry for repeating my question, but I am not familiar with reduce and could not find a tutorial, so I just looked at reduce -H. That leaves me with one question: step 1 of your instructions uses no flags, so what are the defaults? E.g.,

Suggested usage:
reduce -FLIP myfile.pdb > myfileFH.pdb (do NQH-flips)
reduce -NOFLIP myfile.pdb > myfileH.pdb (do NOT do NQH-flips)

Is the FLIP option better in general? And without a flag, which behavior does reduce use?

My second question: posebusters_paper_results.csv contains results for Vina, and counting the rows with method: vina and post-processing: none gives

vina =  rmsd_within_threshold
True     273
False    240

So Vina was tested on all structures. This is why I missed the S3 table and #26 and tested on all of them myself. So again, I would like to make sure that, e.g.,

  1. reduce (-NOFLIP) 5SAK_ZRY_protein.pdb > 5SAK_ZRY_protein.pdb
  2. obabel -ipdb 5SAK_ZRY_protein.pdb -xr -opdbqt -O 5SAK_ZRY_protein.pdbqt -p 7.4

were the commands used for protein protonation, correct?

Thanks for sanitizing the ligands. Since the 5SAK_ZRY_ligand_start_conf.sdf you provided has already been sanitized, I used

obabel -isdf 5SAK_ZRY_ligand_start_conf.sdf -O 5SAK_ZRY_ligand_start_conf.pdbqt -p 7.4

for the ligand, right?

For my research, I would like to make sure that I can reach about 50% accuracy with Vina.

Many thanks,

maabuu commented 6 months ago

The table contains results for all 428 structures, but only the final set of 308 structures is reported on in the paper. On this smaller set, Vina has 60% of poses within 2 Å RMSD, and 58% of poses additionally pass all PoseBusters tests. The difference is that the smaller set does not contain structures with crystal contacts, an update that was prompted during peer review (https://github.com/maabuu/posebusters/issues/26).
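To see how restricting the evaluation to a subset changes the reported rate, a small helper like the one below may be useful. It is a sketch only: `success_rate`, the record layout, and the placeholder IDs are hypothetical and do not reflect the actual structure of posebusters_paper_results.csv.

```python
def success_rate(results, keep_ids=None):
    """Fraction of entries flagged as within the RMSD threshold.

    results: iterable of (complex_id, within_threshold) pairs (hypothetical layout).
    keep_ids: optional set of IDs to restrict the evaluation to.
    """
    selected = [ok for name, ok in results if keep_ids is None or name in keep_ids]
    if not selected:
        raise ValueError("no entries selected")
    return sum(selected) / len(selected)


# Placeholder data for illustration; these are not benchmark results.
toy_results = [("AAAA_XXX", True), ("BBBB_YYY", False), ("CCCC_ZZZ", True), ("DDDD_WWW", False)]
toy_final_set = {"AAAA_XXX", "BBBB_YYY", "CCCC_ZZZ"}

print(f"all entries: {success_rate(toy_results):.0%}")
print(f"final set only: {success_rate(toy_results, toy_final_set):.0%}")
```

With real data, `keep_ids` would be the list of 308 complex IDs in the final set.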

For the protein, we used the reduce and obabel commands exactly as stated above.

For docking with meeko follow these steps: https://www.blopig.com/blog/2022/08/meeko-docking-straight-from-smiles-string/. The ligand starting conformations have not been processed by obabel. They are RDKit conformations generated from the InChI strings that are reported for the ligands in the Chemical Component Dictionary.
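As a rough illustration of the Meeko route, the sketch below reads a starting conformation with RDKit, converts it to PDBQT with Meeko, and docks it with the Vina Python bindings. It is not the script used for the paper. The function names are hypothetical, the receptor is assumed to already be a PDBQT file, and the settings are the ones quoted earlier in this thread (seed 123, 40 poses, exhaustiveness 32, 25 Å box). Meeko's writer API differs between releases, so both variants are attempted.

```python
def docking_settings():
    """Settings quoted in this thread; adjust if your protocol differs."""
    return {"seed": 123, "n_poses": 40, "exhaustiveness": 32,
            "box_size": [25.0, 25.0, 25.0]}


def dock_with_python_vina(receptor_pdbqt, ligand_sdf, center, out_pdbqt):
    """Dock one ligand; `center` is the (x, y, z) box centre in Å."""
    # Local imports so this module loads even without the optional packages.
    from rdkit import Chem
    from meeko import MoleculePreparation
    from vina import Vina

    settings = docking_settings()

    mol = Chem.MolFromMolFile(ligand_sdf, removeHs=False)
    if mol is None:
        raise ValueError(f"could not read {ligand_sdf}")
    mol = Chem.AddHs(mol, addCoords=True)  # Meeko expects explicit hydrogens

    prep = MoleculePreparation()
    setups = prep.prepare(mol)
    try:
        from meeko import PDBQTWriterLegacy  # newer Meeko releases
    except ImportError:
        PDBQTWriterLegacy = None
    if PDBQTWriterLegacy is not None:
        ligand_pdbqt, ok, err = PDBQTWriterLegacy.write_string(setups[0])
        if not ok:
            raise RuntimeError(err)
    else:
        ligand_pdbqt = prep.write_pdbqt_string()  # older Meeko releases

    v = Vina(sf_name="vina", seed=settings["seed"])
    v.set_receptor(receptor_pdbqt)
    v.set_ligand_from_string(ligand_pdbqt)
    v.compute_vina_maps(center=list(center), box_size=settings["box_size"])
    v.dock(exhaustiveness=settings["exhaustiveness"], n_poses=settings["n_poses"])
    v.write_poses(out_pdbqt, n_poses=settings["n_poses"], overwrite=True)
    return out_pdbqt
```

The box centre would be the centroid of the crystal ligand, as in the protocol description.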

rytakahas commented 5 months ago

Many thanks for your comments, even though this was not directly related to the PoseBusters code and was just a question about the Vina results in the paper. As described above, the docking was run with the Python version of Vina, and switching to it was the solution for me. I had checked that my Vina executable was the same version (1.2.5), so I was a bit sceptical about changing from the executable to the Python API: 'how can I gain more than 20% in accuracy over my current results!?'

But yes, after running the Python version of Vina, I am now able to recover your paper's result of more than 60% of poses within 2 Å RMSD on the set of 308 structures. During execution, the Vina executable has an option for extending the simulation box, but the Python version does not. Apart from this, I thought the Python version was just a wrapper. Anyhow, I will raise these questions with the Vina community.

Many thanks for your help and many thanks for your great work!