wolberlab / pyrod

PyRod - Tracing water molecules in molecular dynamics simulations
GNU General Public License v2.0

How to determine the features obtained by pyrod could be used for virtual screening #3

Open kingljy0818 opened 4 years ago

kingljy0818 commented 4 years ago

Dear David,

I have obtained 180 features with PyRod. Could you tell me how to determine which of these features can be used for virtual screening? I have attached the .pml and the top file in Pyrod_Test.zip. Many thanks.

Kindest regards

Jiyuan

schallerdavid commented 4 years ago

Dear Jiyuan,

I am happy to hear that PyRod generated features for your protein of interest. Now begins the hard part, identifying the most relevant features.

To do this, I usually load the "super_pharmacophore" into the binding site in LigandScout and then add the respective dMIFs, which report the score for each feature type. Several features are probably located outside the binding site and can be safely deleted. For all other features, I go through the dMIFs and keep those that have a high score according to the dMIF or are located in an important position. This can be quite time-consuming, and I have not yet come up with an automated solution. It also involves some medchem knowledge/intuition, which is hard to teach a machine/algorithm. The final number of features you should end up with depends on the availability of ligand data:

1. Ligand data not available

You need to reduce the pharmacophore to about 3-7 features and use this pharmacophore right away for screening in LigandScout.

2. Ligand data available

If you have some known actives and inactives/decoys, you can generate a combinatorial pharmacophore library and evaluate the performance of each pharmacophore with the provided evaluation script /helper/library_roc_analysis_LS.py:

First, reduce the number of features of the super_pharmacophore to the most interesting ones. I usually end up with about 15 features here. Be careful when deleting hydrogen bonds. PyRod generates 5 different hydrogen bond types, i.e. single donor, single acceptor, double donor, double acceptor, and mixed donor/acceptor. When you delete a double donor, double acceptor, or mixed donor/acceptor, make sure to delete both of its features. This is important since PyRod treats each of those as a single entity, which helps to reduce the combinatorial space in the next step.

Next, mark all features optional that should be handled in a combinatorial manner. Features that are not marked optional, e.g. some important key interaction, will be present in each pharmacophore of the combinatorial pharmacophore library.

Use the /configs/library.cfg config file to generate a combinatorial library of pharmacophores. You can change the parameters in the config file to reduce the possible combinatorial pharmacophore space a bit, e.g. each generated pharmacophore should have at least 1 hydrophobic feature but not more than 2. I believe the defaults in the cfg are in general quite useful but might need some adjustment for certain projects. When you run this cfg with PyRod, it will tell you how many pharmacophores would be generated and ask whether you want to proceed. This is important since with the wrong settings you might end up with 5 million different pharmacophores, which would be rather hard to evaluate later. I usually try to end up with not more than a few thousand pharmacophores.
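To get a feeling for how quickly the combinatorial space grows, here is a minimal sketch that counts the possible pharmacophores for a given set of optional features. It is not PyRod code and does not read library.cfg. The function name, its parameters, and the feature type labels are assumptions made for illustration. It also assumes that per-type limits apply to optional features only and that each double or mixed hydrogen bond is counted as one feature.

```python
from itertools import product
from math import comb


def count_pharmacophores(optional, limits, total_min, total_max, mandatory=0):
    """Count pharmacophores that can be built from optional features.

    optional:  {feature_type: number of optional features of that type}
    limits:    {feature_type: (min_count, max_count)} per pharmacophore;
               types without an entry may contribute 0 to all of their features
    total_min, total_max: allowed total number of features per pharmacophore,
               including the mandatory (non-optional) ones
    mandatory: number of features present in every pharmacophore
    """
    types = sorted(optional)
    # Allowed number of picked features per type, clipped to what is available.
    ranges = []
    for feature_type in types:
        low, high = limits.get(feature_type, (0, optional[feature_type]))
        ranges.append(range(low, min(high, optional[feature_type]) + 1))

    total = 0
    for picks in product(*ranges):
        size = mandatory + sum(picks)
        if total_min <= size <= total_max:
            ways = 1
            for feature_type, k in zip(types, picks):
                # Number of ways to choose k features out of those available.
                ways *= comb(optional[feature_type], k)
            total += ways
    return total
```

For example, count_pharmacophores({"any": 15}, {}, 0, 15) counts all 32768 subsets of 15 optional features, while adding per-type limits or a narrower total range brings the number down.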

Once you have generated the pharmacophores, you can use the /helper/library_roc_analysis_LS.py script to evaluate all generated pharmacophores with provided libraries of actives and inactives/decoys. Based on enrichment factors and AUC you can select the best pharmacophores for screening.
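For readers unfamiliar with the two metrics, the following sketch shows how an enrichment factor and a ROC AUC can be computed from screening scores. It is not taken from library_roc_analysis_LS.py and may differ from what that script does. It assumes that every molecule has a numeric score, that higher is better, and that molecules not matching the pharmacophore get the lowest score. The function names are hypothetical.

```python
def roc_auc(active_scores, decoy_scores):
    """AUC as the probability that an active outscores a decoy (ties count 0.5)."""
    wins = 0.0
    for active in active_scores:
        for decoy in decoy_scores:
            if active > decoy:
                wins += 1.0
            elif active == decoy:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def enrichment_factor(active_scores, decoy_scores, fraction):
    """Enrichment factor in the top `fraction` of the ranked library."""
    ranked = [(score, 1) for score in active_scores]
    ranked += [(score, 0) for score in decoy_scores]
    # Best score first; on ties decoys are ranked first (pessimistic choice).
    ranked.sort(key=lambda item: (-item[0], item[1]))

    n_top = max(1, round(fraction * len(ranked)))
    actives_in_top = sum(label for _, label in ranked[:n_top])

    rate_in_top = actives_in_top / n_top
    rate_overall = len(active_scores) / len(ranked)
    return rate_in_top / rate_overall
```

An enrichment factor above 1 means the top of the ranked list contains more actives than expected by chance, and an AUC of 0.5 corresponds to random ranking.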

I hope this helps. I should definitely work on a manual :).

All the best,

David

kingljy0818 commented 4 years ago

Dear David,

Thank you so much for this detailed tutorial. I have loaded super_pharmacophore.pml in LigandScout v4.4.3. However, I could not add the respective dmifs. Could you tell me in detail how to load the dmifs into LigandScout, and where the score for each feature type can be seen? Many thanks.

Best regards

Jiyuan

schallerdavid commented 4 years ago

Dear Jiyuan,

The dMIFs can be found in the dmifs folder within the PyRod results directory. LigandScout can read .kont files. Just press File>Load>Load Grid and select the respective dMIF. Check the PyRod wiki to get details about each dMIF. Each feature type is accompanied by a respective dMIF which reports the feature score, e.g. the hi dMIF can be used to select hydrophobic features.

All the best, David

kingljy0818 commented 4 years ago

Dear David,

Actually, I had tried to load the .kont file in LigandScout before I sent you the GitHub message. But LigandScout prompted me to select the GRID probe when I loaded the .kont file. How should I determine the type of probe: Any, Neutral (H), or phenol or carboxy (OH)? I have tried them all, and only with Neutral (H) could the respective dmif files be loaded in LigandScout. I still could not see the score for each feature type after I loaded the .kont file.

Kindest regards Jiyuan

schallerdavid commented 4 years ago

Dear Jiyuan,

First of all, make sure that you use a protein structure that is aligned with your trajectory, otherwise the grid will end up somewhere else than in the binding site. I am also using LigandScout 4.4.3, so we should see similar behavior. When I load e.g. hi.kont, I choose "Any GRID Probe". The GRID should then show up on the left side of the structure-based view. I usually color the hi GRID in yellow, since it reflects the hydrophobicity. When you click on the GRID item on the left side, a slider shows up that can be used to display the contour with the respective cutoff. Be aware that kcal/mol shows up as a unit, however the reported scores from PyRod are different (see wiki).

Best regards, David

kingljy0818 commented 4 years ago

Dear David,

I could load the grid from the .kont file, but the default interaction energy is 0.0 kcal/mol on the left side of the structure-based view. When I click on the GRID item on the left side, a slider is shown in the right corner. How should I set the most appropriate energy cutoff to display the contour with the features? Also, I could not see the score for each feature type even though I have read the PyRod wiki. I would need your further help. Many thanks.

Best regards

Jiyuan

schallerdavid commented 4 years ago

Dear Jiyuan,

I attached two pictures of hi dMIFs with corresponding features. I hid all features but the hydrophobic ones to make the picture clearer. In the first picture I used the slider to increase the cutoff to ~50 (kcal/mol doesn't mean anything here, it's just always shown by LigandScout no matter what grid you load). As you can see, all features are within the contour and thus have a score equal to or higher than 50. [image: low_score]

In the second picture I increased the cutoff to about 200. Only a few features lie within this area. So one could for example delete all features outside this contour, unless you know for another reason that those areas might be important. [image: high_score]
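The visual check described above could in principle be mimicked in a script. The sketch below is only an illustration under strong assumptions: it does not parse .kont or .pml files, and the grid and feature dictionaries are hypothetical data structures, not PyRod objects. It looks up the score at the grid point nearest to each feature and keeps features of the chosen type only if that score reaches the cutoff.

```python
def grid_score(grid, position):
    """Return the score at the grid point nearest to `position`.

    grid: {"origin": (x, y, z), "spacing": float, "scores": nested list [x][y][z]}
    Returns None if the position lies outside the grid.
    """
    scores = grid["scores"]
    shape = (len(scores), len(scores[0]), len(scores[0][0]))
    index = []
    for axis in range(3):
        i = round((position[axis] - grid["origin"][axis]) / grid["spacing"])
        if not 0 <= i < shape[axis]:
            return None
        index.append(i)
    x, y, z = index
    return scores[x][y][z]


def filter_features(features, grid, feature_type, cutoff):
    """Keep features of `feature_type` only if their grid score is >= cutoff.

    features: list of {"type": str, "position": (x, y, z)}
    Features of other types are returned unchanged.
    """
    kept = []
    for feature in features:
        if feature["type"] != feature_type:
            kept.append(feature)
            continue
        score = grid_score(grid, feature["position"])
        if score is not None and score >= cutoff:
            kept.append(feature)
    return kept
```

With a hi grid and a cutoff of 50 this would keep the hydrophobic features inside the first contour, and with 200 only those inside the second one. Features that are important for other reasons would still have to be kept by hand.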

Unfortunately, I cannot provide you with a cutoff to use for your project, since this heavily depends on your protein of interest. Some proteins have a very hydrophobic pocket with HI scores > 300, others show rather modest hydrophobicity with a maximum HI score of 100. The same is true for all other feature types. That's actually the main reason why PyRod for now reports so many features without prefiltering. I hope that at some point I can come up with a solution to generate reasonably focused pharmacophores.

All the best, David