kexinhuang12345 / DeepPurpose

A Deep Learning Toolkit for DTI, Drug Property, PPI, DDI, Protein Function Prediction (Bioinformatics)
https://doi.org/10.1093/bioinformatics/btaa1005
BSD 3-Clause "New" or "Revised" License
975 stars, 272 forks

How to use DeepPurpose for Virtual screening? #24

Closed hima111997 closed 4 years ago

hima111997 commented 4 years ago

Greetings sir,

I want to use DeepPurpose for Virtual screening using drugs downloaded from databases with a certain protein.

Can you give me information on how to do this? such as the preparation of drugs and the protein?

Thanks

kexinhuang12345 commented 4 years ago

Hi, do you have high-throughput screening data for this protein? If so, you can use the drug property prediction mode; if not, you can use the DTI mode.

hima111997 commented 4 years ago

Do you mean using code like the one found in this notebook? https://github.com/kexinhuang12345/DeepPurpose/blob/master/Tutorial_1_DTI_Prediction.ipynb

I noticed that in the notebook, virtual screening used many drugs and many proteins, but the videos I watched on YouTube that used AutoDock Vina used one protein and many drugs. Which way is the right way for virtual screening?

thanks

kexinhuang12345 commented 4 years ago

Hi, a DTI model is trained on a large set of proteins and drugs so that it generalizes to unseen targets and drugs. For virtual screening, you can also specify one protein and many drugs. You just need to repeat the protein so that the list of proteins has one entry per drug.

hima111997 commented 4 years ago

so it will be like this:

drug_1 protein_1
drug_2 protein_1
drug_3 protein_1
...

right?

kexinhuang12345 commented 4 years ago

Yes, you are right.
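
A minimal sketch of the layout confirmed above, in plain Python: the single protein sequence is repeated once per drug so the two lists line up pair by pair. The function name `pair_drugs_with_target`, the SMILES strings, and the short sequence are hypothetical placeholders for illustration and are not part of DeepPurpose.

```python
from typing import List, Tuple


def pair_drugs_with_target(
    drug_smiles: List[str], target_sequence: str
) -> Tuple[List[str], List[str]]:
    """Repeat one target sequence so it lines up with every drug."""
    targets = [target_sequence] * len(drug_smiles)
    return list(drug_smiles), targets


# Hypothetical placeholder inputs: three SMILES strings and a made-up sequence.
drugs = ["CCO", "CC(=O)O", "c1ccccc1"]
protein = "MKTAYIAKQR"

X_drug, X_target = pair_drugs_with_target(drugs, protein)

# Each printed row is one (drug, protein) pair, as in the layout above.
for smiles, sequence in zip(X_drug, X_target):
    print(smiles, sequence)
```

Both lists have the same length, so row i of the drug list is always scored against row i of the protein list.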

hima111997 commented 4 years ago

So can I use a pre-trained model, or should I train a model with the same kind of input?

For example:

a model trained on many proteins and many drugs --> use it to predict the binding between many drugs and one protein
OR
a model trained on many drugs and one protein --> use it to predict the binding between many drugs and one protein

Also, are there any pre-trained models for the second choice (if it is the correct one)? Or should I modify a dataset to match what I want to do (many drugs and one protein)?

kexinhuang12345 commented 4 years ago

Hi, the DTI use case is when no screening data is available for the drugs and targets being tested, so we rely on a model trained on a large number of DTI pairs to learn chemical semantics and generalize to unseen drugs/targets. But for a use case that is specific to one target protein of interest and has screening data (many drugs and their affinity scores with the protein), it is a good idea to train a drug property prediction model (DeepPurpose.CompoundPred) and then use it to predict new drugs, so no protein information is involved. That said, you can still try a DTI model on the screening data, given that you have the protein sequence, although it adds complexity since you need a separate protein encoder. So I would suggest using drug property prediction directly.
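
A rough sketch of the drug property prediction route described above, for the case where screening data for the one target is available. The DeepPurpose calls (`utils.data_process`, `utils.generate_config`, `CompoundPred.model_initialize`, `CompoundPred.repurpose`) are written as recalled from the project's README and may differ between versions, so treat them as assumptions to check against the installed release. The helper names, the `smiles` and `affinity` keys, and the numeric values are hypothetical placeholders, not real measurements.

```python
from typing import Dict, List, Tuple


def split_screening_records(
    records: List[Dict[str, object]]
) -> Tuple[List[str], List[float]]:
    """Turn screening rows into parallel SMILES and label lists."""
    X_drugs = [str(r["smiles"]) for r in records]
    y = [float(r["affinity"]) for r in records]
    return X_drugs, y


def train_property_model(X_drugs, y, drug_encoding="Morgan"):
    """Train a drug-only model; no protein sequence is involved."""
    # Imported lazily so the helper above works without DeepPurpose installed.
    from DeepPurpose import utils, CompoundPred

    train, val, test = utils.data_process(
        X_drug=X_drugs, y=y, drug_encoding=drug_encoding,
        split_method="random", random_seed=1,
    )
    # Hyperparameters are illustrative, not tuned.
    config = utils.generate_config(
        drug_encoding=drug_encoding, cls_hidden_dims=[512],
        train_epoch=20, LR=0.001, batch_size=128,
    )
    model = CompoundPred.model_initialize(**config)
    model.train(train, val, test)
    return model


def score_new_drugs(model, new_smiles, new_names):
    """Predict the property for drugs that were not in the screening data."""
    from DeepPurpose import CompoundPred

    return CompoundPred.repurpose(new_smiles, model, new_names)


# Placeholder rows; a real screening set would contain many more drugs.
records = [
    {"smiles": "CCO", "affinity": 5.2},
    {"smiles": "c1ccccc1", "affinity": 6.8},
]
X_drugs, y = split_screening_records(records)
# model = train_property_model(X_drugs, y)  # requires DeepPurpose and real data
```

The training call is left commented out because it needs DeepPurpose installed and a realistically sized screening set.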

hima111997 commented 4 years ago

In my case, I have one protein and a large number of drugs without their affinity scores. From your last comment I understood that I should use DTI with pretrained models, right? And the input data will be as follows:

Drug1 protein1
Drug2 protein1
Drug3 protein1
etc.

Is that right?


kexinhuang12345 commented 4 years ago

Yes, if you just want to get the prediction score.
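
For the case confirmed here (one protein, many drugs, no affinity scores), a sketch of scoring with a pretrained DTI model might look like the following. `DTI.model_pretrained` and `DTI.virtual_screening` are written as recalled from the DeepPurpose README, and the pretrained model name `MPNN_CNN_BindingDB` is an assumption to verify against the list of available models. The helper names, the tab-separated drug table, the `protein_1` label, and the short sequence are hypothetical.

```python
from typing import List, Tuple


def parse_drug_table(text: str) -> Tuple[List[str], List[str]]:
    """Split 'name<TAB>SMILES' lines into parallel name and SMILES lists."""
    names, smiles = [], []
    for line in text.strip().splitlines():
        name, smi = line.split("\t")
        names.append(name)
        smiles.append(smi)
    return names, smiles


def screen_one_target(names, smiles, target_sequence,
                      pretrained="MPNN_CNN_BindingDB"):
    """Score every drug against one protein with a pretrained DTI model."""
    # Imported lazily: DeepPurpose is only needed when screening actually runs.
    from DeepPurpose import DTI as models

    net = models.model_pretrained(model=pretrained)
    # Repeat the single protein so each drug has a matching target entry.
    targets = [target_sequence] * len(smiles)
    target_names = ["protein_1"] * len(smiles)
    return models.virtual_screening(smiles, targets, net, names, target_names)


# Hypothetical drug list, e.g. exported from a compound database.
drug_table = "ethanol\tCCO\nacetic_acid\tCC(=O)O\nbenzene\tc1ccccc1"
names, smiles = parse_drug_table(drug_table)
# screen_one_target(names, smiles, "MKTAYIAKQR")  # requires DeepPurpose
```

The screening call is commented out because it downloads a pretrained model and needs DeepPurpose installed.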

hima111997 commented 4 years ago

Thank you so much, sir. Another question, please: is there a way to get the score for the binding between drugs and a certain domain of a protein? For example, by putting only the sequence of interest (not the whole protein sequence) in the input file?


kexinhuang12345 commented 4 years ago

Hey, it is not currently supported. But I think it is an interesting point, and I will look into it.

hima111997 commented 4 years ago

Thank you so much for your support.

kexinhuang12345 commented 4 years ago

Great, let me know if you have more questions. Closing for now.