kexinhuang12345 / DeepPurpose

A Deep Learning Toolkit for DTI, Drug Property, PPI, DDI, Protein Function Prediction (Bioinformatics)
https://doi.org/10.1093/bioinformatics/btaa1005
BSD 3-Clause "New" or "Revised" License

Is it possible to enter multiple SMILES and one single FASTA and perform a drug-target interaction prediction? #89

Closed tachyon3903 closed 3 years ago

tachyon3903 commented 3 years ago

I am currently generating new SMILES vectors using ChemGAN. According to the DTI tutorial, I need to provide an affinity score. What is that? Can I use my own dataset instead of KIBA or DAVIS?

kexinhuang12345 commented 3 years ago

Hi, do you want to make a prediction or train a model? The affinity score is basically a measure of how strongly the drug binds the target. You can use any dataset you want.

tachyon3903 commented 3 years ago

I want to make a prediction using a trained model. I simply don't have the affinity score. Is it necessary? Thanks a lot for your time.

tachyon3903 commented 3 years ago

FYI: Currently, all the data I have is a bunch of SMILES molecules and a single FASTA.

kexinhuang12345 commented 3 years ago

Yeah, the affinity score is the label. To train a new model, you need a label for each drug-target pair. In your use case, I think you can use a pretrained model. Here is an example: https://github.com/kexinhuang12345/DeepPurpose/blob/master/DEMO/case-study-I-Drug-Repurposing-for-3CLPro.ipynb

Note that if you have some affinity data, it is recommended to train a new model for your specific target of interest, because the pretraining dataset may not contain targets similar to yours, which leads to poor generalization.
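
For the use case in this thread (many SMILES strings, one FASTA sequence, no affinity labels), the input can be arranged as two parallel lists in which the single target is repeated once per drug. The sketch below is plain Python and does not call DeepPurpose; the helper name `pair_drugs_with_target`, the example molecules, and the example sequence are all made up for illustration, and how the lists are then handed to the library is covered by the notebook linked above rather than by this snippet.

```python
from typing import List, Tuple


def pair_drugs_with_target(smiles: List[str], fasta: str) -> Tuple[List[str], List[str]]:
    """Return parallel lists so that drugs[i] is scored against targets[i]."""
    # Drop blank entries and surrounding whitespace, keeping the input order.
    drugs = [s.strip() for s in smiles if s.strip()]
    # Join a multi-line FASTA record into one sequence, skipping '>' header lines.
    sequence = "".join(
        line.strip() for line in fasta.splitlines() if not line.startswith(">")
    )
    if not drugs or not sequence:
        raise ValueError("need at least one SMILES string and a non-empty sequence")
    # Repeat the single target so both lists have the same length.
    return drugs, [sequence] * len(drugs)


# Hypothetical inputs: two small molecules, one blank entry, one short sequence.
drugs, targets = pair_drugs_with_target(
    ["CCO", "CC(=O)O", ""],
    ">example_protein\nMKTAYIAK\nQRQISFVK",
)
print(len(drugs), targets[0])
```

No affinity column appears anywhere in this input, which matches the point above: labels are needed for training, not for prediction.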

tachyon3903 commented 3 years ago

Sorry to keep bothering you. How do I input my own data in https://github.com/kexinhuang12345/DeepPurpose/blob/master/DEMO/case-study-I-Drug-Repurposing-for-3CLPro.ipynb?

Quite frankly, I can't find a way to input my own data. Once again, thank you a lot for your help.

tachyon3903 commented 3 years ago

BTW, I think there is a bug in the "dataset.read_file_repurposing_library" command. When I attempt to input my own SMILES.txt, it gives "UnboundLocalError: local variable 'file' referenced before assignment." Thanks.

kexinhuang12345 commented 3 years ago

Hi, yeah, here is a tutorial on the data processing: https://github.com/kexinhuang12345/DeepPurpose/blob/master/DEMO/load_data_tutorial.ipynb

This should help you know the format of each function input and output.
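
An `UnboundLocalError` on a variable named `file` is what Python typically raises when an `open` call fails inside a `try` block and the code then keeps using the variable. Whether that is what happens inside `dataset.read_file_repurposing_library` is an assumption, not something confirmed in this thread, so a wrong path is only one possible cause. The sketch below is a standalone reader, not the library function: the name `read_smiles_file` is hypothetical, and the one-entry-per-line layout of a name, whitespace, then a SMILES string is also assumed. The tutorial linked above is the authority on the real format.

```python
import os
import tempfile
from typing import List, Tuple


def read_smiles_file(path: str) -> Tuple[List[str], List[str]]:
    """Parse lines of the assumed form '<name> <SMILES>' into two parallel lists."""
    if not os.path.isfile(path):
        # Fail loudly here rather than continuing with an undefined file handle.
        raise FileNotFoundError(f"SMILES file not found: {path}")
    names, smiles = [], []
    with open(path, "r") as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError(f"line {line_number}: expected '<name> <SMILES>'")
            names.append(fields[0])
            smiles.append(fields[1])
    return names, smiles


# Usage with a throwaway file; the two molecules are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, "SMILES.txt")
    with open(example_path, "w") as out:
        out.write("ethanol CCO\n\nacetic_acid CC(=O)O\n")
    names, smiles = read_smiles_file(example_path)
    print(names, smiles)
```

Checking the path up front turns a confusing downstream error into a message that names the missing file.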

tachyon3903 commented 3 years ago

Thanks a lot for your help! I really appreciate it. The code is working now. Is the binding score (in DTI) a probability? Also, what is the binding score in the https://github.com/kexinhuang12345/DeepPurpose/blob/master/DEMO/case-study-I-Drug-Repurposing-for-3CLPro.ipynb code? Since I do not have enough data to test, I was wondering if you would know that. Thanks! Your advice is really helpful.

kexinhuang12345 commented 3 years ago

In the binary case, yes, it is a probability. In regression, the score is the predicted binding affinity itself, in whatever unit the training dataset uses. In the notebook, it is nM. But again, the result comes from a pretrained model, so generalizability is not guaranteed.
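
To make the two cases concrete: a binary score is a probability, so higher means more likely to bind, while a regression score in nM is usually read the other way round. The sketch below assumes the regression output is a dissociation-type measure (such as Kd, Ki, or IC50) in nM, where a smaller value means stronger binding; that assumption should be checked against the training dataset. The function names and the numbers are made up for illustration and are not model output. The log-scale conversion is the standard p = -log10(value in molar) = 9 - log10(value in nM).

```python
import math
from typing import List, Tuple


def nm_to_p(affinity_nm: float) -> float:
    """Convert an affinity in nM to the log scale: p = 9 - log10(nM)."""
    if affinity_nm <= 0:
        raise ValueError("affinity must be positive")
    return 9.0 - math.log10(affinity_nm)


def rank_by_affinity(names: List[str], affinities_nm: List[float]) -> List[Tuple[str, float]]:
    """Sort strongest binders first; for nM affinities, smaller means stronger."""
    return sorted(zip(names, affinities_nm), key=lambda pair: pair[1])


# Made-up numbers purely for illustration.
ranked = rank_by_affinity(["drug_a", "drug_b", "drug_c"], [850.0, 12.5, 100.0])
for name, value in ranked:
    print(f"{name}: {value} nM, p-scale {nm_to_p(value):.2f}")
```

On the p-scale the ordering flips, so a larger value means stronger binding, which is worth keeping in mind when comparing numbers from models trained on different units.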

tachyon3903 commented 3 years ago

Thank you! Without your help, I would not be able to run the code!

kexinhuang12345 commented 3 years ago

No problem! Closing for now

tachyon3903 commented 3 years ago

Hello. When I am running the code, I have to re-train the model every time. Do you have a pre-trained model so that I can just load it? Also, what exactly is the virtual screening function? Sorry to bother you again.

kexinhuang12345 commented 3 years ago

Hi, you can save a trained model and load it later. Check out cells 15 and 16 in https://github.com/kexinhuang12345/DeepPurpose/blob/master/Tutorial_1_DTI_Prediction.ipynb. Virtual screening means running predictions over many drug-target pairs.
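
The general "train once, then load" pattern can be sketched as follows. This is a generic stand-in that uses Python's `pickle` module, not DeepPurpose's own save and load mechanism, which is shown in the tutorial cells mentioned above. The names `load_or_train` and `toy_train` are hypothetical, and the "model" here is just a dictionary.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def load_or_train(path: str, train_fn: Callable[[], Any]) -> Any:
    """Return the object stored at `path`, training and saving it first if absent."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    model = train_fn()
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return model


calls = []


def toy_train() -> dict:
    """Stand-in for an expensive training run; records each invocation."""
    calls.append(1)
    return {"weights": [0.1, 0.2]}


with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    first = load_or_train(model_path, toy_train)   # trains and saves
    second = load_or_train(model_path, toy_train)  # loads from disk
    print(len(calls), first == second)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.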

tachyon3903 commented 3 years ago

Thanks. Which model do you recommend? I am trying to predict the interaction between the COVID-19 RNA polymerase and COVID-19 drugs (e.g. Remdesivir).

tachyon3903 commented 3 years ago

I was trying to run the PPI model. What are the requirements? Thanks a lot.