sailseem closed this issue 3 years ago
Hi, it needs to be fed into the data_process function first:
data = data_process(X_drug, X_target, y,
                    drug_encoding, target_encoding,
                    split_method='no_split')
You can also use the one-liner mode if you are only interested in getting prediction results: https://github.com/kexinhuang12345/DeepPurpose/blob/master/DEMO/case-study-II-Virtual-Screening-for-BindingDB-IC50.ipynb
This is confusing, because we don't have a y value. How do we process the data when data_process expects y as an input? That is what we want to know.
Oh sorry, my bad. Yeah, in this case you should either use the one-liner mode or load a pretrained model and then call
_ = DTI.virtual_screening(drug, target, model, drug_name, target_name)
where drug and target can each be a string or a list of strings (SMILES and target sequences, respectively).
All right, but let's say I have 3 drugs and one protein:
drug3, drug_name, target, target_name = ['Cc1cnc2c(NCCN)nc3ccc(C)cc3n12','Oc1cccc(c1)-c1nc(N2CCOCC2)c2oc3ncccc3c2n1','CC1(C)CNc2cc(NC(=O)c3cccnc3NCc3ccncc3)ccc12','OC[C@H]1OC@@H[C@H]2O)C@HC@@H[C@H]1O'], ['no1','no2','no3'], ['MLGRNTWKTSAFSFLVEQMWAPLWSRSMRPGRWCSQRSCAWQTSNNTLHPLWTVPVSVPGGTRQSPINIQWRDSVYDPQLKPLRVSYEAASCLYIWNTGYLFQVEFDDATEASGISGGPLENHYRLKQFHFHWGAVNEGGSEHTVDGHAYPAELHLVHWNSVKYQNYKEAVVGENGLAVIGVFLKLGAHHQTLQRLVDILPEIKHKDARAAMRPFDPSTLLPTCWDYWTYAGSLTTPPLTESVTWIIQKEPVEVAPSQLSAFRTLLFSALGEEEKMMVNNYRPLQPLMNRKVWASFQATNEGTRS'], ['protein']
This function only returns the binding score for a single pair:
_ = DTI.virtual_screening(drug3, target, model, drug_name, target_name)
Virtual Screening Result
+------+-----------+-------------+---------------+
| Rank | Drug Name | Target Name | Binding Score |
+------+-----------+-------------+---------------+
|  1   |    no1    |   protein   |      7.51     |
+------+-----------+-------------+---------------+
The funny thing is that no matter what I input as the drug SMILES, it always returns 7.51. The score seems to depend only on the protein sequence.
For virtual screening mode, you should pass the target as a list containing the same protein sequence three times (one entry per drug), instead of just one. So:
['MLGRNTWKTSAFSFLVEQMWAPLWSRSMRPGRWCSQRSCAWQTSNNTLHPLWTVPVSVPGGTRQSPINIQWRDSVYDPQLKPLRVSYEAASCLYIWNTGYLFQVEFDDATEASGISGGPLENHYRLKQFHFHWGAVNEGGSEHTVDGHAYPAELHLVHWNSVKYQNYKEAVVGENGLAVIGVFLKLGAHHQTLQRLVDILPEIKHKDARAAMRPFDPSTLLPTCWDYWTYAGSLTTPPLTESVTWIIQKEPVEVAPSQLSAFRTLLFSALGEEEKMMVNNYRPLQPLMNRKVWASFQATNEGTRS'] * 3
Alternatively, use repurposing mode.
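To make the pairing concrete, here is a minimal sketch in plain Python, with no DeepPurpose import. It assumes that virtual screening reads the two lists pairwise (entry i of the drug list with entry i of the target list), as the reply above describes. The SMILES strings and the shortened sequence are placeholders, and repeating the target name list in the same way is my assumption, not something confirmed in this thread.

```python
# Placeholder inputs: three drugs and one (shortened) protein sequence.
drugs = ["CCO", "CCN", "CCC"]
drug_names = ["no1", "no2", "no3"]
target_seq = "MLGRNTWKTSAFSF"

# Repeat the single target once per drug so both lists have equal length.
targets = [target_seq] * len(drugs)
# Assumption: the name list is indexed per pair as well, so repeat it too.
target_names = ["protein"] * len(drugs)

# Each index now describes exactly one drug-target pair.
pairs = list(zip(drug_names, target_names, drugs, targets))
for drug_name, target_name, smiles, sequence in pairs:
    print(drug_name, target_name, smiles, sequence[:10])
```

With lists built this way, every drug has its own row, so a call that walks the lists index by index would see three pairs rather than one.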
Thanks, one more question. You chose the Kd values from BindingDB as training input, and normally a lower Kd means higher affinity. What does the binding score mean? Is it the same as Kd, or is it different? What is the normal range of the binding score? How large or small should a score be to count as good binding affinity? Thanks.
Hi, yes, it all depends on the training data. If the training data is Kd, then the predicted values are also Kd. I think in the one-liner mode it is Kd, so the lower the better. You can also transform it to pKd by setting convert_y. Note there are also a couple of models that are trained on IC50.
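For reference, the textbook relation between Kd and pKd can be sketched as below. This is not DeepPurpose's own conversion code, and the function names are made up for illustration. It assumes Kd is given in nM, so pKd = -log10(Kd in M) = 9 - log10(Kd in nM).

```python
import math


def kd_nm_to_pkd(kd_nm):
    """Convert a dissociation constant in nM to pKd (higher means tighter)."""
    return 9.0 - math.log10(kd_nm)


def pkd_to_kd_nm(pkd):
    """Convert pKd back to a dissociation constant in nM (lower means tighter)."""
    return 10 ** (9.0 - pkd)


# 1000 nM (1 uM) and 1 nM binders on both scales.
for kd in (1000.0, 1.0):
    print(kd, "nM ->", kd_nm_to_pkd(kd))
```

The two scales run in opposite directions: a ranking from high to low makes sense for pKd, while a ranking from low to high makes sense for Kd. Which scale a given pretrained model outputs depends on how its training labels were prepared.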
Sorry to re-confirm this: when using a pretrained model for virtual screening, the binding scores are ranked from high to low, which makes it look as if the top score means the best affinity. Why did you choose to present the data that way?
Dear Dr. Huang, thank you for providing such a great package. I am quite new to all of this. Assume that I have a well-trained model for drug-target binding based on the BindingDB database. I can load about 100 drug SMILES using dataset.read_file_target_sequence (drugs.txt) and, in the same way, load 100 targets using dataset.read_file_target_sequence (targets.txt). But how do I apply the model to get a predicted binding score for each pair?
I tried something like "model.predict(drug,targets)", but this returns TypeError: predict() takes 2 positional arguments but 3 were given
Sorry to take your time with such a silly question, and thanks a lot anyway. Best, fan
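If "each pair" means every drug against every target, one way to prepare the inputs is to expand the two lists into equal-length paired lists first. The following is a minimal sketch in plain Python. The helper name all_pairs is hypothetical and not part of DeepPurpose, and the inputs are placeholders. It assumes the prediction step consumes drugs and targets pairwise by index.

```python
from itertools import product


def all_pairs(drugs, targets):
    """Expand two lists into paired lists covering every drug-target combination."""
    pairs = list(product(drugs, targets))
    paired_drugs = [drug for drug, _ in pairs]
    paired_targets = [target for _, target in pairs]
    return paired_drugs, paired_targets


# Placeholder inputs: 2 drugs and 3 targets give 6 pairs.
drugs = ["CCO", "CCN"]
targets = ["MLGRN", "TWKTS", "AFSFL"]
paired_drugs, paired_targets = all_pairs(drugs, targets)
print(len(paired_drugs), len(paired_targets))
```

For 100 drugs and 100 targets this produces 10,000 pairs, so it may be worth processing them in batches.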