kexinhuang12345 / DeepPurpose

A Deep Learning Toolkit for DTI, Drug Property, PPI, DDI, Protein Function Prediction (Bioinformatics)
https://doi.org/10.1093/bioinformatics/btaa1005
BSD 3-Clause "New" or "Revised" License

Why are the results different when I use the same inputs in the repurpose and virtual_screening functions? #40

Closed: hima111997 closed this issue 3 years ago

hima111997 commented 3 years ago

I used the repurpose and virtual_screening functions from oneliner.py. The drugs and the protein were the same in both cases and I used the pretrained models; however, the results were different.

Why did this happen? The inputs (drug SMILES and protein sequence) and the models are the same, so shouldn't the results be the same too? In the case of virtual_screening I used a single sequence but repeated it once per drug.

The input file for repurpose was as follows:

smile_files:
drug_name1 drug_smiles1
drug_name2 drug_smiles2
drug_name3 drug_smiles3
...

protein: I used the load_SARS_CoV2_Helicase() function.

The input file for virtual_screening was as follows:

input_file:
drug_smile1 protein_sequence
drug_smile2 protein_sequence
drug_smile3 protein_sequence
...
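To make sure both functions really see the same drugs and the same target, the two input files can be generated from one shared list. The following is a minimal sketch, assuming both files are whitespace-separated with one pair per line, as described above. The helper names (`repurposing_lines`, `virtual_screening_lines`, `write_lines`) and the example data are hypothetical and are not part of DeepPurpose.

```python
from pathlib import Path


def repurposing_lines(names, smiles):
    """Build 'drug_name drug_smiles' rows, one per drug."""
    return [f"{name} {smi}" for name, smi in zip(names, smiles)]


def virtual_screening_lines(smiles, target_sequence):
    """Build 'drug_smiles protein_sequence' rows, repeating the same target."""
    return [f"{smi} {target_sequence}" for smi in smiles]


def write_lines(path, lines):
    """Write the rows to a text file, one row per line."""
    Path(path).write_text("\n".join(lines) + "\n")


# Made-up example data, for illustration only.
names = ["drug_a", "drug_b"]
smiles = ["CCO", "CC(=O)O"]
target = "MKTAYIAK"

repurpose_rows = repurposing_lines(names, smiles)
vs_rows = virtual_screening_lines(smiles, target)
```

Because both files are built from the same `smiles` list, the drug order is identical in the two runs, which makes the outputs directly comparable. Drug names containing spaces would break a whitespace-separated format, so this sketch assumes they contain none.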

kexinhuang12345 commented 3 years ago

They should be the same. Could you send the scripts so I can reproduce this?

hima111997 commented 3 years ago

the repurposing script:

from DeepPurpose import oneliner
from DeepPurpose.dataset import read_file_repurposing_library, load_SARS_CoV2_Helicase
oneliner.repurpose(*load_SARS_CoV2_Helicase(),
                   *read_file_repurposing_library('/content/drive/My Drive/DATABASES/name_drugs_repur_drugbank.smiles'),
                   save_dir='/content/drive/My Drive/DATABASES/results/repurpose/')

the virtual screening script:

from DeepPurpose import oneliner
from DeepPurpose.dataset import read_file_repurposing_library, read_file_virtual_screening_drug_target_pairs

_, drug_names = read_file_repurposing_library('/content/drive/My Drive/DATABASES/name_drugs_repur_drugbank.smiles')
drugs, target = read_file_virtual_screening_drug_target_pairs('/content/drive/My Drive/DATABASES/name_target_VS_drugbank.smiles')
oneliner.virtual_screening(target=target, X_repurpose=drugs, drug_names=drug_names,
                           save_dir='/content/drive/My Drive/DATABASES/results/VS/')

the data files:

name_target_VS_drugbank.txt name_drugs_repur_drugbank.txt

hima111997 commented 3 years ago

@kexinhuang12345 is there a problem with DeepPurpose?

kexinhuang12345 commented 3 years ago

Hi, sorry, I'm pretty busy these two weeks with a big deadline. I don't think there should be a big issue; it's probably a difference in aggregation schemes, since the inputs to the models are the same. I couldn't find an obvious issue for now, but I will try to fix it in the next two weeks.

hima111997 commented 3 years ago

Okay, take your time and I hope you finish your task before your deadline 💪💪

hima111997 commented 3 years ago

I think I know where the problem is. As the image shows, after making the individual predictions from the five models in oneliner.py, the code makes another prediction using the last model ('daylight', 'AAC') and then uses those results in the following lines, so the final results come from the 5th model only. I think the fix is to remove these four lines. The convert_y_units call should be removed because the values are already converted after each prediction (lines 166 and 167 in DTI.py).

image
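The bug pattern described above can be sketched in isolation. This is not the actual oneliner.py code: the stub models, the function names, and the use of a mean as the aggregation step are all assumptions made for illustration. It only shows how one extra prediction from the last model, assigned to the same variable, discards the aggregate.

```python
from statistics import fmean


def make_stub_model(offset):
    """Hypothetical stand-in for a pretrained model: score = len(input) + offset."""
    def predict(inputs):
        return [len(x) + offset for x in inputs]
    return predict


def aggregate(per_model):
    """Average the scores across models for each input (assumed aggregation)."""
    return [fmean(scores) for scores in zip(*per_model)]


def ensemble_buggy(models, inputs):
    per_model = [model(inputs) for model in models]
    y_pred = aggregate(per_model)
    # Bug pattern: a fresh prediction from the last model overwrites the aggregate.
    y_pred = models[-1](inputs)
    return y_pred


def ensemble_fixed(models, inputs):
    per_model = [model(inputs) for model in models]
    return aggregate(per_model)


# Five stub models standing in for the five pretrained models.
models = [make_stub_model(o) for o in (0.0, 1.0, 2.0, 3.0, 4.0)]
inputs = ["CCO", "CCCC"]

buggy = ensemble_buggy(models, inputs)   # equals the last model's output
fixed = ensemble_fixed(models, inputs)   # equals the mean over all five
```

With these stubs, `buggy` matches `models[-1](inputs)` exactly, while `fixed` differs from it, which mirrors the symptom reported in this thread.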

When I compared the final results with the results from this model alone, they were identical. The daylight-AAC-only results:

image

the final results:

image
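The comparison above was done by eye from the screenshots. A small check like the following could do the same programmatically. It is only a sketch: it assumes each result set has already been loaded into a dict mapping drug name to predicted score, and the function name `results_match` and all values are made up.

```python
import math


def results_match(a, b, rel_tol=1e-9):
    """Return True if both result sets cover the same drugs with (nearly) equal scores."""
    if a.keys() != b.keys():
        return False
    return all(math.isclose(a[k], b[k], rel_tol=rel_tol) for k in a)


# Made-up scores, for illustration only.
final_results = {"drug_a": 7.0, "drug_b": 8.0}
last_model_only = {"drug_a": 7.0, "drug_b": 8.0}
mean_of_all_models = {"drug_a": 5.0, "drug_b": 6.0}

same_as_last = results_match(final_results, last_model_only)
same_as_mean = results_match(final_results, mean_of_all_models)
```

A tolerance-based comparison is used instead of `==` because scores written to a results file and read back may differ in the last decimal places.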

kexinhuang12345 commented 3 years ago

Hey, that makes a lot of sense! Thanks so much for going through the code and finding this bug! Do you want to open a PR that removes these four lines of code, so that you show up on the contributors board? I can also do it if you'd prefer. Let me know!

hima111997 commented 3 years ago

First I will try running the code on Colab after removing these four lines, and if it works I will try to open a pull request. Thank you so much.

hima111997 commented 3 years ago

@kexinhuang12345 It worked and gave the same results as the repurposing function.

I opened the pull request.

image

image

kexinhuang12345 commented 3 years ago

thanks! just merged it!