hkmztrk / DeepDTA

Where can I download Kinase KIBA and Davis datasets? #4

Closed. LukasChoi closed this issue 5 years ago.

LukasChoi commented 5 years ago

I'm implementing DeepDTA with TensorFlow, but I couldn't find a download for the KIBA and Davis datasets. Where can I download the kinase KIBA and Davis datasets?

Thanks in advance.

hkmztrk commented 5 years ago

Hello @LukasChoi, thank you for your interest in our work. You can find the datasets under the data folder. For each dataset:

ligands.txt -> contains the sequences of drugs (SMILES)
proteins.txt -> contains the sequences of proteins in the data
affinity.txt -> contains the binding affinity matrix in txt form (drugs x proteins, in the same order given in the above text files)
Y -> contains the binding affinity matrix in pickle form (drugs x proteins, in the same order given in the above text files)
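As a rough illustration of how these four files relate, here is a minimal sketch that reads them and checks that the affinity matrix has one row per drug and one column per protein. It assumes one sequence per line in ligands.txt and proteins.txt and a whitespace-separated matrix in affinity.txt; the actual files may use a different layout, so treat these as assumptions. The function names `load_dataset` and `load_pickled_affinity` are hypothetical, and the `latin1` encoding is only a guess that may help if the pickle was written by Python 2.

```python
import pickle
from pathlib import Path


def _read_lines(path):
    """Return the non-empty, stripped lines of a text file."""
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]


def load_dataset(data_dir):
    """Load ligands, proteins, and the text affinity matrix from data_dir.

    ASSUMPTION: one sequence per line and a whitespace-separated matrix.
    """
    data_dir = Path(data_dir)
    ligands = _read_lines(data_dir / "ligands.txt")
    proteins = _read_lines(data_dir / "proteins.txt")
    # float() also accepts the string "nan", so missing values survive parsing.
    affinity = [
        [float(tok) for tok in row.split()]
        for row in _read_lines(data_dir / "affinity.txt")
    ]
    # Rows follow the order of ligands.txt, columns the order of proteins.txt.
    if len(affinity) != len(ligands) or any(len(r) != len(proteins) for r in affinity):
        raise ValueError("affinity matrix is not (drugs x proteins)")
    return ligands, proteins, affinity


def load_pickled_affinity(path):
    """Load the pickled matrix Y. ASSUMPTION: latin1 handles a Python 2 pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh, encoding="latin1")
```

With this layout, `affinity[i][j]` would be the binding affinity between the drug on line i of ligands.txt and the protein on line j of proteins.txt.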

LukasChoi commented 5 years ago

I have a few questions.

In the /data/davis folder:
1) There are two similarity files, drug-drug and target-target. What are the criteria for similarity?
2) What is the meaning of the numbers in target-target_similarities_WS.txt? I don't think they are probabilities.
3) What is the standard for affinity? I don't understand the meaning of the numbers in drug-target_interaction_affinities_Kd__Davis_et_al.2011v1.txt.
4) There are six test_fold_setting files under the "folds" directory. What is their role? Can you explain the meaning of the numbers in these files? For example, [34121, 51548, 12611,...]

In the /data/kiba folder:
5) There are two versions of the affinity file, kiba_binding_affinity and kiba_binding_affinity_v2. What is the difference?
6) What is the meaning of 'nan' in the affinity files?
7) There are ligands and proteins files with and without the kiba prefix, e.g. kiba_proteins and proteins. What is the difference?
8) What is the role of the sim files (kiba_drug_sim and kiba_target_sim)?

Your explanation would be very helpful for understanding the datasets.

Thanks in advance.

hkmztrk commented 5 years ago

Hi @LukasChoi, I made a readme file here and removed some duplicate data. https://github.com/hkmztrk/DeepDTA/blob/master/data/README.md

Could you please refer to this README and then check out the updated datasets? If anything is unclear, let me know.