vtarasv / 3d-prot-dta

3DProtDTA: a deep learning model for drug-target affinity prediction based on residue-level protein graphs
https://doi.org/10.1039/D3RA00281K

Davis, KIBA, DTC, Metz, and ToxCast #4

Closed by speakstone 9 months ago

speakstone commented 1 year ago

For the Davis, KIBA, DTC, Metz, and ToxCast datasets, could you please point me to where I can find the AlphaFold-predicted protein structures? I have studied your work closely and see its value for drug discovery; among the current state-of-the-art methods, it stands out to me as the most significant. As a newcomer to the field, I would be grateful for your guidance.

vtarasv commented 1 year ago

The KIBA dataset is annotated directly with UniProt IDs, and the corresponding structures can be found in the AlphaFold database.
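As a rough illustration of fetching a structure by UniProt ID, here is a minimal sketch. It assumes the public AlphaFold database file naming scheme `AF-<UniProt ID>-F1-model_v<version>.pdb`; the model version (4 here) is an assumption and may need adjusting, and `P00533` is only an example ID, not a claim about which proteins are in KIBA. The function names are hypothetical and not part of this repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed file naming scheme of the public AlphaFold database.
AFDB_URL = "https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_v{version}.pdb"


def alphafold_pdb_url(uniprot_id: str, version: int = 4) -> str:
    """Build the download URL for one UniProt accession (version is an assumption)."""
    return AFDB_URL.format(uniprot_id=uniprot_id.strip().upper(), version=version)


def download_alphafold_pdb(uniprot_id: str, out_dir: str = ".", version: int = 4) -> Path:
    """Download the predicted structure and return the local file path."""
    target = Path(out_dir) / f"{uniprot_id.strip().upper()}.pdb"
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(alphafold_pdb_url(uniprot_id, version), str(target))
    return target
```

Calling `download_alphafold_pdb("P00533", "structures")` would try to save `structures/P00533.pdb`; an accession with no entry under this naming scheme raises an HTTP error.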

The Davis dataset proteins are annotated with gene names, which can be mapped to UniProt identifiers with the UniProt mapping tool. The Davis data, however, includes proteins with mutations, so those structures have to be predicted manually from the mutated sequences. A useful tool for running AlphaFold on a custom sequence is ColabFold.
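To prepare a mutated sequence for a custom prediction, something like the sketch below could be used. It assumes the mutation is written in the common `E255K` form (wild-type residue, 1-based position, new residue); how mutations are actually labelled in the Davis files is not covered here, and `apply_point_mutation` is a hypothetical helper, not part of this repository.

```python
import re

# Matches labels such as "E255K": wild-type residue, 1-based position, new residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Return the sequence with a single point mutation applied."""
    match = _MUTATION.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized mutation label: {mutation!r}")
    wild_type, position, new_residue = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    # Guard against applying the mutation to the wrong isoform or numbering.
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"Expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + new_residue + sequence[position:]
```

The wild-type check is the main point of the helper: if the gene name was mapped to a different isoform, the residue numbering will not match and the function fails instead of silently producing a wrong sequence.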

You can also find prepared 3D structures for both datasets mentioned above in this repository's data folder.

I didn't work with the DTC, Metz, and ToxCast datasets. Therefore, I cannot give direct guidance regarding them.

speakstone commented 1 year ago

Thank you very much for your response. As you mentioned, I have found AlphaFold results for all entries in the KIBA dataset except P78527. Unfortunately, a significant number of structures seem to be missing for the Davis dataset. I truly admire and appreciate the work you've done in this area; your research has added immense value to the field. I noticed that you have reconstructed all datasets and obtained the AlphaFold2 predictions before processing them with UniProt. Have you considered open-sourcing this data? It would be of tremendous help to researchers with limited resources.

vtarasv commented 1 year ago

Sure, as I mentioned above, the data can be found here: prot_3d_for_Davis.tar.gz, prot_3d_for_KIBA.tar.gz
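For anyone unpacking these archives from Python, a minimal sketch follows. The internal layout of the archives is not described in this thread, so the assumption that the structures are stored as `.pdb` files is only a guess, and `extract_structures` is a hypothetical helper.

```python
import tarfile
from pathlib import Path


def extract_structures(archive, out_dir):
    """Extract a .tar.gz archive and return the extracted .pdb files, sorted."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    root = out_dir.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        # Refuse members that would be written outside the output directory.
        for member in tar.getmembers():
            target = (out_dir / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(out_dir)
    # Assumption: structures are stored with a .pdb extension.
    return sorted(
        p for p in out_dir.rglob("*") if p.is_file() and p.suffix.lower() == ".pdb"
    )
```

For example, `extract_structures("prot_3d_for_KIBA.tar.gz", "structures/kiba")` would return the paths of whatever `.pdb` files the archive contains.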

speakstone commented 1 year ago

I apologize for any confusion in my previous message. From what I've gathered, the provided PDB data has already been processed through UniProt. Would it be possible for you to share the original PDB data?