mims-harvard / TDC

Therapeutics Commons: Artificial Intelligence Foundation for Therapeutic Science
https://tdcommons.ai
MIT License

New DrugComb data #191

Open TangYiChing opened 1 year ago

TangYiChing commented 1 year ago

Describe the problem: The DrugComb database has released new drug combination and monotherapy screening datasets covering cancer, malaria, and COVID-19.
Reference: [https://doi.org/10.1093/nar/gkab438]

Describe the solution you'd like: Replace the current TDC/data/drugcomb.pkl with the new file from https://drugcomb.org/download/, and add new columns ['Study name', 'Disease'] to distinguish cancer, malaria, and COVID-19 data.

Additional context N/A.
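The column addition requested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input column names (`drug_row`, `drug_col`, `cell_line_name`, `study_name`), the helper `add_disease_column`, and the study identifiers in `STUDY_TO_DISEASE` are all hypothetical placeholders, not the actual DrugComb schema or study list.

```python
import csv
import io

# Hypothetical study-to-disease lookup. The real study names come from the
# DrugComb release and are NOT reproduced here.
STUDY_TO_DISEASE = {
    "STUDY_A": "cancer",
    "STUDY_B": "malaria",
    "STUDY_C": "COVID-19",
}


def add_disease_column(rows, study_key="study_name", lookup=STUDY_TO_DISEASE):
    """Return copies of the row dicts with 'Study name' and 'Disease' added."""
    tagged = []
    for row in rows:
        study = row[study_key]
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["Study name"] = study
        # Studies missing from the lookup are flagged rather than dropped.
        new_row["Disease"] = lookup.get(study, "unknown")
        tagged.append(new_row)
    return tagged


# Toy stand-in for the downloaded file (column names are assumed).
toy_csv = io.StringIO(
    "drug_row,drug_col,cell_line_name,study_name\n"
    "drug1,drug2,cellX,STUDY_A\n"
    "drug3,drug4,cellY,STUDY_B\n"
    "drug5,drug6,cellZ,STUDY_D\n"
)
tagged_rows = add_disease_column(list(csv.DictReader(toy_csv)))
diseases = [row["Disease"] for row in tagged_rows]
```

Flagging unmapped studies as "unknown" instead of dropping them keeps the row count unchanged, which makes it easier to spot studies that still need a disease label.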

kexinhuang12345 commented 1 year ago

Thank you! It would be a great idea! Would you like to make a PR for it?

TangYiChing commented 1 year ago

DrugComb provides an API for quick access to both drug and cell line information, and it already has SMILES strings and cell line IDs. To add a new drug-drug-cell line triplet to the current TDC dataset, what still needs to be added is the gene expression values from the CellMiner database. What would you like me to do to facilitate the process?

kexinhuang12345 commented 1 year ago

Thank you! Are the gene expression values available only in CellMiner? I saw in the paper that they can be retrieved from public databases such as DepMap, Cell Model Passports, etc. https://academic.oup.com/view-large/figure/267020980/gkab438fig1.jpg

TangYiChing commented 1 year ago

Yes, these are the commonly used sources nowadays, and they all provide RNA-seq data (i.e., expression values are in TPM). We might need a new data processing workflow.
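One possible shape for such a workflow is sketched below, purely as an assumption: TPM values are log-transformed with a pseudocount of 1 (a common convention, not something decided in this thread) and then attached to each drug-drug-cell line triplet by cell line ID. The function names, gene names, and identifiers are hypothetical.

```python
import math


def log2_tpm(tpm_by_gene, pseudocount=1.0):
    """Apply log2(TPM + pseudocount) to a {gene: TPM} mapping."""
    return {gene: math.log2(tpm + pseudocount) for gene, tpm in tpm_by_gene.items()}


def attach_expression(triplets, tpm_by_cell_line, genes):
    """Attach a fixed-order expression vector to each triplet.

    Returns (records, skipped): records carry the vector, and skipped lists
    the cell lines for which no expression profile was found.
    """
    records, skipped = [], []
    for drug_1, drug_2, cell_line in triplets:
        profile = tpm_by_cell_line.get(cell_line)
        if profile is None:
            skipped.append(cell_line)
            continue
        transformed = log2_tpm(profile)
        records.append({
            "Drug1": drug_1,
            "Drug2": drug_2,
            "CellLine": cell_line,
            # Keep a fixed gene order so vectors are comparable across rows.
            "Expression": [transformed[gene] for gene in genes],
        })
    return records, skipped


# Toy inputs (all identifiers and values are hypothetical).
genes = ["GENE1", "GENE2"]
tpm_by_cell_line = {"cellX": {"GENE1": 0.0, "GENE2": 3.0}}
triplets = [("drug1", "drug2", "cellX"), ("drug3", "drug4", "cellY")]
records, skipped = attach_expression(triplets, tpm_by_cell_line, genes)
```

Returning the skipped cell lines separately makes it visible how many triplets lack an expression profile in the chosen source before the dataset is rebuilt.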