mims-harvard / TDC

Therapeutics Commons: Artificial Intelligence Foundation for Therapeutic Science
https://tdcommons.ai
MIT License

Rat Liver Microsomal Stability - new dataset #206

Closed (iwwwish closed this issue 2 months ago)

iwwwish commented 11 months ago

Describe the problem

Hepatic metabolic stability is a key pharmacokinetic parameter in drug discovery. It is usually assessed in microsomal fractions, and only the best compounds progress further in the discovery process. The National Center for Advancing Translational Sciences (NCATS) runs a high-throughput, single-time-point substrate depletion assay in rat liver microsomes (RLM). Metabolic stability data for 2528 compounds from this assay were made public via a PubChem deposition [1]. RLM data for 220 approved drugs that are routinely screened in drug repurposing projects were also released [2] and can serve as an independent validation set. Currently, the only datasets TDC hosts under the metabolism category of ADME tasks are the CYP450 isoform datasets. This dataset would therefore give TDC users an additional metabolism-related dataset, one that captures metabolism mediated by multiple CYP450 isoforms.

References: [1] and [2]
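For readers unfamiliar with substrate depletion assays, here is a minimal sketch of how a single-time-point measurement can be turned into a half-life, assuming simple first-order depletion. This is not taken from the NCATS protocol: the function names, the incubation time, and the 30-minute stability cutoff are all illustrative assumptions.

```python
import math


def half_life_from_single_point(percent_remaining, incubation_min):
    """Estimate half-life (minutes) from one depletion measurement.

    Assumes first-order kinetics: fraction = exp(-k * t), so
    k = -ln(fraction) / t and t_half = ln(2) / k.
    """
    if not 0 < percent_remaining <= 100:
        raise ValueError("percent_remaining must be in (0, 100]")
    if incubation_min <= 0:
        raise ValueError("incubation_min must be positive")
    fraction = percent_remaining / 100.0
    if fraction == 1.0:
        # No measurable depletion: half-life is unbounded under this model.
        return math.inf
    k = -math.log(fraction) / incubation_min
    return math.log(2) / k


def is_unstable(t_half_min, cutoff_min=30.0):
    """Binary label; the 30-minute cutoff is a hypothetical example value."""
    return t_half_min < cutoff_min


# Hypothetical reading: 50% parent compound left after a 15-minute incubation.
t_half = half_life_from_single_point(50.0, 15.0)
label = is_unstable(t_half)
```

A binary label of this kind is what a classification task in TDC would consume as the `Y` column; the actual labels in the deposited data should be taken from the source rather than recomputed this way.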

Describe the solution you'd like

from tdc.single_pred import ADME
data = ADME(name='RLM_NCATS')  # proposed dataset name, not yet in TDC
split = data.get_split()

df = data.get_approved_set() # independent validation set
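Since `get_approved_set()` is only a proposed accessor, here is a rough sketch, independent of TDC, of one check that would help keep the approved-drug set a truly independent validation set: dropping any compound that also appears in the training data. The helper name `drop_overlap` and the toy rows are hypothetical; the `Drug` (SMILES) and `Y` (label) keys follow TDC's usual column naming.

```python
def drop_overlap(validation_rows, training_rows, key="Drug"):
    """Return validation rows whose `key` value does not occur in training.

    Compares strings exactly; real SMILES should be canonicalized first,
    which this sketch does not do.
    """
    seen = {row[key] for row in training_rows}
    return [row for row in validation_rows if row[key] not in seen]


# Toy rows standing in for the 2528-compound set and the approved-drug set.
train = [
    {"Drug": "CCO", "Y": 1},
    {"Drug": "c1ccccc1", "Y": 0},
]
approved = [
    {"Drug": "CCO", "Y": 1},       # also in train, so it is removed
    {"Drug": "CC(=O)O", "Y": 0},   # unique to the approved set
]

independent = drop_overlap(approved, train)
```

The same filter could run once at dataset-preparation time, so that users of the proposed accessor would not need to repeat it.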

Additional context

A GCNN model built on a much larger RLM dataset (only a subset was made public) is available here.

kexinhuang12345 commented 9 months ago

Looks great! Is there a pull request that we can review to integrate this? Thanks!

amva13 commented 2 months ago

@iwwwish, you can look at https://github.com/mims-harvard/TDC/pull/252 for how to add datasets to existing tasks. Please let us know if you have any questions.