greenelab / pancancer-evaluation

Evaluating genome-wide prediction of driver mutations using pan-cancer data
BSD 3-Clause "New" or "Revised" License

CCLE drug response classification, stratified by cancer type #56

Closed jjc2718 closed 1 year ago

jjc2718 commented 1 year ago

This is the first of (probably) several PRs setting up drug response prediction on CCLE data as a use case for our feature selection methods. Here, we take the resistant/sensitive cell line classifications from Iorio et al. 2016 for a few drugs, and we try to predict them from gene expression data. For now we're stratifying CV folds by cancer type (so each cancer type is represented proportionally in both the train and test sets), which should be the "easy" case compared to holding out entire cancer types.
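
For reference, here is a minimal sketch of what stratifying folds by cancer type means. It is not the code in this PR: the function name `stratified_folds`, the toy cancer type labels, and the fold count are all assumptions for illustration. Within each cancer type the samples are shuffled and dealt round-robin across folds, so every fold sees roughly the same mix of cancer types.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(cancer_types, n_folds=4, seed=42):
    """Return a fold index per sample, spreading each cancer type across folds.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group sample indices by cancer type.
    by_type = defaultdict(list)
    for idx, ctype in enumerate(cancer_types):
        by_type[ctype].append(idx)

    fold_of = [None] * len(cancer_types)
    offset = 0  # running offset so leftover samples don't all pile into fold 0
    for ctype in sorted(by_type):
        indices = by_type[ctype]
        rng.shuffle(indices)
        # Deal this cancer type's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = (offset + position) % n_folds
        offset += len(indices)
    return fold_of


# Toy labels (made up): 8, 4, and 4 cell lines from three cancer types.
cancer_types = ["LUAD"] * 8 + ["BRCA"] * 4 + ["SKCM"] * 4
fold_of = stratified_folds(cancer_types, n_folds=4)

for fold in range(4):
    held_out = Counter(t for t, f in zip(cancer_types, fold_of) if f == fold)
    print(fold, dict(held_out))
```

With counts that divide evenly by the number of folds, each held-out fold gets the same number of cell lines per cancer type. The harder setting mentioned above would instead put all cell lines of one cancer type into the held-out set.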

In general, we don't get great performance for the six drugs we're looking at, with many AUPR values near 0 or only slightly better:

image

These are pretty similar to the AUPR values reported in https://arxiv.org/abs/2208.14822 (see Table 5: they're using multi-omics data, so their results are slightly better than ours, but not by much). So this is a slightly harder problem than we expected, and honestly we don't see much separation between feature selection methods, with most of them performing comparably to random features.

We'll have to think about whether this is the right problem, or if regression on continuous drug response values (e.g. IC50 values) would be a better way to go. Our labels are pretty imbalanced here (the vast majority of cell lines are resistant to the vast majority of drugs) so that could be one issue.
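
To make the imbalance point concrete, here is a small sketch (not code from this PR) of why low AUPR values need to be read against the label prevalence. Under heavy imbalance, a classifier that scores cell lines at random gets an AUPR close to the fraction of sensitive cell lines, so that fraction is the baseline to beat rather than 0.5. The `average_precision` helper below is one common step-wise estimator of AUPR, written out by hand, and the numbers (1000 cell lines, 50 sensitive) are made up.

```python
import random


def average_precision(labels, scores):
    """Average precision: mean of the precision at the rank of each positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / n_pos


# Made-up setting: 1000 cell lines, 50 of them sensitive (label 1).
rng = random.Random(0)
n_lines, n_sensitive = 1000, 50
labels = [1] * n_sensitive + [0] * (n_lines - n_sensitive)
prevalence = n_sensitive / n_lines

# Average the AUPR of purely random scores over many repeats.
random_aps = []
for _ in range(200):
    scores = [rng.random() for _ in labels]
    random_aps.append(average_precision(labels, scores))
mean_random_ap = sum(random_aps) / len(random_aps)

# A perfect ranking (all sensitive lines first) for comparison.
perfect_ap = average_precision(labels, labels)

print(f"prevalence:        {prevalence:.3f}")
print(f"random-score AUPR: {mean_random_ap:.3f}")
print(f"perfect AUPR:      {perfect_ap:.3f}")
```

So for a drug where only a few percent of cell lines are sensitive, an AUPR that looks near 0 may be sitting right at the random baseline, and comparing each drug's AUPR to its own prevalence is one way to see how much signal there is.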
