greenelab / pancancer-evaluation

Evaluating genome-wide prediction of driver mutations using pan-cancer data
BSD 3-Clause "New" or "Revised" License

Remove relevant cancer type experiments + analysis #30

Closed: jjc2718 closed this issue 3 years ago

jjc2718 commented 3 years ago

As a reminder of what I've been doing previously: I'm trying to predict cancer mutations from gene expression within individual cancer types, training my models either on data from that cancer type alone or on that cancer type plus all other cancer types in TCGA (the pan-cancer training set).

As a sanity check, we wanted to try an experiment where we remove the relevant cancer type from the pan-cancer data (i.e. train on all cancer types except X and test on X). We expect this to always perform worse than a training set that includes the relevant cancer type (i.e. our pan-cancer training set from before).
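To make the three training sets concrete, here is a minimal sketch of how they could be assembled for one target cancer type. It is not the code used in this repository: the `Sample` record, the `build_training_sets` function, and the toy data are all hypothetical, and I'm assuming the test samples from X are held out before any training set is built.

```python
from collections import namedtuple

# Hypothetical sample record; the field names are illustrative only.
Sample = namedtuple("Sample", ["sample_id", "cancer_type", "mutated"])


def build_training_sets(samples, target, test_ids):
    """Split samples into a held-out test set and three candidate training sets.

    samples  : list of Sample records from all cancer types
    target   : the cancer type X we want to test on
    test_ids : set of sample IDs (all from X) held out for testing
    """
    test = [s for s in samples if s.sample_id in test_ids]
    # Assumption: every held-out test sample comes from the target cancer type.
    assert all(s.cancer_type == target for s in test)

    # Everything that is not held out is available for training.
    train_pool = [s for s in samples if s.sample_id not in test_ids]

    return {
        "test": test,
        # Same cancer type only.
        "single_cancer": [s for s in train_pool if s.cancer_type == target],
        # Same cancer type plus all other cancer types.
        "pancancer": train_pool,
        # All other cancer types, with X removed (the sanity check).
        "removed": [s for s in train_pool if s.cancer_type != target],
    }


# Toy data, made up for illustration.
samples = [
    Sample("s1", "BRCA", 1),
    Sample("s2", "BRCA", 0),
    Sample("s3", "LUAD", 1),
    Sample("s4", "COAD", 0),
    Sample("s5", "BRCA", 1),
]
sets = build_training_sets(samples, target="BRCA", test_ids={"s1"})
```

In this sketch the "removed" training set is just the pan-cancer pool minus the target cancer type, so any performance gap between the two can be attributed to the samples from X.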

Thankfully, this does seem to be the case (see the second volcano plot in 06_plot_remove_cancer_type.ipynb, copied below; negative x-axis values mean better performance with the relevant cancer type included).

[image: second volcano plot from 06_plot_remove_cancer_type.ipynb]
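For the sign convention on the x-axis, a rough sketch of how each point's x value could be computed is below. This is an assumption about the calculation, not the notebook's actual code: `performance_delta` is a hypothetical helper, the per-fold scores are made-up numbers, and the performance metric is left unspecified. The y-axis of a volcano plot (typically a -log10 p-value) is not computed here.

```python
from statistics import mean


def performance_delta(scores_removed, scores_pancancer):
    """Mean score with the relevant cancer type removed, minus mean score with it.

    A negative value means the model trained WITH the relevant cancer type
    (the pan-cancer training set) performed better.
    """
    return mean(scores_removed) - mean(scores_pancancer)


# Made-up per-fold scores for a single gene and cancer type.
delta = performance_delta(
    scores_removed=[0.5, 0.75],
    scores_pancancer=[0.75, 1.0],
)
print(delta)  # negative here, i.e. removing the cancer type hurt performance
```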