As a reminder, I've been trying to predict cancer mutations from gene expression in individual cancer types, training my models either on data from the same cancer type alone or on the same cancer type plus all other cancers in TCGA.
As a sanity check, we wanted to try an experiment where we remove the relevant cancer type from the pan-cancer data (i.e. train on all other cancers besides X and test on X). We expect this to perform worse than a training set that includes the relevant cancer type (i.e. our pan-cancer training set from before).
Thankfully, this does seem to be the case (see the second volcano plot in 06_plot_remove_cancer_type.ipynb, copied here: negative x-axis values = better performance with relevant cancer type).
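
For concreteness, here is a minimal sketch of how the three training sets differ. It is illustrative only: `select_training_samples`, the mode names, and the toy labels are hypothetical and not the code in this repo, and it assumes the test samples for cancer type X are held out up front.

```python
# Hypothetical helper: the names and the holdout scheme are assumptions
# made for illustration, not the repository's actual implementation.
def select_training_samples(cancer_types, test_idx, target, mode):
    """Return indices of samples to train on when testing on `target`.

    cancer_types: list of cancer type labels, one per sample
    test_idx:     set of sample indices held out for testing (all of type `target`)
    mode:         'single'                   -> only `target` samples
                  'pancancer'                -> all cancer types, including `target`
                  'pancancer_without_target' -> all cancer types except `target`
    """
    modes = ("single", "pancancer", "pancancer_without_target")
    if mode not in modes:
        raise ValueError(f"unknown mode: {mode!r}")

    train_idx = []
    for i, cancer_type in enumerate(cancer_types):
        if i in test_idx:
            continue  # never train on the held-out test samples
        is_target = cancer_type == target
        if mode == "single" and not is_target:
            continue  # drop every other cancer type
        if mode == "pancancer_without_target" and is_target:
            continue  # drop the relevant cancer type (the sanity check)
        train_idx.append(i)
    return train_idx


# Toy labels, made up for this example.
labels = ["BRCA", "BRCA", "BRCA", "LUAD", "LUAD", "COAD"]
held_out = {2}  # one BRCA sample reserved for testing

single = select_training_samples(labels, held_out, "BRCA", "single")
pancancer = select_training_samples(labels, held_out, "BRCA", "pancancer")
without_target = select_training_samples(
    labels, held_out, "BRCA", "pancancer_without_target"
)
```

In all three cases the model is evaluated on the same held-out samples from X, so any performance difference comes from the training set composition alone.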