Note: `04_coefficient_analysis.ipynb` doesn't need to be reviewed; I'm just adding back changes that I (somehow) accidentally deleted in a previous PR.
In general, the goal of this PR is to "flip" or hold out a subset of positively labeled samples in a given gene and cancer type, and see if a model trained on the rest of the samples can differentiate the "flipped" samples (false negatives) from the true negatives.
For an idea of how I'm training/testing, I made this extremely high-tech and polished graphic:
(Shaded samples have positive labels; the others have negative labels.) The graphic shows that I'm removing the positively labeled samples from the test set: including them as positives would inflate performance (training on test data), and including them as negatives (what we did at first) would artificially deflate performance.
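As a rough illustration of the split described above, here is a minimal sketch. It is not the code in this PR: `flip_label_split`, its parameters, and the choice to hold out half of the negatives for testing are all hypothetical, and it assumes the "flipped" positives are held out of training entirely rather than trained on with negative labels.

```python
import random


def flip_label_split(labels, flip_frac=0.5, neg_test_frac=0.5, seed=0):
    """Hypothetical helper: build train/test indices for a flip-labels run.

    labels: list of 0/1 labels for one gene / cancer type.
    Returns (train_idx, test_idx, test_targets), where test_targets is
    1 for a flipped (held-out) positive and 0 for a true negative.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Assumption: a fixed fraction of positives is flipped / held out.
    n_flip = max(1, round(flip_frac * len(pos)))
    # Assumption: a fixed fraction of negatives is reserved for testing.
    n_neg_test = max(1, round(neg_test_frac * len(neg)))

    flipped, kept_pos = pos[:n_flip], pos[n_flip:]
    test_neg, train_neg = neg[:n_neg_test], neg[n_neg_test:]

    # Non-flipped positives appear only in the training set, so the test
    # set contains just flipped positives and true negatives.
    train_idx = sorted(kept_pos + train_neg)
    test_idx = sorted(flipped + test_neg)

    flipped_set = set(flipped)
    test_targets = [1 if i in flipped_set else 0 for i in test_idx]
    return train_idx, test_idx, test_targets


labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
train_idx, test_idx, test_targets = flip_label_split(labels)
```

A model fit on `train_idx` can then be scored on `test_idx` against `test_targets`, which measures whether it ranks the flipped samples above the true negatives.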
Results are in `03_cross_cancer_classification/plot_flip_labels_results.ipynb`.