Mutation prediction outstanding issues/updates

[x] use_pancancer should be false for all_other_cancers in process_data_for_gene() (i.e. there shouldn't be unused dummy/one-hot variables)
[ ] Label filtering should happen after we take the intersection of samples between the gene expression and mutation data (this will make the proportions in 08_cell_line_prediction/download_data.ipynb match what we actually see when the scripts run)
[ ] tcga_utilities should probably be renamed to something more general, or split
[ ] Add CNV data for cell lines, in ccle_data_model _generate_labels()
[x] remove unknown/non-cancerous samples in load_sample_info()
[x] maybe try sklearn LogisticRegression with elastic net penalty rather than SGDClassifier
[ ] save label proportions to plot AUPR baseline: https://github.com/greenelab/pancancer-evaluation/pull/56#discussion_r981576511
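For the AUPR baseline item: the expected AUPR of a random classifier equals the proportion of positive labels, so saving that proportion per gene and fold is enough to draw the baseline later. A minimal sketch, assuming labels are 0/1 lists; the function names, column names, and toy genes/folds below are hypothetical and not taken from the repository.

```python
import csv
import io


def label_proportion(labels):
    """Fraction of positive (mutated) samples in a 0/1 label list.

    This is the expected AUPR of a classifier that ranks samples at random.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("cannot compute a proportion from zero labels")
    return sum(1 for y in labels if y == 1) / len(labels)


def write_proportions(rows, handle):
    """Write one tab-separated line per (gene, fold, labels) triple."""
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["gene", "fold", "n_samples", "positive_proportion"])
    for gene, fold, labels in rows:
        prop = label_proportion(labels)
        writer.writerow([gene, fold, len(labels), f"{prop:.4f}"])


# Toy data (hypothetical): two genes, one fold each.
rows = [
    ("TP53", 0, [1, 0, 0, 1]),
    ("KRAS", 0, [0, 0, 0, 1, 0]),
]
buf = io.StringIO()  # stand-in for an open results file
write_proportions(rows, buf)
print(buf.getvalue())
```

When plotting, the saved positive_proportion can be drawn as a horizontal line per gene next to the measured AUPR values.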

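For the label filtering item: a toy illustration of why the order matters. It assumes that "label filtering" means dropping cancer types with too few positive samples; the thresholds, sample IDs, and function name are hypothetical and only meant to show the effect, not the repository's actual logic.

```python
from collections import defaultdict


def filter_cancer_types(labels, cancer_types, min_count=2, min_prop=0.1):
    """Keep samples from cancer types with enough positive labels.

    labels: dict of sample ID -> 0/1 mutation label
    cancer_types: dict of sample ID -> cancer type
    """
    counts = defaultdict(lambda: [0, 0])  # cancer type -> [n_total, n_positive]
    for sample, y in labels.items():
        entry = counts[cancer_types[sample]]
        entry[0] += 1
        entry[1] += y
    keep = {
        ct for ct, (n, pos) in counts.items()
        if pos >= min_count and pos / n >= min_prop
    }
    return {s: y for s, y in labels.items() if cancer_types[s] in keep}


# Toy data (hypothetical): sample s2 has a mutation label but no expression data.
labels = {"s1": 1, "s2": 1, "s3": 0, "s4": 0, "s5": 1, "s6": 1, "s7": 0}
cancer_types = {"s1": "A", "s2": "A", "s3": "A", "s4": "A",
                "s5": "B", "s6": "B", "s7": "B"}
expression_samples = {"s1", "s3", "s4", "s5", "s6", "s7"}

# Filter first, then intersect: type A passes using s2, which is later dropped.
filtered = filter_cancer_types(labels, cancer_types)
filter_then_intersect = {s: y for s, y in filtered.items()
                         if s in expression_samples}

# Intersect first, then filter: type A now has one positive and is removed.
shared = {s: y for s, y in labels.items() if s in expression_samples}
intersect_then_filter = filter_cancer_types(shared, cancer_types)

print(sorted(filter_then_intersect))
print(sorted(intersect_then_filter))
```

Only the second ordering computes proportions on the samples that are actually used for training, which is the mismatch the item describes.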