This PR expands #65 to all the genes in the Vogelstein cancer gene set, and modifies the analyses and visualizations from the last PR to handle the much larger number of genes.
In general, for most genes the correlation between the number of features and generalization performance across cancer types is either positive or close to 0, suggesting that, if anything, models that include more features tend to generalize better. We're still exploring this and thinking of ways to summarize the results.
Code changes:
Moved LASSO analysis scripts to 02_cancer_type_classification/lasso_range_analysis directory
Modified 02_cancer_type_classification/lasso_range_analysis/lasso_range_all.ipynb to use different correlation methods, and to plot distributions for all genes
Added 02_cancer_type_classification/lasso_range_analysis/lasso_corr_analysis.ipynb to compare correlation methods (e.g. to identify genes with high CCC and low Pearson/Spearman, or vice versa)
02_cancer_type_classification/lasso_range_analysis/lasso_range_gene.ipynb doesn't really need to be reviewed; it's mostly the same as the previous script that was moved.
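
For reviewers who want a feel for why the choice of correlation method matters here, below is a rough, self-contained sketch of comparing Pearson and Spearman on a single gene's results. It is not the code in the notebooks: the function names and the data points are made up for illustration, the notebooks may well use library implementations instead, and CCC is left out entirely.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for one gene: number of nonzero features per model,
# and a performance metric on held-out cancer types (illustrative only).
n_features = [10, 50, 100, 500, 1000, 5000]
performance = [0.55, 0.60, 0.62, 0.70, 0.71, 0.72]

corrs = {
    "pearson": pearson(n_features, performance),
    "spearman": spearman(n_features, performance),
}
print(corrs)
```

In this toy case performance increases monotonically with the number of features but saturates, so the rank-based Spearman correlation is at its maximum while Pearson is lower. Disagreements of this kind are what a comparison across methods can surface.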