In #68, in addition to what I listed in the PR description, I also tried running MSI prediction with a different sklearn interface/optimizer. So far we've run most experiments using `SGDClassifier`, which optimizes the logistic loss using stochastic gradient descent. Instead, I tried `LogisticRegression` with an L1 penalty and the `liblinear` optimizer, which uses a coordinate descent algorithm that's supposed to converge quickly but can scale poorly to datasets with many samples.
Since MSI prediction performance was generally better with `LogisticRegression`, though not by much, in this PR I reran the mutation prediction experiments from #65 using `LogisticRegression`, and compared the results between the two optimizers in the notebooks `02_cancer_type_classification/lasso_range_analysis/compare_optimizers_all.ipynb` and `02_cancer_type_classification/lasso_range_analysis/compare_optimizers_gene.ipynb`.
In general, the `liblinear` optimizer does seem to give a better fit for almost every gene:
In this plot, each sample is a gene/cancer type combination, and a positive value means `liblinear` performed better than `sgd` when comparing the best-performing LASSO parameters for each optimizer.
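The comparison described above can be sketched roughly as follows. The record layout, the gene and cancer type names, and the function `best_score_differences` are all hypothetical, and the scores are made-up toy values rather than results from these experiments; the notebooks may compute this differently.

```python
from collections import defaultdict

# Toy records: (gene, cancer_type, optimizer, lasso_param, score).
# All values are invented for illustration only.
results = [
    ("GENE_A", "TYPE_1", "liblinear", 0.01, 0.70),
    ("GENE_A", "TYPE_1", "liblinear", 0.10, 0.75),
    ("GENE_A", "TYPE_1", "sgd", 0.001, 0.65),
    ("GENE_A", "TYPE_1", "sgd", 0.010, 0.70),
    ("GENE_B", "TYPE_2", "liblinear", 0.01, 0.60),
    ("GENE_B", "TYPE_2", "liblinear", 0.10, 0.58),
    ("GENE_B", "TYPE_2", "sgd", 0.001, 0.62),
    ("GENE_B", "TYPE_2", "sgd", 0.010, 0.55),
]


def best_score_differences(records):
    """Return liblinear-minus-sgd best scores per (gene, cancer_type)."""
    best = defaultdict(dict)
    for gene, cancer_type, optimizer, _lasso_param, score in records:
        key = (gene, cancer_type)
        previous = best[key].get(optimizer)
        # Keep the best score over all LASSO parameters for this optimizer.
        if previous is None or score > previous:
            best[key][optimizer] = score
    # Positive difference: liblinear's best beat sgd's best.
    return {
        key: scores["liblinear"] - scores["sgd"]
        for key, scores in best.items()
        if "liblinear" in scores and "sgd" in scores
    }


diffs = best_score_differences(results)
for key, diff in sorted(diffs.items()):
    print(key, round(diff, 3))
```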