Sorry in advance - this is kind of a large PR with no super obvious way to split it up into smaller ones. I'll summarize some of the main changes I'm making in this PR below, and if you want to review each of those individually feel free to do that and just approve once you finish them all.
Main changes:
- Tune learning rate for SGD optimizer: see changes to `01_stratified_classification/run_stratified_lasso_penalty.py`, `01_stratified_classification/scripts/run_lasso_lr_compare.sh`, and parts of `pancancer_evaluation/prediction/classification.py` and `pancancer_evaluation/utilities/classify_utilities.py`. This turns out to matter quite a bit for SGD to perform well: once we use a slightly more sophisticated approach to tuning the learning rate (a constant learning rate plus a grid search, in this case), we get much better performance, on par with `liblinear`. There's a rough sketch of the tuning setup after this list.
- Try a PyTorch implementation of SGD: we did this primarily to make sure the SGD performance/regularization dynamics weren't specific to the sklearn implementation. These changes are in `01_stratified_classification/run_stratified_nn.py` and `pancancer_evaluation/prediction/classification.py` (the `train_mlp_lr` function, primarily). We probably won't end up using these results for much in the paper, but it was a useful sanity check. A minimal sketch of the setup also appears below the list.
- Try SGD and liblinear on some simulated data: these are the `01_stratified_classification/sgd_params/sim.ipynb` and `01_stratified_classification/sgd_params/sim_lr.ipynb` notebooks. I used these to iterate quickly on the learning rate changes and to compare our results against L2 regularization, but the results turn out to be somewhat different on real data, so I'm not sure how applicable this simulation approach is to the problem we're trying to address in our paper. I want to keep these scripts around for future reference, though. A sketch of the simulation setup is the last code block below.
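For context, this is roughly what the constant-learning-rate grid search looks like. This is a minimal sketch on a toy dataset, not the actual code from `run_stratified_lasso_penalty.py`; the grid values and `alpha` are illustrative, and the `log_loss` loss name assumes scikit-learn >= 1.1 (older versions call it `log`):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the real expression matrix / mutation labels
X, y = make_classification(n_samples=500, n_features=100, random_state=42)

# SGD with lasso (L1) regularization; learning_rate='constant' turns off the
# default 'optimal' schedule so that eta0 fully controls the step size.
sgd = SGDClassifier(
    loss='log_loss',            # logistic regression objective
    penalty='l1',
    alpha=1e-4,                 # fixed regularization strength for this sketch
    learning_rate='constant',
    random_state=42,
)

# Grid search over the constant learning rate eta0 (illustrative grid)
param_grid = {'eta0': [1e-4, 1e-3, 1e-2, 1e-1, 1.0]}
cv = GridSearchCV(sgd, param_grid, scoring='roc_auc', cv=5)
cv.fit(X, y)
print(cv.best_params_, cv.best_score_)
```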
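And a self-contained sketch of the PyTorch sanity check: a model trained with `torch.optim.SGD` plus an explicit L1 penalty on the weights. The toy data, architecture, and hyperparameters here are made up for illustration; the real version lives in `train_mlp_lr`:

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
X = torch.randn(500, 100)
y = (X[:, 0] > 0).float().unsqueeze(1)   # toy binary labels

# Logistic regression for simplicity; swap in an MLP to match train_mlp_lr
model = nn.Sequential(nn.Linear(100, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)  # the tuned constant LR
l1_lambda = 1e-4

for epoch in range(100):
    optimizer.zero_grad()
    logits = model(X)
    # L1 (lasso) penalty added explicitly, since optim.SGD's weight_decay is L2
    l1_term = sum(p.abs().sum() for p in model.parameters())
    loss = criterion(logits, y) + l1_lambda * l1_term
    loss.backward()
    optimizer.step()
```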
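Finally, the flavor of the simulations in those notebooks: a sparse true coefficient vector, then SGD and liblinear fit with L1 regularization on the same data. The sample sizes, signal strength, and regularization values here are illustrative, not the ones used in `sim.ipynb`/`sim_lr.ipynb`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n, p, k = 1000, 200, 10                    # samples, features, informative features
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = rng.normal(scale=2.0, size=k)   # sparse true signal
y = (1 / (1 + np.exp(-X @ beta)) > rng.uniform(size=n)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

liblinear = LogisticRegression(penalty='l1', C=1.0, solver='liblinear')
sgd = SGDClassifier(loss='log_loss', penalty='l1', alpha=1e-3,
                    learning_rate='constant', eta0=0.01, random_state=42)

# Compare held-out performance and sparsity of the two optimizers
for name, clf in [('liblinear', liblinear), ('sgd', sgd)]:
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.decision_function(X_te))
    nonzero = np.count_nonzero(clf.coef_)
    print(f'{name}: AUC={auc:.3f}, nonzero coefficients={nonzero}')
```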
I have a summary of the main plots/conclusions in these slides, in case they help a bit with putting the results in context: https://docs.google.com/presentation/d/1LRBq_ciFeS503J8-GeH51l-1p4RTdWJPn_LcGhHNjgM/edit?usp=sharing. Let me know if you have questions!