One of our reviewers asked us to try a neural network model for the multi-omics mutation prediction problem, to see if a more complex model can pick up on interactions between different -omics layers.
This PR is a "pilot" experiment: I'm fitting fairly small models (1000 PCs for each -omics type, small hidden layers, only 4 genes) so that they train quickly. The idea was that if we struggled to get this to outperform elastic net, it might not be worth the effort and computational resources of fitting larger models on higher-dimensional inputs.
However, the neural network tends to give a fairly clear performance improvement over an elastic net model that also uses 1000 PCs, particularly in the multi-omics case.
So we'll continue to pursue this, with the goal of making a comparison on the same data and genes as we have in the paper.
Code changes:
Neural network model in mpmp/prediction/classification.py and mpmp/prediction/nn_models.py, using PyTorch and the skorch package for sklearn compatibility
Code in mpmp/prediction/cross_validation.py to write results of parameter search to log file
Parameter ranges in mpmp/config.py
Shell script in 05_classify_mutations_multimodal/scripts/test_mlp.sh (this will eventually be adapted/extended to work with whatever cluster we end up using)
Analysis of results/comparison against elastic net in 05_classify_mutations_multimodal/compare_mlp.ipynb
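
For readers unfamiliar with the parameter search logging mentioned above, here is a minimal sketch of the general idea in plain Python. It is not the code in this PR: the parameter names and ranges (`learning_rate`, `h1_size`, `dropout`), the `random_search` helper, the TSV layout, and the dummy scoring function are all assumptions for illustration. The real ranges live in mpmp/config.py, and the real search fits the skorch-wrapped model instead of calling a stand-in.

```python
import csv
import io
import random

# Hypothetical parameter ranges; the real ones live in mpmp/config.py.
MLP_PARAM_RANGES = {
    "learning_rate": [0.1, 0.01, 0.001],
    "h1_size": [100, 200, 500],
    "dropout": [0.1, 0.5, 0.75],
}


def sample_params(param_ranges, rng):
    """Draw one value per parameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in param_ranges.items()}


def random_search(score_fn, param_ranges, n_iter, log_file, seed=42):
    """Evaluate n_iter random parameter sets, logging each one as a TSV row.

    Returns the best (params, score) pair seen.
    """
    rng = random.Random(seed)
    fieldnames = ["iteration", *param_ranges, "score"]
    writer = csv.DictWriter(log_file, fieldnames=fieldnames, delimiter="\t")
    writer.writeheader()

    best_params, best_score = None, float("-inf")
    for i in range(n_iter):
        params = sample_params(param_ranges, rng)
        score = score_fn(params)
        writer.writerow({"iteration": i, **params, "score": f"{score:.4f}"})
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def dummy_score(params):
    """Stand-in for a cross-validated metric; NOT a real model fit."""
    return 1.0 - params["dropout"] * 0.1 - params["learning_rate"]


# Write to an in-memory buffer here; a real run would open a file on disk.
log_buffer = io.StringIO()
best_params, best_score = random_search(
    dummy_score, MLP_PARAM_RANGES, n_iter=5, log_file=log_buffer
)
log_lines = log_buffer.getvalue().splitlines()
```

Logging every sampled parameter set, not just the best one, makes it possible to check afterwards whether the chosen values sit at the edge of a range, which would suggest widening it before scaling up to larger models.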