Move the mutation prediction script (previously `classify_cancer_type.py`) into a class, so data is loaded once and kept in memory for all subsequent experiments.
Remove the `subprocess` calls in `01_run_pancancer_classification.py`; it now initializes the class once and preprocesses data only as needed for each experiment.
Add a debugging option to load only a small subset of the data, add an option to load data from the Vogelstein paper (see `data_utilities.py`), and make some other small changes.
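Roughly, the pattern looks like the sketch below. This is a simplified illustration only: `ExperimentRunner`, `load_fn`, `debug_rows`, `run_experiment`, and the toy loader are hypothetical names invented for the example, not the actual identifiers or data in this PR.

```python
class ExperimentRunner:
    """Hypothetical stand-in for the class introduced in this PR."""

    def __init__(self, load_fn, debug=False, debug_rows=10):
        # The expensive load happens once, at construction time.
        data = load_fn()
        # Debug mode keeps only a small subset for quick iteration.
        self.data = data[:debug_rows] if debug else data

    def run_experiment(self, label):
        # Per-experiment preprocessing: select only the rows this
        # experiment needs from the data already held in memory.
        subset = [row for row in self.data if row["label"] == label]
        return len(subset)


# Toy loader that counts how often it is called (assumption: real
# loading would read files from disk instead).
calls = {"n": 0}


def fake_load():
    calls["n"] += 1
    return [{"label": "A"}, {"label": "B"}, {"label": "A"}]


runner = ExperimentRunner(fake_load)
# Several experiments reuse the same in-memory data; no reload, no subprocess.
results = {label: runner.run_experiment(label) for label in ("A", "B")}
```

The point of the sketch is that the loader runs once no matter how many experiments follow, which is where the speedup comes from.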
This refactor will allow me to run multiple experiments much faster, since I don't have to reload data and make subprocess calls each time. Sorry for the fairly large PR, no rush to get it reviewed.
Historically I haven't done too much OOP (my view is basically this tweet) so any feedback on how I could structure this better or make it more object-oriented would be awesome.
Closes #7.