Refactor data loading code to load data once for multiple experiments

Closes #7. Changes in this PR:

Move the mutation prediction script (previously classify_cancer_type.py) into a class, so data is loaded once and kept in memory for all subsequent experiments.
Remove the subprocess calls in 01_run_pancancer_classification.py; it now initializes the class once and preprocesses data only as needed for each experiment.
Add a debugging option to load only a small subset of the data, add an option to load data from the Vogelstein paper (see data_utilities.py), and make some other small changes.
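
To illustrate the load-once pattern described above, here is a minimal sketch. It is not the code in this PR: the class name `MutationPrediction`, the `loader` callable, the `debug` flag behavior, and the toy data are all hypothetical stand-ins chosen for illustration.

```python
class MutationPrediction:
    """Hypothetical sketch: load expensive data once, reuse it per experiment."""

    def __init__(self, loader, debug=False):
        # `loader` is an assumed callable returning
        # {sample_id: (features, cancer_type)}; it stands in for the real
        # data loading code, which is not shown here.
        self.load_count = 0
        self.data = self._load(loader, debug)

    def _load(self, loader, debug):
        # Called exactly once, from __init__.
        self.load_count += 1
        data = loader()
        if debug:
            # Debug mode: keep only a small subset (here, the first 2 samples).
            data = dict(list(data.items())[:2])
        return data

    def preprocess(self, cancer_type):
        # Per-experiment preprocessing works on the in-memory data,
        # so nothing is reloaded from disk.
        return {
            sample_id: features
            for sample_id, (features, ctype) in self.data.items()
            if ctype == cancer_type
        }

    def run_experiment(self, cancer_type):
        # Placeholder "experiment": report how many samples were selected.
        return len(self.preprocess(cancer_type))


def toy_loader():
    # Toy data standing in for the real expression/mutation data.
    return {
        "s1": ([0.1, 0.2], "BRCA"),
        "s2": ([0.3, 0.4], "LUAD"),
        "s3": ([0.5, 0.6], "BRCA"),
    }


model = MutationPrediction(toy_loader)
results = {ct: model.run_experiment(ct) for ct in ("BRCA", "LUAD")}
```

The driver script then only needs to construct the object once and loop over experiments, rather than spawning one subprocess (and one data load) per experiment.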

This refactor will let me run multiple experiments much faster, since I no longer have to reload data or make subprocess calls each time. Sorry for the fairly large PR; no rush to get it reviewed.

Historically I haven't done too much OOP (my view is basically this tweet) so any feedback on how I could structure this better or make it more object-oriented would be awesome.
