Initial Classification Pipeline

Experimented with a number of different things:

Filtering metrics (dropping columns where 90% of the values were NA or zero)
Running XGBoost on one dataset at a time, both on all features and on selected features
Plotting and computing feature importances from XGBoost
Plotting heatmaps to view correlations between features
Running PCA on features (per dataset)
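
As a rough illustration of the filtering step listed above, here is a minimal sketch using pandas. The function name `drop_sparse_columns`, the metric column names, and the toy values are all hypothetical and not taken from the actual pipeline. It also assumes the 90% cut-off is inclusive (a column is dropped when 90% or more of its values are NA or zero), which the notes do not specify.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop columns in which at least `threshold` of the values are NA or zero.

    Hypothetical helper: the inclusive threshold is an assumption.
    """
    # A cell counts as "empty" if it is missing or exactly zero.
    empty = df.isna() | (df == 0)
    # Fraction of empty cells per column.
    empty_fraction = empty.mean(axis=0)
    # Keep only columns below the threshold.
    return df.loc[:, empty_fraction < threshold]


# Toy data with made-up metric names.
metrics = pd.DataFrame(
    {
        "metric_a": [0.0] * 10,                    # all zeros
        "metric_b": [np.nan, 0.0] * 5,             # only NA and zeros
        "metric_c": [0.3, 0.1, 0.0, 0.7, 0.2,
                     0.9, 0.4, 0.5, 0.6, 0.8],     # mostly informative
    }
)

filtered = drop_sparse_columns(metrics)
print(list(filtered.columns))
```

In this toy example only `metric_c` survives, since the other two columns contain nothing but NA or zero values. The threshold is a parameter so that the cut-off can be tuned per dataset.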

This workflow can also be viewed for a single example dataset in notes/progress-16-04-24

Future work

Following an impromptu meeting with Roberta, we may try some other approaches. Regardless of the exact direction, the classification pipeline needs to be streamlined, as much of the code was written in a very short amount of time.

rbroc / echo

Preliminary Classification Pipeline #59
