danich1 closed 6 years ago
Let's update the model to use only a single LSTM feature. It looks like the mean or 80th percentile would be the best choice. Let's see if this helps fix the issue where the model is worse than a single feature.
Regarding the correlation plot, usually you'd want to use a diverging colormap because correlations go from -1 to 1. However, since there aren't many negative correlations here, viridis is fine.
We should be standardizing the features using an sklearn pipeline. The Cognoma notebook has an example, although it's more complex than what we need.
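For reference, here is a minimal sketch of what standardizing inside an sklearn pipeline could look like. The data is synthetic and the choice of `LogisticRegression` as the final step is an assumption for illustration; it is not taken from the notebooks in this PR or from the Cognoma example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(50, 1000, 200)])
y = (X[:, 0] + X[:, 1] / 1000 > 0).astype(int)

# The scaler is fit only on the data passed to fit(), so the same pipeline
# can be reused inside cross-validation without leaking held-out statistics.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X, y)

# Inspect the standardized features and the predicted probabilities.
X_scaled = pipeline.named_steps["standardscaler"].transform(X)
probabilities = pipeline.predict_proba(X)[:, 1]
```

Keeping the scaler inside the pipeline means the scaling parameters are learned together with the model, which is the main reason to prefer this over scaling the whole matrix up front.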
Additional references:
https://github.com/greenelab/snorkeling/pull/36/commits/40ca5bd55660f068ab97408b8858137d94d9919d: use logit, not log. Time for a scipy special (`scipy.special.logit`). With a logit-transformed prior as the feature, logistic regression can regurgitate the prior probability ad infinitum.
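To illustrate the logit versus log point, here is a small sketch in plain Python. The `logit` and `expit` helpers are written out by hand for self-containment (`scipy.special.logit` and `scipy.special.expit` provide the same functions), and the prior value of 0.2 is made up. Logistic regression predicts `expit(w * x + b)`, so with `w = 1` and `b = 0` a logit-transformed prior passes straight through, while a log-transformed prior does not.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse of logit: the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))

prior = 0.2  # hypothetical prior probability

# Candidate feature encodings of the prior.
log_feature = math.log(prior)
logit_feature = logit(prior)

# Prediction of a logistic regression with coefficient 1 and intercept 0.
recovered_from_logit = expit(1.0 * logit_feature + 0.0)  # equals the prior
recovered_from_log = expit(1.0 * log_feature + 0.0)      # equals prior / (1 + prior)
```

With the log encoding, no single coefficient and intercept reproduce the prior for every value of `prior`, which is why the logit is the natural transform here.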
This PR is a mix of bug fixes and new data files added to the repo. I fixed the naming convention for the data files, so hopefully they make more sense. Furthermore, I calculated AUROCs for each feature; these are in notebook 9. Let me know what you think.
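As a rough sketch of a per-feature AUROC calculation, the snippet below scores each feature column on its own against the labels using `sklearn.metrics.roc_auc_score`. The feature names and values are invented for illustration and do not come from notebook 9.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and feature columns, for illustration only.
labels = [0, 0, 1, 1, 0, 1]
features = {
    "feature_a": [0.1, 0.2, 0.8, 0.9, 0.3, 0.7],
    "feature_b": [0.5, 0.4, 0.6, 0.3, 0.7, 0.2],
}

# Treat each raw feature as a classifier score and compute its AUROC.
aurocs = {
    name: roc_auc_score(labels, values)
    for name, values in features.items()
}

# Rank features from most to least discriminative.
ranked = sorted(aurocs.items(), key=lambda item: item[1], reverse=True)
```

A feature with an AUROC well below 0.5 is still informative, just in the opposite direction, so it can be worth looking at the distance from 0.5 as well as the raw value.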