greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Updated Entity Level Prediction Module #36

Closed danich1 closed 6 years ago

danich1 commented 6 years ago

This PR is a mix of bug fixes and new data files for the repo. I fixed the naming convention for the data files, so hopefully they make more sense. I also calculated AUROCs for each feature; these are in notebook 9. Let me know what you think.

dhimmel commented 6 years ago

Let's update the model to use only a single LSTM feature. It looks like the mean or 80th percentile would be the best choice. Let's see if this helps fix the issue where the model is worse than a single feature.
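As a rough sketch of what collapsing the LSTM output to a single feature could look like: the function name, the dictionary layout, and the example scores below are all hypothetical and are not taken from the notebooks. The idea is to reduce each entity pair's sentence-level scores to either their mean or their 80th percentile.

```python
import numpy as np


def aggregate_lstm_scores(scores_by_pair, method="mean"):
    """Collapse each pair's sentence-level scores into one feature value.

    `scores_by_pair` is a hypothetical mapping of entity pair -> list of
    per-sentence LSTM marginals; the real data layout may differ.
    """
    aggregated = {}
    for pair, scores in scores_by_pair.items():
        scores = np.asarray(scores, dtype=float)
        if method == "mean":
            aggregated[pair] = float(scores.mean())
        elif method == "p80":
            # 80th percentile with NumPy's default linear interpolation
            aggregated[pair] = float(np.percentile(scores, 80))
        else:
            raise ValueError(f"unknown method: {method}")
    return aggregated


# Made-up sentence-level scores for two entity pairs.
example = {
    ("disease_A", "gene_X"): [0.1, 0.2, 0.9, 0.8, 0.5],
    ("disease_B", "gene_Y"): [0.3, 0.4],
}
mean_feature = aggregate_lstm_scores(example, "mean")
p80_feature = aggregate_lstm_scores(example, "p80")
```

Either aggregate gives one LSTM column per entity pair, which makes it easier to compare the full model against that column alone.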

Regarding the correlation plot, usually you'd want to use a diverging colormap because correlations go from -1 to 1. However, since there aren't many negative correlations here, viridis is fine.
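For reference, a minimal sketch of the diverging-colormap version, using synthetic data rather than the features from this PR. It assumes matplotlib's built-in "RdBu_r" colormap and pins the color limits to -1 and 1 so that zero correlation sits at the neutral midpoint.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic features: one strongly positive and one strongly negative pair.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
features[:, 1] = features[:, 0] + 0.1 * rng.normal(size=200)
features[:, 3] = -features[:, 2] + 0.1 * rng.normal(size=200)

# Pearson correlation between columns (rowvar=False treats columns as variables).
corr = np.corrcoef(features, rowvar=False)

fig, ax = plt.subplots()
# Symmetric limits keep the colormap centered on zero.
image = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)
fig.colorbar(image, ax=ax, label="Pearson correlation")
```

With a sequential map such as viridis, the negative block would just look like "low" values; the diverging map makes the sign visible at a glance.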

dhimmel commented 6 years ago

We should be standardizing the features using an sklearn pipeline. The Cognoma notebook has an example, although it's more complex than what we need.
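A stripped-down sketch of the idea, assuming a plain `StandardScaler` followed by `LogisticRegression`; the data is synthetic and the classifier settings are scikit-learn defaults, not the ones used in the notebooks.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = (X[:, 0] + X[:, 1] / 1000 > 0).astype(int)

# The scaler is fit inside the pipeline, so under cross-validation it only
# ever sees the training folds.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X, y)

# make_pipeline names each step after its lowercased class name.
scaled = pipeline.named_steps["standardscaler"].transform(X)
probabilities = pipeline.predict_proba(X)[:, 1]
```

Putting the scaler in the pipeline, rather than scaling up front, avoids leaking test-set statistics into the model.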

Additional references:

dhimmel commented 6 years ago

https://github.com/greenelab/snorkeling/pull/36/commits/40ca5bd55660f068ab97408b8858137d94d9919d

Logit, not log. Time for a scipy special. Logit enables logistic regression to regurgitate the prior probability ad infinitum.
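To spell out why: logit is the inverse of the logistic (sigmoid) function, so a logistic regression with coefficient 1 and intercept 0 on logit(prior) returns the prior unchanged, while the same model on log(prior) does not. A small sketch with made-up prior values, using `logit` and `expit` from `scipy.special`:

```python
import numpy as np
from scipy.special import expit, logit

# Made-up prior probabilities, for illustration only.
prior = np.array([0.05, 0.25, 0.5, 0.9])

logit_feature = logit(prior)  # log(p / (1 - p))
log_feature = np.log(prior)   # log(p)

# Logistic regression output with coefficient 1 and intercept 0.
recovered_from_logit = expit(1.0 * logit_feature + 0.0)
recovered_from_log = expit(1.0 * log_feature + 0.0)

# expit(logit(p)) == p, whereas expit(log(p)) == p / (1 + p).
```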