danich1 closed this pull request 4 years ago
The generative model's AUROC on the test data is 0.55, so the classification task remains challenging.
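For anyone unfamiliar with the metric: AUROC is computed from the model's predicted probabilities against gold labels. A minimal sketch with scikit-learn (the arrays here are made-up toy values, not the repo's actual data or numbers):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical gold labels and generative-model marginals for test sentences
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.6, 0.5, 0.4, 0.7, 0.45, 0.55, 0.5, 0.4])

# AUROC = probability that a random positive sentence is ranked
# above a random negative one; 0.5 is chance level.
auroc = roc_auc_score(y_true, y_prob)
print(auroc)
```

An AUROC of 0.55 therefore means the generative model ranks positives above negatives only slightly better than a coin flip.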
@dhimmel I believe you are looking at the off-target label notebook. That notebook was a sanity check, given the updated conclusions about the generative model. Since adding off-target label functions (using CtD label functions to predict DaG sentences) doesn't substantially improve performance, we expect the discriminator model's performance to suffer when it is trained on the off-target generative model. As expected, the discriminator model performs worse than the generative model.
The plots below show performance when training the discriminator model on an on-target generative model (using DaG label functions to predict DaG sentences).
Side note: I was experimenting with plotting libraries, so please ignore the style differences.
Just to clarify the workflow: your generative model takes your sentences and annotates them with label functions. So a sentence X will have 3 features that correspond to 3 label functions (e.g., whether the word "associates" is present). Then your discriminator uses that matrix of sentences × label-function features to predict whether a sentence expresses an association?
If that's the case, I'm not sure how you're getting predictions from the generative model.
@ajlee21 you are close with the workflow description. The matrix of label-function outputs gets fed into the generative model, which consolidates those outputs into a single probabilistic label for each sentence. So in your example: sentence X starts with 3 LF features -> generative model -> sentence X now has 1 label. That's how I get predictions from the generative model.
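To make the "consolidation" step concrete, here is a toy sketch. The real generative model learns each label function's accuracy; the majority-vote function below is a simplified stand-in, and the label matrix and function names are invented for illustration:

```python
import numpy as np

# Toy label matrix: rows = sentences, columns = label functions.
# 1 = positive vote, -1 = negative vote, 0 = abstain.
L = np.array([
    [ 1,  1,  0],   # sentence 0: two LFs vote positive, one abstains
    [-1,  0, -1],   # sentence 1: two LFs vote negative
    [ 1, -1,  1],   # sentence 2: conflicting votes
])

def majority_vote_marginals(L):
    """Stand-in for the generative model: convert LF votes to P(positive)."""
    votes = L.sum(axis=1)                # net vote per sentence
    n_voting = (L != 0).sum(axis=1)      # LFs that did not abstain
    # Map the net-vote fraction in [-1, 1] to a probability in [0, 1];
    # sentences with no votes get 0.5 (maximally uncertain).
    frac = np.divide(votes, n_voting,
                     out=np.zeros_like(votes, dtype=float),
                     where=n_voting > 0)
    return (frac + 1) / 2

marginals = majority_vote_marginals(L)
print(marginals)  # one probabilistic label per sentence
```

The actual generative model does the same matrix-in, label-out transformation, but weights each label function by its estimated accuracy instead of counting votes equally.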
From there I feed sentence X into the discriminator model and use the generative model's predicted label as sentence X's training class.
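That hand-off can be sketched as follows. The sentences, marginals, and the bag-of-words/logistic-regression discriminator here are all illustrative assumptions, not the repo's actual model:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training sentences and generative-model marginals for them
sentences = [
    "BRCA1 is associated with breast cancer",
    "the assay buffer contained Tris and NaCl",
    "mutations in TP53 are linked to many tumors",
    "samples were stored at low temperature",
]
marginals = np.array([0.9, 0.1, 0.8, 0.2])  # P(positive) from the generative model

# Discriminator: bag-of-words features + logistic regression.
# This sklearn model needs hard labels, so threshold the marginals and
# use their confidence as sample weights (a crude noise-aware approximation).
X = CountVectorizer().fit_transform(sentences)
y = (marginals > 0.5).astype(int)
weights = np.abs(marginals - 0.5) * 2  # confident sentences count more

disc_model = LogisticRegression().fit(X, y, sample_weight=weights)
print(disc_model.predict_proba(X)[:, 1])
```

The key point is that the discriminator never sees gold labels: its training signal comes entirely from the generative model's output, so noise in those labels propagates into the discriminator.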
This PR contains results from the discriminator model for three relations: Disease associates Gene (DaG), Compound binds Gene (CbG), and Compound treats Disease (CtD). There are lots of files, but most of them are data files. I also started to clean out some outdated files, but realized this PR would explode in the number of files changed, so I decided to leave that task for a future PR.
To speed up the review process, here are the notebooks you need to look at: