greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Updated gen model #94

Closed: danich1 closed this pull request 4 years ago

danich1 commented 4 years ago

This PR follows up on the previous PR (#93). I carefully re-examined my code and noticed a few bugs affecting the generative model's performance:

  1. The grid search algorithm wasn't selecting the best model; it was selecting a model at random. After fixing that, I realized I was also using the wrong column to estimate model performance.
  2. The label function indices for the GiG (and possibly CbG) relation included an extra label function that shouldn't have been there.
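To make item 1 concrete, here is a minimal sketch, assuming grid-search results are stored as one record per hyperparameter setting with a tune-set metric column. The names `results`, `select_best`, `reg_param`, `train_auroc`, and `tune_auroc` are hypothetical illustrations, not code taken from train_model_helpher.py.

```python
from typing import Dict, List

# Hypothetical grid-search results: one record per hyperparameter setting.
# Column names and values are illustrative only.
results: List[Dict[str, float]] = [
    {"reg_param": 0.01, "train_auroc": 0.95, "tune_auroc": 0.71},
    {"reg_param": 0.10, "train_auroc": 0.88, "tune_auroc": 0.78},
    {"reg_param": 1.00, "train_auroc": 0.83, "tune_auroc": 0.74},
]


def select_best(records: List[Dict[str, float]], metric: str) -> Dict[str, float]:
    """Return the record with the highest value in the given metric column.

    Selection is deterministic (argmax over the column), unlike picking
    a model at random.
    """
    if not records:
        raise ValueError("no grid-search results to select from")
    return max(records, key=lambda record: record[metric])


best = select_best(results, metric="tune_auroc")
print(best["reg_param"])  # 0.1
```

Keying the same call on the wrong column, e.g. `metric="train_auroc"`, would silently pick the `reg_param=0.01` record instead, which is the kind of error the second half of item 1 describes.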

After fixing these bugs, I regenerated the generative model's AUROC and AUPR point plot grid:

| Before Bug Fix | After Bug Fix |
| --- | --- |
| Tune set AUROC | Tune set AUROC |
| Tune set AUPR | Tune set AUPR |
| Test set AUROC | Test set AUROC |
| Test set AUPR | Test set AUPR |
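For readers less familiar with the two metrics in the grid, a minimal stdlib sketch of how AUROC and AUPR can be computed from binary labels and model scores (for example, generative model marginals). Here AUPR is computed as step-wise average precision, the same definition scikit-learn's `average_precision_score` uses; this is an assumption about how the plotted AUPR is defined, and the function names and toy data are hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Sum of (recall increase) * precision over descending score thresholds."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this threshold before scoring it.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))                        # 0.75
print(round(average_precision(labels, scores), 4))  # 0.8333
```

In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same quantities; the loops above only make the definitions explicit.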

Lastly, this means the spike I was observing in the GiG and CbG relations was just a bug. For this review, please take a look at the updated figures and notebooks and the train_model_helpher.py file. The other files have only quick one-line changes.

After this PR, there will be another one that contains only the updated data files; I didn't want to risk this PR getting too large.

cgreene commented 4 years ago

Some of those figures listed above seem to have 404s.

danich1 commented 4 years ago

Ah, fixed. There was a missing "c" in the file name.