greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Word vector fix #96

Closed danich1 closed 4 years ago

danich1 commented 4 years ago

This PR follows #94 and #95. Now that the generative model is fixed, the next step is to fix the issue in the word vectors and sentence embeddings. Long story short, I had an indexing error when loading sentences for the discriminator model.

e.g. Sentence: Over-expression in disease
Here 0 represents the word "over-expression", 1 represents the word "disease", and 2 represents the word "in".

This is problematic because 0 and 1 are reserved for the null character and the unknown-token character. As a result, two important words were being mistaken for nulls and unknowns, which led to poor discriminator model performance.
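
To make the collision concrete, here is a minimal sketch of the problem using the example sentence above. It is not the code from this repository: `build_vocab`, `encode`, `PAD_INDEX`, `UNK_INDEX` and the offset-by-two scheme are hypothetical names and assumptions chosen for illustration, and the actual fix in this PR may differ.

```python
# Hypothetical illustration; names and the offset scheme are assumptions,
# not the repository's actual code.
PAD_INDEX = 0   # reserved for the null character
UNK_INDEX = 1   # reserved for the unknown token
RESERVED = 2    # number of reserved indices


def build_vocab(tokens, offset=RESERVED):
    """Assign each distinct token an index, starting at `offset`."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab) + offset
    return vocab


def encode(sentence_tokens, vocab):
    """Map tokens to indices, using UNK_INDEX for unseen tokens."""
    return [vocab.get(token, UNK_INDEX) for token in sentence_tokens]


tokens = ["over-expression", "disease", "in"]
sentence = ["over-expression", "in", "disease"]

# Buggy mapping: real words start at 0 and collide with the reserved indices.
buggy_vocab = build_vocab(tokens, offset=0)
buggy_ids = encode(sentence, buggy_vocab)   # [0, 2, 1]

# Assumed fix: shift real words past the reserved indices.
fixed_vocab = build_vocab(tokens)
fixed_ids = encode(sentence, fixed_vocab)   # [2, 4, 3]

print(buggy_ids, fixed_ids)
```

In the buggy encoding, "over-expression" and "disease" are indistinguishable from the null and unknown tokens, so only "in" carries any signal. After the shift, every word has its own index.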

The issue is fixed and the discriminator model is now working. Results will appear in the next PR.