snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Working of generative model #871

Closed SrikarNamburu closed 6 years ago

SrikarNamburu commented 6 years ago

Can anyone help me to understand how the relative accuracies of the labelling functions are being calculated? @ajratner @jason-fries

ajratner commented 6 years ago

Hi @srikar2605 - we use a generative model to learn these accuracies without any labeled training data. Intuitively, we look at the matrix of the labeling functions' agreements and disagreements on unlabeled data, and learn the accuracies that make that observed pattern most likely. See the data programming paper (https://arxiv.org/abs/1605.07723) and the data programming blog post (http://hazyresearch.github.io/snorkel/blog/weak_supervision.html) as well as other material on snorkel.stanford.edu. Hope this helps!
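To make the intuition above concrete, here is a minimal sketch of how accuracies can be recovered from agreement rates alone. It is not Snorkel's generative model: the comment above describes a likelihood-based fit, while this sketch swaps in a simpler method-of-moments estimate. It assumes exactly three labeling functions that vote in {-1, +1}, never abstain, err independently given the true label, are each better than chance, and are equally accurate on both classes. Under those assumptions the expected product of two votes equals (2a_i - 1)(2a_j - 1), so three pairwise agreement rates determine the three accuracies. All function names are hypothetical.

```python
import math
import random


def simulate_votes(accuracies, n, seed=0):
    """Draw n hidden labels and one vote per labeling function (LF).

    Each LF returns the true label with its own accuracy, independently.
    The true labels are discarded: only the votes are returned.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        rows.append([y if rng.random() < a else -y for a in accuracies])
    return rows


def agreement(rows, i, j):
    """Mean product of votes: +1 if LFs i and j always agree, -1 if never."""
    return sum(r[i] * r[j] for r in rows) / len(rows)


def estimate_accuracies(rows):
    """Estimate three LF accuracies from pairwise agreement rates only.

    With m_i = 2 * a_i - 1, independence gives E[l_i * l_j] = m_i * m_j,
    so m_0 = sqrt(E01 * E02 / E12), and likewise for m_1 and m_2.
    Taking the positive root encodes the better-than-chance assumption.
    """
    e01 = agreement(rows, 0, 1)
    e02 = agreement(rows, 0, 2)
    e12 = agreement(rows, 1, 2)
    m0 = math.sqrt(abs(e01 * e02 / e12))
    m1 = math.sqrt(abs(e01 * e12 / e02))
    m2 = math.sqrt(abs(e02 * e12 / e01))
    return [(1 + m) / 2 for m in (m0, m1, m2)]


true_acc = [0.9, 0.75, 0.6]  # illustrative values, not from any real dataset
votes = simulate_votes(true_acc, n=100_000)
estimated = estimate_accuracies(votes)
for t, e in zip(true_acc, estimated):
    print(f"true={t:.2f} estimated={e:.3f}")
```

The estimator never sees the hidden labels, which is the point: the agreement structure carries enough signal on its own. With abstentions, more labeling functions, or correlated functions, a fuller model such as the one in the paper is needed.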

SrikarNamburu commented 6 years ago

Hello Alex, thanks for the reply. I have another question: are the candidates that we extract from the documents unique?


stephenbach commented 6 years ago

Yes, duplicates will be removed, so there will be only one candidate referring to, e.g., a particular tuple of spans of text.
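As a rough illustration of what uniqueness per tuple of spans means, here is a small sketch. It does not use Snorkel's candidate classes or show how Snorkel removes duplicates internally; `Span` and `dedupe_candidates` are hypothetical names, and identifying a span by document id plus character offsets is an assumption made for the example.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical span identity: a document plus character offsets."""
    doc_id: str
    char_start: int
    char_end: int


def dedupe_candidates(candidates):
    """Keep the first occurrence of each tuple of spans, preserving order."""
    seen = set()
    unique = []
    for cand in candidates:
        key = tuple(cand)  # a candidate is identified by its ordered spans
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


person = Span("doc1", 0, 5)
spouse = Span("doc1", 20, 27)
raw = [(person, spouse), (person, spouse), (spouse, person)]
unique = dedupe_candidates(raw)
print(len(raw), "extracted,", len(unique), "unique")
```

In this sketch the order of spans matters, so `(person, spouse)` and `(spouse, person)` count as different candidates; only exact repeats of the same tuple are dropped.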