Closed — SrikarNamburu closed this issue 6 years ago.
Hi @srikar2605 - we use a generative model to learn these accuracies without any training data. Intuitively, we are looking at the matrix of their agreements and disagreements on unlabeled data, and learning the accuracies which make it most likely. See the data programming paper (https://arxiv.org/abs/1605.07723) and the data programming blog post (http://hazyresearch.github.io/snorkel/blog/weak_supervision.html) as well as other material on snorkel.stanford.edu. Hope this helps!
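To illustrate the intuition (not Snorkel's actual generative model, which is described in the linked paper): if several labeling functions vote on the same unlabeled points, the ones that agree with the consensus more often are likely more accurate. A crude stand-in for that idea is to use the majority vote as a proxy label and score each labeling function against it:

```python
import numpy as np

def estimate_accuracies(L):
    """Rough accuracy estimates for labeling functions, with no gold labels.

    L: (n, m) array of votes in {-1, +1} (abstains omitted for simplicity).
    Uses the majority vote across all LFs as a proxy label, then scores
    each LF by its agreement with that proxy. This is only a simplified
    illustration of the intuition; Snorkel fits a proper generative model
    over the agreement/disagreement matrix instead.
    """
    proxy = np.sign(L.sum(axis=1))   # majority vote per data point
    proxy[proxy == 0] = 1            # break ties arbitrarily
    return (L == proxy[:, None]).mean(axis=0)

# Simulate 3 LFs with true accuracies 0.9, 0.7, 0.6 on hidden labels y.
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=2000)
true_accs = [0.9, 0.7, 0.6]
L = np.stack([np.where(rng.random(2000) < a, y, -y) for a in true_accs], axis=1)
est = estimate_accuracies(L)   # recovers the relative ordering of the LFs
```

Even this crude estimator recovers which labeling function is most accurate; the generative model does the same thing in a principled way, handling abstains and correlations.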
Hello Alex, thanks for the reply. I have another question: are the candidates that we extract from the documents unique?
Yes, duplicates will be removed, so there will only be one candidate referring to, e.g., a particular tuple of text spans.
Can anyone help me understand how the relative accuracies of the labelling functions are calculated? @ajratner @jason-fries