snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Understanding the equation in Data Programming: Creating Large Training Sets, Quickly. #1614

Closed by pratikchhapolika 4 years ago

pratikchhapolika commented 4 years ago

I was going through the official paper (https://arxiv.org/pdf/1605.07723.pdf) and came across these two equations:

[image: ratner_doubt2]

Here are my doubts regarding equation 1:

  1. What does the term [image: equation1] mean in equation 1? And what is the value of Y: is it just {-1, 1}, or the entire set of true labels in the dataset?
  2. The final value of equation 1 will be just a number, right?
  3. How would I read this equation in a mathematical way?

I will add more questions once these are clarified.

brahmaneya commented 4 years ago

Hi @pratikchhapolika, the term in your first question is an indicator function for the i^{th} labeling function voting the same as the true label Y, i.e., it evaluates to 1 if the labeling function's vote equals Y and to 0 otherwise. Y itself is a single label, one of -1 and 1. The final value is indeed a number: the product of all the probabilities.
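
To make this concrete, here is a minimal sketch. It assumes that equation 1 is the paper's independent labeling function model, where each labeling function i votes at all with probability beta_i, and votes correctly with probability alpha_i when it does. The function name `mu`, the helper `indicator`, and the example numbers are illustrative only; they are not part of Snorkel's API.

```python
def indicator(condition):
    """Return 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0


def mu(votes, y, alpha, beta):
    """Joint probability of the votes and a label y in {-1, +1}.

    Assumed form (independent labeling functions):
      1/2 * prod_i [ beta_i * alpha_i       * 1{vote_i == y}
                   + beta_i * (1 - alpha_i) * 1{vote_i == -y}
                   + (1 - beta_i)           * 1{vote_i == 0} ]
    """
    prob = 0.5  # uniform prior over y = -1 and y = +1
    for vote, a, b in zip(votes, alpha, beta):
        prob *= (
            b * a * indicator(vote == y)            # voted, and agreed with y
            + b * (1 - a) * indicator(vote == -y)   # voted, and disagreed with y
            + (1 - b) * indicator(vote == 0)        # abstained
        )
    return prob


# Hypothetical example: three labeling functions voting +1, -1, and abstaining.
votes = [1, -1, 0]
alpha = [0.9, 0.8, 0.7]  # assumed accuracies
beta = [0.5, 0.5, 0.5]   # assumed probabilities of not abstaining

p_pos = mu(votes, 1, alpha, beta)   # probability of these votes with Y = +1
p_neg = mu(votes, -1, alpha, beta)  # probability of these votes with Y = -1
print(p_pos, p_neg)
```

For each labeling function exactly one of the three indicators is 1, so each factor reduces to a single probability, and the result is one number per choice of Y.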