snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Philosophical question: LFs as features --> perfect classifier #946

Closed. rasmusbergpalm closed this issue 5 years ago.

rasmusbergpalm commented 6 years ago

Hi. I have a question of a more philosophical nature.

Assume I have an unlabelled dataset, I write a set of LFs, use them to probabilistically label the dataset, and then use your noise-aware loss to train a classifier to output the noisy training marginals, i.e. I use Snorkel as intended.
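For reference, roughly what I mean by that pipeline in code. This is only a sketch against the current `snorkel` package API; the LF body, the `df_train` DataFrame (with a `.text` field), and the loss function name are placeholders of my own:

```python
import torch
import torch.nn.functional as F
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NEG, POS = -1, 0, 1

@labeling_function()
def lf_contains_cause(x):
    # Hypothetical keyword LF; stands in for a real set of LFs.
    return POS if "causes" in x.text.lower() else ABSTAIN

lfs = [lf_contains_cause]  # ... plus the rest of the LFs

# Apply the LFs to the unlabelled data to get an (n, m) matrix of votes.
applier = PandasLFApplier(lfs)
L_train = applier.apply(df_train)

# Fit the generative label model and get noisy training marginals.
label_model = LabelModel(cardinality=2)
label_model.fit(L_train, n_epochs=500, seed=123)
probs_train = label_model.predict_proba(L_train)

# Noise-aware loss: cross-entropy of the end model's output against the
# soft marginals rather than hard labels.
def noise_aware_loss(logits, soft_targets):
    soft_targets = torch.as_tensor(soft_targets, dtype=logits.dtype)
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```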

The "best" classifier (one with lowest expected loss on test set) I can ever train, is one that perfectly recovers the testing marginals.

This classifier already exists, though: it is the one you used to generate the marginals from the LFs in the first place. I can compute the LFs on unlabelled data, so nothing stops me from using them as "features" on new data and thereby getting a "perfect" classifier without ever training one.

Now you might argue that there could be more information in other features (aside from the LFs) that I could use to classify the inputs "better", but your model assumes that "Second, given an example (x, y) ∼ π∗, the class label y must be independent of the features f(x) given the labels λ(x)", i.e. there is no additional information about the label y to be gained from the input x once λ(x) is known.

Essentially, since we use a (probabilistic) classifier to generate the labels, we can never get a better classifier than that one. In this case the LFs reduce to features, and Snorkel reduces to a way of building a classifier from those features.
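Concretely, the "LFs as features" classifier I have in mind is just this (reusing the hypothetical names from the sketch above):

```python
# Apply the same LFs to new, unlabelled data and let the already-fitted
# label model produce the marginals directly: no end model is trained.
L_test = applier.apply(df_test)
probs_test = label_model.predict_proba(L_test)
preds_test = probs_test.argmax(axis=1)
```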

Am I misunderstanding something?

bartgras commented 5 years ago

I think the example from https://arxiv.org/abs/1711.10160 section 2.5 explains it very well:

The CDR data contains the sentence, “Myasthenia gravis presenting as weakness after magnesium administration.” None of the 33 labeling functions we developed vote on the corresponding Causes(magnesium, myasthenia gravis) candidate, i.e., they all abstain. However, a deep neural network trained on probabilistic training labels from Snorkel correctly identifies it as a true mention.

In other words, the trained LSTM-based predictive model lets you generalize beyond the labeling functions: the LSTM's learned "language model" makes correct predictions on patterns like "X presenting as weakness after Y administration".
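As a sketch of that example, reusing the label model and LF list from the earlier snippet (`end_model` and `featurize` are hypothetical, standing in for a trained discriminative model and its feature pipeline): when every LF abstains, the label model can only fall back on the class prior, while the end model can still read the raw text.

```python
import numpy as np

x = "Myasthenia gravis presenting as weakness after magnesium administration."

# Every LF abstains on this candidate, so the label model has no signal
# beyond the learned class balance.
L_x = np.full((1, len(lfs)), ABSTAIN)
label_model.predict_proba(L_x)      # ~uninformative (prior only)

# A trained end model, however, sees the text itself and can fire on the
# learned pattern "X presenting as ... after Y administration".
end_model(featurize(x))             # hypothetical trained end model
```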

ajratner commented 5 years ago

@rasmusbergpalm Great question and @bartgras thanks for the great response!

For more, see #1059.

And to tack on a bit to that answer: one functional explanation is that you regularize the end model, which encourages it to spread feature mass out beyond just the features most correlated with the LF labels.
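As a sketch of what that regularization can look like in practice, assuming a PyTorch end model trained on the soft labels from the label model (weight decay here stands in for whatever regularizer you choose; `end_model`, `train_loader`, and `noise_aware_loss` are the placeholders carried over from the earlier snippets):

```python
import torch

# L2 weight decay penalizes concentrating all the weight on the few
# features that simply mirror the LF outputs.
optimizer = torch.optim.Adam(end_model.parameters(), lr=1e-3, weight_decay=1e-4)

for xb, probs_b in train_loader:   # probs_b: soft labels from the label model
    optimizer.zero_grad()
    loss = noise_aware_loss(end_model(xb), probs_b)
    loss.backward()
    optimizer.step()
```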