snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Spam tutorial - accuracy of logreg with label model ? #1534

Closed michal-gh closed 4 years ago

michal-gh commented 4 years ago

Issue description

Section 5 of the spam tutorial shows an example of training a Keras classifier on labels generated by the label model (https://www.snorkel.org/use-cases/01-spam-tutorial#keras-classifier-with-probabilistic-labelskeras ). However, the test accuracy of this logreg is 90.0%, while the test accuracy of a logreg trained only on the dev set is 89.6%. The accuracy of the Scikit-Learn logreg with rounded labels is 92.8%. In addition, the accuracy of the label model alone (section 4) is 92.4%.

As the test accuracy of the logreg with the label model is not great, it's difficult to see how it supports the tutorial's conclusion: "We observe an additional boost in accuracy over the LabelModel by multiple points! By using the label model to transfer the domain knowledge encoded in our LFs to the discriminative model, we were able to generalize beyond the noisy labeling heuristics."

Can you please check whether the spam tutorial is correct?

Code example/repro steps

See the spam tutorial

Expected behavior

I expected a multipoint increase in test accuracy over LabelModel, as described in the tutorial.

System info

N/A - this issue is about the spam tutorial

brahmaneya commented 4 years ago

Hi @michal-gh, thanks for pointing this out! The reason the Label Model accuracy appeared to be higher than that of the discriminative models is that its accuracy is evaluated differently: the Label Model can abstain on data points where all labeling functions abstained, and those points are not counted when measuring its accuracy, while the discriminative models can't abstain. PR #192 will make the comparison clearer by not allowing the LabelModel to abstain (it will make a random vote for those data points instead).
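The scoring difference can be sketched with a toy NumPy example (purely illustrative; the data and variable names here are assumptions, not taken from the tutorial). Dropping abstained points scores the label model only on the "easy" subset it covered, which inflates its accuracy relative to a model that must predict on every point:

```python
import numpy as np

ABSTAIN = -1

# Hypothetical gold labels and label-model votes; ABSTAIN marks points
# where every labeling function abstained.
y_gold = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_lm = np.array([1, 0, 1, ABSTAIN, ABSTAIN, 1, 0, ABSTAIN])

# Old behavior: score only where the label model actually voted.
covered = y_lm != ABSTAIN
acc_covered = (y_lm[covered] == y_gold[covered]).mean()  # 5/5 = 1.0

# Fairer comparison: score every point, filling abstains with some vote
# (random in the fix described above; fixed to 0 here for determinism).
y_filled = np.where(covered, y_lm, 0)
acc_all = (y_filled == y_gold).mean()  # 7/8 = 0.875

print(acc_covered, acc_all)
```

The discriminative models' 90-92% accuracies are of the second kind, so comparing them against the label model's covered-only accuracy was apples to oranges.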

michal-gh commented 4 years ago

Hi @brahmaneya, thank you for the explanation and the fix.

The test accuracy of the logreg trained with the label model still feels wrong, as it is almost the same as the accuracy of the logreg trained on a small dev sample, and is 2.4 points lower than the accuracy of the Scikit-Learn logreg with rounded labels. Intuitively, I'd expect the accuracy of the logreg trained with the label model to be higher than that of the other two, since it uses the better labels generated by the label model.

The Keras logreg model is fitted with the following code:

```python
keras_model.fit(
    x=X_train,
    y=probs_train_filtered,
    validation_data=(X_valid, preds_to_probs(Y_valid, 2)),
    callbacks=[get_keras_early_stopping()],
    epochs=50,
    verbose=0,
)
```

I may be completely wrong, but I wonder why `validation_data=(X_valid, preds_to_probs(Y_valid, 2))` uses `preds_to_probs`. This function always returns [0, 1] or [1, 0], so it completely removes the labels' confidence values (conditional probabilities) computed by the label model. I think that for a categorical cross-entropy loss, the validation data should use the `predict_proba` function instead, e.g. `validation_data=(X_valid, label_model.predict_proba(L_valid))`.
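The point about lost confidence can be illustrated with a small NumPy sketch (hedged: this reimplements the one-hot conversion from first principles rather than calling `snorkel.utils.preds_to_probs`, and the posteriors below are made up):

```python
import numpy as np

def preds_to_probs_sketch(preds, num_classes):
    """One-hot encode hard labels, as preds_to_probs appears to do."""
    probs = np.zeros((len(preds), num_classes))
    probs[np.arange(len(preds)), preds] = 1.0
    return probs

# Hypothetical label-model posteriors for three validation points.
lm_probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])

# Rounding to hard labels and one-hot encoding them...
hard = lm_probs.argmax(axis=1)  # array([0, 0, 1])
one_hot = preds_to_probs_sketch(hard, 2)

# ...maps both the confident 0.9/0.1 point and the borderline
# 0.55/0.45 point to exactly [1, 0]: the confidence is gone.
print(one_hot)
```

Whether the validation targets should carry that confidence, or whether hard gold labels are actually the right thing to validate against, is the question raised above.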

ajratner commented 4 years ago

Hi @michal-gh thanks for following up here and sorry for the long-delayed response! Not 100% addressing what you're bringing up, but you can see a simplified version of the tutorial that we are about to push as well as a discussion thread on Spectrum here: https://spectrum.chat/snorkel/tutorials/spam-tutorial-baseline-models-perform-better-than-model-with-snorkel-labels~11d9a7fc-0d3d-49bb-93ee-a4b1830fce4f

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.