snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Understanding LabelModel and cardinality #1591

Closed: paulachocron closed this issue 3 years ago

paulachocron commented 4 years ago

I'm seeing some strange predictions on my data. I have labels 0 and 1, and I defined 5 labeling functions that sometimes abstain.

When I train a `LabelModel` with `cardinality=2` (the real value), it always predicts 0. I see points where four LFs return 1, one abstains, and the prediction is 0; there are even points where all five LFs return 1, yet the model still predicts 0. When I set `cardinality=3`, the results seem much more reasonable, but that isn't the real value!

Any thoughts on what could be happening?
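Roughly, my setup looks like the sketch below (with a synthetic stand-in for my real label matrix):

```python
import numpy as np
from snorkel.labeling.model import LabelModel

# Synthetic stand-in for my real (n, 5) label matrix:
# values in {-1, 0, 1}, where -1 means the LF abstained
rng = np.random.default_rng(0)
L_train = rng.choice([-1, 0, 1], size=(1000, 5))

label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, seed=123)

# On my real data, even a row where all five LFs vote 1 is predicted 0:
print(label_model.predict(np.array([[1, 1, 1, 1, 1]])))
```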

This is my `LFAnalysis(L_train).lf_summary()`:

|   | Polarity | Coverage | Overlaps | Conflicts |
|---|----------|----------|----------|-----------|
| 0 | [0, 1]   | 0.708946 | 0.708902 | 0.501723  |
| 1 | [0, 1]   | 0.884765 | 0.884285 | 0.615824  |
| 2 | [0, 1]   | 0.872596 | 0.871811 | 0.601038  |
| 3 | [0, 1]   | 0.830985 | 0.830811 | 0.602303  |
| 4 | [0, 1]   | 0.450735 | 0.450735 | 0.355084  |
paroma commented 4 years ago

You can try adjusting parameters like `class_balance` and the regularization settings to see whether that gives better results (see this tutorial and the documentation for examples).
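For example, something like this (the values here are placeholders; set `class_balance` to your dataset's actual prior if you know it):

```python
from snorkel.labeling.model import LabelModel

label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(
    L_train,                   # your (n, 5) label matrix
    n_epochs=500,
    class_balance=[0.8, 0.2],  # placeholder prior for P(Y=0), P(Y=1)
    l2=0.1,                    # L2 regularization strength
    lr=0.01,
    seed=123,
)
```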

We can also help debug further here if we have access to the L matrix for this data.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.