snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Error analysis (e.g., lf_stats) doesn't work for multiclass feature using default values #1021

Closed: debnil closed 5 years ago

debnil commented 6 years ago

I've set up a multiclass feature of cardinality = 5 and values = None. My understanding is that this should create values = [0, 1, 2, 3, 4]. I then write labeling functions for each class; the output of these labeling functions is one of [None, 0, 1, 2, 3, 4]. None indicates abstention, and other values represent assignments to one of the classes. So far, this seems like the default multiclass setup.

When attempting to evaluate model performance using lf_stats, I run into an issue: each of the functions it calls wraps the label matrix in sparse_nonzero, which appears to count only nonzero entries. As a result, a label of 0 is indistinguishable from an empty entry, and class 0 receives 0 for coverage, overlap, and conflict.

I understand that in a binary setting, 0 represents abstention; however, in multiclass, the recommended abstention is None. I also don't see these Nones filtered out in computing coverage, overlap, and conflict.
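To make the symptom concrete, here is a minimal pure-Python sketch. It is not Snorkel's implementation: the helper names (`to_zero_abstain`, `coverage_nonzero`, `coverage_explicit`) and the toy label matrix are hypothetical. It assumes that abstentions end up stored as 0, as they would in a sparse matrix, and that coverage is then computed by counting nonzero entries.

```python
# Hypothetical label matrix: rows are data points, columns are labeling functions.
# None marks abstention; integers are class labels, as in the setup described above.
L = [
    [0, None, 2],
    [0, 1, None],
    [None, 1, 2],
    [0, None, None],
]


def to_zero_abstain(matrix):
    """Map None to 0, mimicking storage in which 0 means 'no entry' (assumption)."""
    return [[0 if v is None else v for v in row] for row in matrix]


def coverage_nonzero(matrix):
    """Fraction of rows each column labels, counting only nonzero entries."""
    n = len(matrix)
    return [sum(1 for row in matrix if row[j] != 0) / n
            for j in range(len(matrix[0]))]


def coverage_explicit(matrix, abstain=None):
    """Fraction of rows each column labels, using an explicit abstain marker."""
    n = len(matrix)
    return [sum(1 for row in matrix if row[j] != abstain) / n
            for j in range(len(matrix[0]))]


# The first labeling function only ever votes for class 0.
print(coverage_nonzero(to_zero_abstain(L)))  # [0.0, 0.5, 0.5]
print(coverage_explicit(L))                  # [0.75, 0.5, 0.5]
```

The first column votes on three of four rows, yet the nonzero-based count reports zero coverage because every one of its votes is for class 0.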

Is there a better way to debug multiclass to account for this? Or am I just totally misunderstanding how multiclass labels should be assigned by the labeling functions?

ajratner commented 5 years ago

This is most likely a sub-function that needs to be extended for multi-class?

bhancock8 commented 5 years ago

In the new Snorkel v0.9 that was released today, this shouldn't be an issue. We now use the convention that -1 represents abstention (to avoid conflict with the many tools and conventions that expect to use 0 as a class label). You can see an example of a multiclass problem with labeling functions in the visual relationship detection tutorial (https://www.snorkel.org/use-cases/visual-relation-tutorial). You may want to start with the "Getting Started" tutorial first, though (https://www.snorkel.org/get-started/), for an introduction to the new interface and classes!
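As a rough illustration of why the -1 convention avoids the problem, here is a pure-Python sketch that does not use the Snorkel API. The labeling functions, the toy data, and the `lf_stats` helper below are all hypothetical, and the statistics are simplified per-function fractions, not necessarily the exact definitions Snorkel uses.

```python
ABSTAIN = -1  # abstain marker, following the convention described above


# Hypothetical labeling functions over plain strings; classes are 0 and 1.
def lf_contains_cat(text):
    return 0 if "cat" in text else ABSTAIN


def lf_contains_dog(text):
    return 1 if "dog" in text else ABSTAIN


def lf_short(text):
    return 0 if len(text) < 10 else ABSTAIN


lfs = [lf_contains_cat, lf_contains_dog, lf_short]
data = ["a cat", "a dog ran far away", "cat and dog", "nothing here at all"]

# One row per data point, one column per labeling function.
label_matrix = [[lf(x) for lf in lfs] for x in data]


def lf_stats(matrix, abstain=ABSTAIN):
    """Per-function coverage, overlap, and conflict as fractions of all rows."""
    n = len(matrix)
    stats = []
    for j in range(len(matrix[0])):
        covered = overlaps = conflicts = 0
        for row in matrix:
            if row[j] == abstain:
                continue
            covered += 1
            # Votes from the other functions on the same row, ignoring abstains.
            others = [v for k, v in enumerate(row) if k != j and v != abstain]
            if others:
                overlaps += 1
            if any(v != row[j] for v in others):
                conflicts += 1
        stats.append({
            "coverage": covered / n,
            "overlap": overlaps / n,
            "conflict": conflicts / n,
        })
    return stats


for lf, s in zip(lfs, lf_stats(label_matrix)):
    print(lf.__name__, s)
```

Because abstention is -1, votes for class 0 are counted like any other class: `lf_contains_cat` gets coverage 0.5 here instead of 0.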