Closed by debnil 5 years ago
This is most likely a sub-function that needs to be extended for multi-class?
In the new Snorkel v0.9 that was released today, this shouldn't be an issue. We now use the convention that -1 represents abstention (to avoid conflict with the many tools and conventions that expect to use 0 as a class label). You can see an example of a multiclass problem with labeling functions in the visual relationship detection tutorial (https://www.snorkel.org/use-cases/visual-relation-tutorial). You may want to start with the "Getting Started" tutorial first, though (https://www.snorkel.org/get-started/), for an introduction to the new interface and classes!
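To make the -1 convention concrete, here is a minimal sketch of multiclass labeling functions that vote for a class index (0 through 4) or abstain with -1. The function names, the keyword rules, and the `apply_lfs` helper are hypothetical and written in plain Python so the example runs without the library; in Snorkel itself, functions like these would typically be wrapped with the `labeling_function` decorator from `snorkel.labeling`.

```python
ABSTAIN = -1  # abstain sentinel, following the v0.9 convention described above


# Hypothetical labeling functions for a 5-class problem (classes 0..4).
def lf_contains_cat(text):
    # Votes for class 0, which is now a legitimate label rather than "no vote".
    return 0 if "cat" in text else ABSTAIN


def lf_contains_dog(text):
    return 1 if "dog" in text else ABSTAIN


def lf_long_text(text):
    return 4 if len(text) > 20 else ABSTAIN


def apply_lfs(lfs, examples):
    """Build a dense label matrix: one row per example, one column per LF."""
    return [[lf(x) for lf in lfs] for x in examples]


examples = [
    "a cat sat",
    "dog and cat",
    "nothing here",
    "a very long sentence about a dog",
]
L = apply_lfs([lf_contains_cat, lf_contains_dog, lf_long_text], examples)

for row in L:
    print(row)
```

Because abstention is -1, a vote for class 0 stays distinguishable from "no vote" in the resulting matrix.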
I've set up a multiclass feature of `cardinality = 5` and `values = None`. My understanding is that this should create `values = [0, 1, 2, 3, 4]`. I then write labeling functions for each class; the output of these labeling functions is one of `[None, 0, 1, 2, 3, 4]`. `None` indicates abstention, and other values represent assignments to one of the classes. So far, this seems like the default multiclass setup.

When attempting to evaluate model performance using `lf_stats`, I run into an issue: each of the functions called wraps the label matrix in `sparse_nonzero`. This means that class 0 receives 0 for coverage, overlap, and conflict.

I understand that in a binary setting, `0` represents abstention; however, in multiclass, the recommended abstention is `None`. I also don't see these `None`s filtered out in computing coverage, overlap, and conflict.

Is there a better way to debug multiclass to account for this? Or am I just totally misunderstanding how multiclass labels should be assigned by the labeling functions?
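The effect described in this report can be reproduced with a small sketch. This is not Snorkel's code: `coverage_nonzero`, `coverage_explicit`, and `shift_labels` are hypothetical helpers. The first mimics a "nonzero entries only" view of the label matrix, and the second counts every entry that is not the abstain sentinel. The label-shifting step at the end is one possible workaround under these assumptions, not something the thread confirms.

```python
def coverage_nonzero(L):
    """Per-LF coverage when 0 (and None) are both treated as 'no label'."""
    n = len(L)
    return [
        sum(1 for row in L if row[j] not in (0, None)) / n
        for j in range(len(L[0]))
    ]


def coverage_explicit(L, abstain=None):
    """Per-LF coverage when only the abstain sentinel counts as 'no label'."""
    n = len(L)
    return [
        sum(1 for row in L if row[j] != abstain) / n
        for j in range(len(L[0]))
    ]


def shift_labels(L, abstain=None):
    """Hypothetical workaround: map abstain -> 0 and class k -> k + 1."""
    return [[0 if v == abstain else v + 1 for v in row] for row in L]


# Toy label matrix: 4 examples x 3 labeling functions, None = abstain.
# The first labeling function only ever votes for class 0.
L = [
    [0, None, 3],
    [0, 1, None],
    [None, 1, None],
    [0, None, 4],
]

print(coverage_nonzero(L))                # class-0 votes vanish
print(coverage_explicit(L))               # class-0 votes are counted
print(coverage_nonzero(shift_labels(L)))  # shifted labels survive the nonzero view
```

In this toy matrix the first labeling function votes on three of four examples, yet the nonzero view reports zero coverage for it, which matches the symptom of class 0 receiving 0 for coverage, overlap, and conflict.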