Closed zegzag closed 5 years ago
@zegzag good suggestion, thank you for this detailed feedback! This is super valuable for us as we work on the v0.9 refactor coming this summer. I'll leave this open and ping again once we release that, to see if we can address this great feedback in v0.9
Hi @zegzag — thanks again for the suggestion!! You might notice that in v0.9, we've changed the convention such that abstains are -1 labels. 🙌 Exactly to your point, we found that users were confusing categorical labels with our 0 convention.
As you play around with the repo, please don't hesitate to open additional issues, start discussions on our forum, etc. — we really appreciate the feedback!
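To make the -1 abstain convention concrete, here is a minimal NumPy sketch. It does not use Snorkel's own API: the label matrix, the `vote_probs` helper, and the constants are hypothetical, and a simple vote share stands in for a trained label model. The point is only that with abstain encoded as -1, classes can start at 0 and the column index of the probability matrix equals the class label.

```python
import numpy as np

ABSTAIN = -1     # abstain convention described above for v0.9
NUM_CLASSES = 4  # classes are encoded 0..3

# Hypothetical label matrix: 5 data points x 3 labeling functions.
L = np.array([
    [0, 0, ABSTAIN],
    [1, ABSTAIN, 1],
    [ABSTAIN, 2, 2],
    [3, 3, 3],
    [ABSTAIN, ABSTAIN, ABSTAIN],
])

def vote_probs(L, num_classes):
    """Share of non-abstain votes per class; rows with no votes get a uniform row."""
    counts = np.stack(
        [(L == k).sum(axis=1) for k in range(num_classes)], axis=1
    ).astype(float)
    totals = counts.sum(axis=1, keepdims=True)
    uniform = np.full_like(counts, 1.0 / num_classes)
    # np.maximum avoids dividing by zero for rows where every LF abstained.
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), uniform)

probs = vote_probs(L, NUM_CLASSES)
labels = probs.argmax(axis=1)  # column index is the class label, no +1 shift
```

Here `probs` has one column per class and none for abstain, so `argmax` returns the class code directly.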
I find that `0` has a special meaning in Snorkel: it indicates that an LF abstains. So I need to encode my categories starting from 1. But in many cases `0` is used as one of the encoded categories. And since the column index of the output of `GenerativeModel.marginals()`, which indicates the encoded category, starts from 0, things can get quite confusing.

For example, in the following pipeline:

1. Get the `L` matrix from the LFs. `L.shape=(1000, 10)`, `L.unique()=[0,1,2,3,4]`. This means 10 LFs for a 4-category classification, where `0` denotes the abstention of an LF.
2. `gen_model=GenerativeModel()`, then `gen_model.train(L)`.
3. `Y_marginal=gen_model.marginals()`. But `Y_marginal.shape=(1000, 4)` instead of `(1000, 5)`.
4. `Y_label=np.argmax(Y_marginal, axis=1) +1`, where the `+1` maps the column index back to the category code.
So I suggest explaining this behavior in the Snorkel documentation and tutorials.
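The off-by-one in the pipeline above can be reproduced with plain NumPy. This is only a sketch under stated assumptions: the random `L` and the vote-share `Y_marginal` are stand-ins for real LF output and for the generative model's marginals, not Snorkel code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in label matrix: 1000 data points, 10 LFs,
# values 0 (abstain) or 1..4 (the four categories).
L = rng.integers(0, 5, size=(1000, 10))

# Stand-in marginals: one column per category, none for abstain.
# Column j holds the share of non-abstain votes for category j + 1.
counts = np.stack(
    [(L == k).sum(axis=1) for k in range(1, 5)], axis=1
).astype(float)
totals = counts.sum(axis=1, keepdims=True)
Y_marginal = np.where(totals > 0, counts / np.maximum(totals, 1.0), 0.25)

# Column indices run 0..3 while category codes run 1..4, hence the + 1.
Y_label = np.argmax(Y_marginal, axis=1) + 1
```

Although `L` contains five distinct values, `Y_marginal` has only four columns, because abstain is not a category.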