Closed: @thammegowda closed this issue 7 years ago
Hi @thammegowda ,
At the moment, the one-vs-rest approach is the only one we support. This is definitely something we've been thinking about, however, and something we want to build in soon!
One potential route in the meantime: I do know that the simple NaiveBayes generative model (https://github.com/HazyResearch/snorkel/blob/master/snorkel/learning/gen_learning.py#L13), which works for the independent-LFs setting, is pretty easy to extend to the multinomial setting (@jason-fries do you know where the code we wrote for this is?). However, we don't currently have the rest of the Snorkel pipeline set up for multinomial... but this is all hackable.
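For context, the naive-Bayes extension to the multinomial setting can be sketched in a few lines. This is a minimal illustration only, not Snorkel's `GenerativeModel` (which learns per-LF accuracies rather than assuming them); the function name, the fixed accuracy `alpha`, and the `0 = abstain, 1..K = class vote` encoding are all assumptions made for the sketch:

```python
import numpy as np

def naive_bayes_posteriors(L, n_classes, alpha=0.8, prior=None):
    """Posterior P(y | LF votes) under a naive-Bayes model with
    conditionally independent labeling functions (LFs).

    L: (n_items, n_lfs) int array; 0 = abstain, 1..n_classes = class vote.
    alpha: assumed probability that an LF votes the true class when it fires;
           a wrong vote lands on each of the other classes with equal probability.
    """
    n, _ = L.shape
    if prior is None:
        prior = np.full(n_classes, 1.0 / n_classes)
    log_post = np.tile(np.log(prior), (n, 1))      # (n_items, n_classes)
    wrong = (1.0 - alpha) / (n_classes - 1)        # prob. of one specific wrong vote
    fired = (L != 0).sum(axis=1)                   # non-abstaining LFs per item
    for k in range(1, n_classes + 1):
        votes_for_k = (L == k).sum(axis=1)
        votes_against = fired - votes_for_k
        log_post[:, k - 1] += (votes_for_k * np.log(alpha)
                               + votes_against * np.log(wrong))
    # normalize in log space for stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

With three classes, an item whose LF votes are `[1, 1, 2, 0]` gets its highest posterior on class 1, and `[3, 3, 3, 1]` on class 3, matching the majority of non-abstaining votes.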
We will try to tackle this at some point soon, though; happy to also talk about it more offline re: your specific use case.
Thanks, Alex
@ajratner Thanks for the reply.
So to summarize, we have two ways here.
My classes are `ClassA`, `ClassB`, and `Other`. `ClassA` and `ClassB` have very subtle differences compared to `Other`, so one-vs-rest has difficulty distinguishing `ClassA` from `ClassB` and `Other` combined (some of the `ClassB` examples go to `ClassA`). However, it classifies correctly when I consider all classes together with the exact same features. Maybe I have to revisit the feature set for this case. I will revisit these options (option 2 sounds like the challenging and interesting one, since it will be reusable in other problems and I won't need to invent features that solve just one problem).
Let's discuss offline. I will describe my use case in an email.
Hi @thammegowda ,
Ignore the two options I listed above! Discussed with the team, and this shouldn't be that bad to integrate into the current generative model. Basically, the LF labels are already stored as ints (in `{-1, 0, 1}`), so we just need to switch to ints in a bigger range to handle multinomial, and everything should work just the same (except for one or two possible tweaks to some of the LF dependencies, but these can be ignored at first pass).
Then of course we'd need to put multinomial support into the rest of the pipeline (i.e., modify the candidate classes, etc.), but this shouldn't be fundamentally too hard either, and can be hacked around in the short term.
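The "bigger range of ints" idea above can be sketched concretely. This is an illustrative NumPy snippet, not Snorkel's actual code; the function names and the `{0 = abstain, 1..K = class}` categorical convention are assumptions for the sketch:

```python
import numpy as np

def binary_to_categorical(L_binary):
    """Remap the binary LF convention {-1, 0, +1} to a categorical one:
    0 = abstain, 1 = negative class, 2 = positive class."""
    L = np.zeros_like(L_binary)
    L[L_binary == -1] = 1
    L[L_binary == 1] = 2
    return L

def vote_marginals(L, n_classes):
    """Per-item class marginals from raw LF vote counts (abstains ignored);
    uniform where every LF abstained."""
    counts = np.stack(
        [(L == k).sum(axis=1) for k in range(1, n_classes + 1)], axis=1
    )
    totals = counts.sum(axis=1, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_classes)
```

The point of the remapping is that nothing downstream needs to change structurally: the label matrix stays an int matrix, and the marginals simply become a `(n_items, n_classes)` array instead of a single probability per item.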
We are now planning to get to this sometime at the end of this month or early next month. However, if you feel like hacking on it and submitting a PR, feel free; it should be an interesting and fun bit of coding!
-Alex
Hi @ajratner Thanks 👍
Hi @thammegowda - if still interested, check out the `dev` branch, in particular `tutorials/intro/Intro_Tutorial_Categorical.ipynb`, for an implementation of categorical `Candidates` (can also see the relevant PR, #646, if interested). Hope this is useful!
Will be merged into master soon with the next release (probably next week).
Closing this issue for now but feel free to reopen if any questions!
@ajratner That is so awesome! :1st_place_medal: Thanks for bringing this feature and notifying ... I will check it out
Let us know if any feedback! More updates here coming soon too!
For anyone looking for `Intro_Tutorial_Categorical.ipynb` and the `dev` branch: it seems it was merged into `master`, and the example was moved to the "advanced" tutorials: https://github.com/HazyResearch/snorkel/blob/master/tutorials/advanced/Categorical_Classes.ipynb
@ajratner Does a version of this categorical class example exist somewhere?
@phucdev thanks!
I am trying to build a multiclass classifier.
Should I fall back to the one-vs-rest approach, or is it somehow possible to extend the `GenerativeModel` to multiclass classification? Can you share some examples/pointers/advice if you already have any?
Maybe this will give you the right context: how to extend the intro tutorial's `Spouse` relationship classification to three classes (`Spouse`, `Sibling`, or none of these), or maybe four classes (`Spouse`, `Sibling`, `Parent_Child`, `None`).