snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Multiclass Classification Example/Tutorial/Advice #604

Closed thammegowda closed 7 years ago

thammegowda commented 7 years ago

I am trying to build a multiclass classifier.

Should I fall back to a one-vs-rest approach, or is it possible to extend the GenerativeModel to multiclass classification? Can you share any examples, pointers, or advice if you already have some?

For context: how would I extend the intro tutorial on Spouse relationship classification to three classes (Spouse, Sibling, or None of these), or perhaps four classes (Spouse, Sibling, Parent_Child, None)?

ajratner commented 7 years ago

Hi @thammegowda ,

At the moment, the one-vs-rest approach is the only one we support. This is definitely something we've been thinking about, however, and something we want to build in soon!

One potential route in the meantime: the simple NaiveBayes generative model (https://github.com/HazyResearch/snorkel/blob/master/snorkel/learning/gen_learning.py#L13), which works for the independent-LFs setting, is pretty easy to extend to the multinomial setting (@jason-fries do you know where the code we wrote for this is?). However, we don't currently have the rest of the Snorkel pipeline set up for multinomial... but this is all hackable.
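
To illustrate what the multinomial extension could look like, here is a minimal sketch of a naive Bayes posterior over K classes for a single candidate. It is not the code referred to above, and it rests on several assumptions: votes are encoded as 0 for abstain and 1..K for classes, LFs are conditionally independent, abstentions carry no class information, each LF's errors are spread uniformly over the wrong classes, and the accuracies are given rather than learned. The function name `class_posterior` is hypothetical.

```python
import math

ABSTAIN = 0  # assumed convention: 0 = abstain, classes are 1..K


def class_posterior(votes, accuracies, num_classes):
    """Posterior over classes 1..K for one candidate.

    votes:       one int per LF, 0 (abstain) or a class in 1..K
    accuracies:  assumed-known accuracy of each LF, strictly between 0 and 1
    Returns a list of K probabilities (index 0 corresponds to class 1).
    """
    log_scores = []
    for y in range(1, num_classes + 1):
        log_p = 0.0  # uniform class prior, so the constant is dropped
        for vote, acc in zip(votes, accuracies):
            if vote == ABSTAIN:
                continue  # assumed uninformative about the class
            if vote == y:
                log_p += math.log(acc)
            else:
                # error mass spread evenly over the K - 1 wrong classes
                log_p += math.log((1.0 - acc) / (num_classes - 1))
        log_scores.append(log_p)

    # normalize in a numerically stable way
    m = max(log_scores)
    exp_scores = [math.exp(s - m) for s in log_scores]
    total = sum(exp_scores)
    return [s / total for s in exp_scores]


if __name__ == "__main__":
    print(class_posterior([1, 1, 2], [0.8, 0.8, 0.8], num_classes=3))
```

With three LFs of accuracy 0.8 voting 1, 1, 2, class 1 ends up with a posterior of roughly 0.88. A real generative model would learn the accuracies from the label matrix instead of taking them as input.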

We will try to tackle this soon, though; happy to also talk about it more offline regarding your specific use case.

Thanks, Alex

thammegowda commented 7 years ago

@ajratner Thanks for the reply.

So to summarize, we have two ways here:

  1. Write a wrapper to do one-vs-rest. I am concerned (based on my past experiments with other classifiers) that there will be a drop in accuracy. In my case, say I have ClassA, ClassB and Other. The differences between ClassA and ClassB are very subtle compared with their differences from Other, so one-vs-rest has difficulty separating ClassA from ClassB and Other combined (some ClassB examples are assigned to ClassA). However, the classifier is correct if I consider all classes together with the exact same features. Maybe I have to revisit the feature set for this case.
  2. Use Naive Bayes for the generative model with independent LFs, and possibly write a multinomial logistic regression for the discriminative model. Here we need some research/brainstorming on dealing with noisy labels in multinomial logistic regression.
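
For the noisy-label question in option 2, one common approach (an assumption here, not something settled in this thread) is to train the softmax model against probabilistic labels by minimizing the expected cross-entropy under the soft label distribution. A minimal sketch for a single example follows; the names `softmax` and `noise_aware_cross_entropy` are hypothetical helpers, not Snorkel functions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noise_aware_cross_entropy(logits, soft_label):
    """Expected cross-entropy of the model under a probabilistic label.

    logits:     unnormalized class scores from the discriminative model
    soft_label: probabilities over the same classes (e.g. from a generative
                model over LF votes); must sum to 1
    """
    probs = softmax(logits)
    # classes with zero label mass contribute nothing to the expectation
    return -sum(q * math.log(p) for q, p in zip(soft_label, probs) if q > 0)


if __name__ == "__main__":
    # An uncertain label spread over ClassA and ClassB, none on Other
    print(noise_aware_cross_entropy([2.0, 1.5, -1.0], [0.6, 0.4, 0.0]))
```

With a one-hot `soft_label` this reduces to the ordinary multinomial logistic regression loss, so the same training loop can handle both clean and noisy labels.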

I will revisit these options (option 2 sounds like the more challenging and interesting one, since it will be reusable in other problems and I don't need to invent features that solve just one problem).

Let's discuss offline. I will describe my use case in an email.

ajratner commented 7 years ago

Hi @thammegowda ,

Ignore the two options I listed above! I discussed this with the team, and it shouldn't be that bad to integrate into the current generative model. Basically, the LF labels are already stored as ints (in $\{-1,0,1\}$), so we just need to switch to ints in a bigger range to handle multinomial, and everything should work just the same (except for one or two possible tweaks to some of the LF dependencies, but these can be ignored on a first pass).
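
As a rough illustration of the "ints in a bigger range" idea, the sketch below uses 0 for abstain and 1..K for the classes, applied to the Spouse/Sibling/None example from earlier in the thread. The encoding, the keyword lists, and all function names are assumptions made for illustration, not Snorkel's actual API, and a plain majority vote stands in for the generative model.

```python
from collections import Counter

ABSTAIN = 0  # assumed encoding: 0 = abstain, 1..K = classes
CLASSES = {1: "Spouse", 2: "Sibling", 3: "None"}


def lf_spouse_keyword(sentence):
    """Hypothetical LF: vote Spouse (1) on marriage-related words."""
    words = sentence.lower().split()
    return 1 if any(w in {"husband", "wife", "married"} for w in words) else ABSTAIN


def lf_sibling_keyword(sentence):
    """Hypothetical LF: vote Sibling (2) on sibling-related words."""
    words = sentence.lower().split()
    return 2 if any(w in {"brother", "sister", "sibling"} for w in words) else ABSTAIN


def majority_vote(votes):
    """Most common non-abstain vote; abstain on ties or when no LF voted."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    (top, top_n), *rest = counts.most_common(2)
    if rest and rest[0][1] == top_n:
        return ABSTAIN  # two classes tied for first place
    return top


if __name__ == "__main__":
    sentence = "Alice married Bob"
    votes = [lf(sentence) for lf in (lf_spouse_keyword, lf_sibling_keyword)]
    label = majority_vote(votes)
    print(votes, CLASSES.get(label, "abstain"))
```

The binary case maps onto this scheme by treating the former -1 label as just another class, which is why the LF interface itself barely changes.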

Then of course we'd need to add multinomial support to the rest of the pipeline (i.e. modify the candidate classes, etc.), but this shouldn't be fundamentally too hard either, and can be hacked around in the short term.

We are now planning to get to this sometime at the end of this month or early next month. However, if you feel like hacking on it and submitting a PR, feel free; it should be an interesting and fun bit of coding!

-Alex

thammegowda commented 7 years ago

Hi @ajratner Thanks 👍

ajratner commented 7 years ago

Hi @thammegowda - if you're still interested, check out the dev branch, in particular tutorials/intro/Intro_Tutorial_Categorical.ipynb, for an implementation of categorical Candidates (you can also see the relevant PR, #646, if interested). Hope this is useful!

Will be merged into master soon with next release (probably next week)

Closing this issue for now but feel free to reopen if any questions!

thammegowda commented 7 years ago

@ajratner That is so awesome! :1st_place_medal: Thanks for bringing this feature and notifying ... I will check it out

ajratner commented 7 years ago

Let us know if any feedback! More updates here coming soon too!

PiotrCzapla commented 5 years ago

For anyone looking for Intro_Tutorial_Categorical.ipynb and the dev branch: it seems that it was merged into 'master' and the example was moved to the "advanced" tutorials: https://github.com/HazyResearch/snorkel/blob/master/tutorials/advanced/Categorical_Classes.ipynb

rjurney commented 4 years ago

@ajratner Does a version of this categorical class example exist somewhere?

phucdev commented 4 years ago

> @ajratner Does a version of this categorical class example exist somewhere?

https://github.com/snorkel-team/snorkel-extraction/blob/master/tutorials/advanced/Categorical_Classes.ipynb

rjurney commented 4 years ago

@phucdev thanks!