snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0
5.79k stars, 859 forks

Is it possible to run snorkel with a linear classifier as the discriminative model? #951

Closed aindrila-ghosh closed 5 years ago

aindrila-ghosh commented 6 years ago

In both the tutorial examples, after the generative model, neural networks are used as noise-aware discriminative models. Is it possible to use a linear classifier (e.g., logistic regression) as a discriminative model? In that case, what would be the right way to interpret the confidence scores (i.e., the train marginals) generated by the generative model?

ajratner commented 6 years ago

@aindrilabasak Check out https://github.com/HazyResearch/snorkel/blob/master/snorkel/learning/disc_models/logistic_regression.py, and interpretation of the probabilistic training labels is exactly the same!
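One way to read "interpretation is exactly the same": a noise-aware linear model minimizes the expected log loss under the marginals, so each marginal P(y = 1) acts as a soft target instead of a 0/1 label. The sketch below is a minimal pure-Python illustration of that idea, not the code in the linked logistic_regression.py; the function names, learning rate, and toy data are all assumptions made for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    # Model's estimate of P(y = 1 | x) for one feature vector.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def noise_aware_loss(w, b, X, marginals):
    # Mean expected log loss, where marginals[i] is P(y_i = 1).
    total = 0.0
    for x, p in zip(X, marginals):
        q = min(max(predict_proba(w, b, x), 1e-12), 1.0 - 1e-12)
        total -= p * math.log(q) + (1.0 - p) * math.log(1.0 - q)
    return total / len(X)


def train_noise_aware_logreg(X, marginals, lr=0.5, epochs=500):
    # Full-batch gradient descent; lr and epochs are illustrative defaults.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, p in zip(X, marginals):
            err = predict_proba(w, b, x) - p  # d(loss)/d(logit) for soft targets
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data (made up): one feature, marginals rising with the feature value.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
marginals_toy = [0.1, 0.3, 0.7, 0.9]
w_toy, b_toy = train_noise_aware_logreg(X_toy, marginals_toy)
```

With hard labels (marginals of exactly 0 or 1) this reduces to ordinary logistic regression, which is why the same model can be trained on either kind of label.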

aindrila-ghosh commented 6 years ago

@ajratner, thank you for the answer. I could run my code with Logistic Regression. So, if I want to run my code with any other classifier such as SVM (i.e., not present in the current snorkel library), what approach should I take? As far as I can understand, the Logistic Regression in the code is not just imported as a package from the sklearn repository, but the algorithm was implemented as a subclass of TFNoiseAwareModel. So, if I want to run any other linear classifier, do I need to implement the algorithm as a subclass of the noise-aware model? Or can I just import the algorithm as a package and run it with the train_candidates and train_marginals as input?

ajratner commented 6 years ago

Both are correct: you can either (a) subclass TFNoiseAwareModel, or create an alternative class like this one for non-TensorFlow classifiers (one for PyTorch is coming soon!), or (b) just feed in the candidates and probabilistic training labels (train_marginals), if the training code you're using supports probabilistic training labels. Note that with (b) you also need to get the candidates into a basic format that works with your classifier (see the models we implemented for an example).

If you do make a new class in (a), feel free to share with a PR :)!
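For option (b), a classifier that does not accept probabilistic labels directly can sometimes still be used if it accepts per-example weights. A common workaround (a general trick, not something the Snorkel code in this thread is confirmed to do) is to duplicate each example: once with label 1 weighted by its marginal, and once with label 0 weighted by one minus its marginal. The helper below is a hypothetical sketch; it assumes the features are already numeric rows and that the marginals are P(y = 1).

```python
def expand_soft_labels(X, marginals, eps=1e-9):
    """Turn (x_i, p_i) pairs into weighted hard-label rows.

    Each example appears once with label 1 and weight p_i, and once with
    label 0 and weight 1 - p_i. Rows whose weight is at most eps are dropped.
    """
    X_out, y_out, w_out = [], [], []
    for x, p in zip(X, marginals):
        if not 0.0 <= p <= 1.0:
            raise ValueError("marginal must lie in [0, 1], got %r" % (p,))
        for label, weight in ((1, p), (0, 1.0 - p)):
            if weight > eps:
                X_out.append(list(x))
                y_out.append(label)
                w_out.append(weight)
    return X_out, y_out, w_out


# Toy feature rows and marginals (made up for illustration).
X_hard, y_hard, weights = expand_soft_labels(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [1.0, 0.25, 0.0],
)
```

If scikit-learn is installed, the result could then be passed to an estimator whose `fit` accepts `sample_weight`, for example `LinearSVC().fit(X_hard, y_hard, sample_weight=weights)`. For log loss the weighted objective equals the expected loss under the marginals; for other losses such as the hinge loss it is the expected version of that loss, which may or may not be what you want.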

gpandey19 commented 6 years ago

@aindrilabasak @ajratner How can we train the Logistic Regression model using the candidates and the train marginals?

It would be helpful if you could verify the approach below:

    from snorkel.learning.disc_models.logistic_regression import *
    disc_model = LogisticRegression()
    disc_model.train(train_cands, train_marginals, X_dev=dev_cands, Y_dev=L_gold_dev, **train_kwargs)

I'm getting these two errors:

    'list' object has no attribute 'shape'
    TypeError: list indices must be integers, not tuple

gpandey19 commented 6 years ago

@ajratner Regarding "see the models we implemented for an example": could you please point to the model where the candidates are converted into the format that the algorithm uses?

ajratner commented 5 years ago

Hi @gpandey19, to use logistic regression you need to precompute a set of features; for an example, you can see this test (for the PyTorch version). Hope this helps!
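The two errors reported earlier are consistent with passing a plain Python list of candidates where a feature matrix is expected, although that is an inference from the messages rather than something confirmed in this thread. To illustrate what "precompute a set of features" can mean, here is a minimal bag-of-words sketch. It assumes each candidate can be reduced to a text string, which is a simplification: real Snorkel candidates are richer objects, and `build_vocabulary` and `featurize` are hypothetical helpers, not Snorkel functions.

```python
def build_vocabulary(texts):
    # Map each lowercased token to a column index, in order of first appearance.
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def featurize(texts, vocab):
    # One row of token counts per text; tokens outside the vocabulary are ignored.
    rows = []
    for text in texts:
        row = [0] * len(vocab)
        for token in text.lower().split():
            idx = vocab.get(token)
            if idx is not None:
                row[idx] += 1
        rows.append(row)
    return rows


# Made-up candidate texts for illustration.
train_texts = ["Alice married Bob", "Bob met Carol"]
vocab = build_vocabulary(train_texts)
X_train = featurize(train_texts, vocab)
X_dev = featurize(["Bob married Dave"], vocab)  # reuse the training vocabulary
```

The resulting rows would still need to be wrapped in an array or sparse matrix type (for example with `numpy.asarray`) before being handed to a model that reads `.shape`. The key point is that the vocabulary is built on the training split and reused for dev and test, so that every split has the same columns.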