Closed pratikchhapolika closed 3 years ago
Hi Pratik,
Your understanding is correct for 1. and 2.
For 3, the output of the label model is not per labeling function. The label model produces a probabilistic estimate of the true label for each data point, with one column per class. So, for example, for a dataset with 3 points and two classes, the output might be [[0.6, 0.4], [0.3, 0.7], [1.0, 0.0]].
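To make the shapes concrete, here is a minimal sketch in plain Python (it does not import Snorkel). The label matrix values are made up for illustration, and the names L_train and probs_train simply follow the tutorial's naming.

```python
# Illustrative label matrix: one row per data point, one column per labeling
# function (LF). -1 means the LF abstained; 0 and 1 are class votes.
L_train = [
    [0, -1, 0],
    [1, 1, -1],
    [0, 0, 0],
]

# The probabilistic labels from the example above: one row per data point,
# one column per class (not per LF).
probs_train = [[0.6, 0.4], [0.3, 0.7], [1.0, 0.0]]

n_points, n_lfs = len(L_train), len(L_train[0])
n_classes = len(probs_train[0])

# Same number of rows, but the column count is the cardinality, not the LF count.
assert len(probs_train) == n_points
# Each row is a probability distribution over the classes.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in probs_train)

print(f"L_train: {n_points} x {n_lfs}, probs_train: {n_points} x {n_classes}")
```

So the input is (number of points) x (number of LFs), while the output is (number of points) x (number of classes).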
For 4, take a look at the notebook that the tutorial refers to: https://github.com/snorkel-team/snorkel-tutorials/blob/master/spam/01_spam_tutorial.ipynb In cell In [36], you’ll find the definition of probs_train.
For 5, your understanding is correct. Note that the output depends on the number of classes. If your task is binary (cardinality 2), then, yes, you will get predictions in {0, 1}.
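As a rough illustration of the row-wise maximum, here is a small sketch in plain Python. The helper argmax_preds is hypothetical and only mimics the idea; it is not Snorkel's probs_to_preds, and in particular it breaks ties by taking the first maximal class, which may differ from what Snorkel does.

```python
def argmax_preds(probs):
    """Hypothetical helper: index of the largest probability in each row.

    Ties go to the first maximal index, a simplification for this sketch.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Binary task: predictions are drawn from {0, 1}.
probs_binary = [[0.6, 0.4], [0.3, 0.7], [1.0, 0.0]]
preds_binary = argmax_preds(probs_binary)

# Three-class task: predictions are drawn from {0, 1, 2}.
probs_multi = [[0.1, 0.2, 0.7], [0.5, 0.3, 0.2]]
preds_multi = argmax_preds(probs_multi)

print(preds_binary, preds_multi)
```

The set of possible prediction values therefore grows with the cardinality of the task.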
For 6, you can find these here: Label Model: https://github.com/snorkel-team/snorkel/blob/master/snorkel/labeling/model/label_model.py
probs_to_preds method: https://github.com/snorkel-team/snorkel/blob/88b2579b8a4b22a6132f2e940a8a47949c73f9b8/snorkel/utils/core.py#L13
PandasLFApplier: https://github.com/snorkel-team/snorkel/blob/master/snorkel/labeling/apply/pandas.py#L51
Hope this helps!
Closing this for now, feel free to reopen if you have any other questions!
@here, hello. I went through these explanations and they were useful, thanks, but they do not cover the issue I have. I would like to know how the Label Model works for multi-class rather than binary problems. In my problem I have 7 classes and 13 labeling functions, and when I call the "fit" method it gives me this error:
"ValueError: L_train has cardinality 12, cardinality=7 passed in.", which I believe comes from lines 890-894 of https://github.com/snorkel-team/snorkel/blob/master/snorkel/labeling/model/label_model.py
Based on your explanations and the documentation, cardinality is the number of classes, but when the number of classes differs from the number of LFs I get an error. My Snorkel version is 0.9.8, installed with pip on a Mac. Is there more detailed documentation for multi-class labeling with Snorkel?
Thanks in advance,
Hi betiTG, Snorkel works for multi-class problems just as well as binary ones. This error message suggests that you initialized the label model for a problem with 7 classes, but the LF outputs you passed in contained 12. If your LFs can vote on 12 possible classes, you need to initialize the label model with cardinality=12 rather than 7. Hope this helps!
Best, Humza
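One way to see where the 12 comes from is to inspect the label matrix before fitting. The sketch below is plain Python and rests on two assumptions: labels are 0-indexed with -1 meaning abstain (as in the Snorkel tutorials), and the check behind the error compares the largest label plus one against cardinality. The helper implied_cardinality and the matrix values are hypothetical.

```python
ABSTAIN = -1


def implied_cardinality(L):
    """Hypothetical helper: smallest cardinality that covers every vote in L."""
    highest = max(label for row in L for label in row)
    return highest + 1  # assumes 0-indexed class labels


# Made-up label matrix in which one LF votes for class 11.
L_train = [
    [0, ABSTAIN, 3],
    [11, 2, ABSTAIN],
    [ABSTAIN, 6, 6],
]

cardinality_needed = implied_cardinality(L_train)
distinct_votes = sorted({label for row in L_train for label in row} - {ABSTAIN})

print(cardinality_needed, distinct_votes)
```

Under these assumptions, a single vote for label 11 is enough to make the matrix look like a 12-class problem, even if only a handful of distinct classes are actually used.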
I guess your classes are not named 1...7, and you probably named them so that the last class is named 13. If that is the case, just use a mapping dictionary.
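A mapping dictionary along those lines could look like the sketch below. It is plain Python with made-up raw class ids; build_label_map is a hypothetical helper, and it assumes -1 is the abstain value and that the label model expects classes numbered 0 to k-1.

```python
ABSTAIN = -1


def build_label_map(raw_labels):
    """Hypothetical helper: map arbitrary raw class ids to 0..k-1, keeping ABSTAIN."""
    classes = sorted(set(raw_labels) - {ABSTAIN})
    mapping = {raw: new for new, raw in enumerate(classes)}
    mapping[ABSTAIN] = ABSTAIN
    return mapping


# Made-up LF outputs that use 7 distinct class ids, the largest being 13.
raw_L = [
    [1, ABSTAIN, 5],
    [13, 3, ABSTAIN],
    [ABSTAIN, 8, 8],
    [2, 10, 13],
]

label_map = build_label_map(label for row in raw_L for label in row)
L_train = [[label_map[label] for label in row] for row in raw_L]

n_classes = len(label_map) - 1  # exclude the ABSTAIN entry
print(label_map, L_train, n_classes)
```

After remapping, the largest label is 6, so the matrix is consistent with cardinality=7. Having the labeling functions return 0..6 directly would avoid the extra step.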
I was going through this document <https://www.snorkel.org/use-cases/01-spam-tutorial> and everything was fine until I came to sections "4. Combining Labeling Function Outputs with the Label Model" and "5. Training a Classifier".
I have the following doubts. Please let me know if my understanding is correct:
1. Is the input to LabelModel a matrix of dimension total_training_samples x labeling_fn_output?
2. Is the output of LabelModel a matrix of probabilities with the same dimension as the input, total_training_samples x labeling_fn_prob?
3. For every training sample, does it give the probability of each labeling function? If so, how would we know which class the probabilities for each labeling function belong to for a given data point?
4. In the code above, does probs_to_preds take the maximum of the probabilities across each row?
5. What would be the final values of preds_train_filtered? Will it be an array of {0, 1}?
6. Where can I see the implementation of LabelModel, probs_to_preds, and PandasLFApplier?