snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Generative model for multi_class labels #1595

Closed by SoniaBadene 3 years ago

SoniaBadene commented 4 years ago

Description of the Task

I wrote LFs that output 0 for Abstain, 1-10 for the classes, and 11 for No_relation. Each LF's output set contains only one value between 1 and 10. The last LF outputs 11 if all the other LFs output 0 for a candidate. I am not sure whether I should replace all the Abstain values with No_relation or whether the generative model will deduce it; I don't want the Abstain class (value 0) to appear in the results.
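The labeling scheme described above can be sketched on single rows of LF votes. This is a minimal illustration, not Snorkel code: `apply_fallback` and the constants are hypothetical names, and it assumes the fallback is computed from a row of votes after the class-specific LFs have run.

```python
ABSTAIN = 0        # the issue's convention: 0 means "no vote"
NO_RELATION = 11   # fallback class from the issue

def apply_fallback(votes):
    """Return the fallback LF's vote for one candidate.

    `votes` holds the outputs of the class-specific LFs (0 or a value in 1-10).
    The fallback votes NO_RELATION only when every other LF abstained.
    """
    if all(v == ABSTAIN for v in votes):
        return NO_RELATION
    return ABSTAIN

rows = [
    [0, 3, 0],   # one LF voted for class 3
    [0, 0, 0],   # every LF abstained
]

# Append the fallback LF's vote as an extra column.
with_fallback = [row + [apply_fallback(row)] for row in rows]
print(with_fallback)
```

With this scheme, a candidate that no class-specific LF labels still receives exactly one non-abstain vote, from the fallback column.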

I used the GenerativeModel class:

from snorkel.learning import GenerativeModel

gen_model = GenerativeModel(lf_propensity=True)
gen_model.train(L_dev)
dev_marginals = gen_model.marginals(L_dev)

I couldn't use the 46 dependencies that resulted from the following code:

from snorkel.learning.structure import DependencySelector

ds = DependencySelector()
deps = ds.select(L_dev, higher_order=True, threshold=0.1)
print('#deps: ', len(deps))

Results

I obtain a matrix dev_marginals in which each candidate row holds a probability for each class (LF). I then take the highest probability and its index to determine which class (LF) has been predicted.

When I count the predicted classes (argmax of the probabilities), class 0 (Abstain) appears among the predictions, but class 11 (No_relation) never does.
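The counting step described here might look like the following sketch. The `dev_marginals` values are made up for illustration (three candidates, four columns instead of twelve), and the snippet assumes column index i corresponds to class i, which is worth checking against the model's own class ordering.

```python
from collections import Counter

# Hypothetical marginals: one row per candidate, one column per class index.
dev_marginals = [
    [0.70, 0.10, 0.15, 0.05],
    [0.05, 0.80, 0.10, 0.05],
    [0.60, 0.20, 0.10, 0.10],
]

def argmax(row):
    """Index of the largest probability in a row (first index on ties)."""
    return max(range(len(row)), key=row.__getitem__)

# Predicted class per candidate, then how often each class was predicted.
predicted = [argmax(row) for row in dev_marginals]
class_counts = Counter(predicted)

print(predicted)
print(class_counts)
```

A class that is never the argmax simply has no entry in `class_counts`, which is how a missing class such as 11 would show up in the tally.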

Questions

Thank you in advance for your help!

santosh-b commented 4 years ago

The generative model isn't handling your ABSTAIN class properly because ABSTAIN should have an index of -1, not 0. As a result, statistics such as coverage and overlap that depend on ABSTAINs would be incorrect.
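A sketch of the remapping this comment suggests, under the assumption that the target convention is -1 for ABSTAIN with 0-based class indices (so the issue's classes 1-11 become 0-10). The helper names and the small `L_dev` matrix are hypothetical, and plain lists are used so the snippet runs without dependencies. It also shows how a coverage computation that tests for -1 goes wrong on the original matrix.

```python
OLD_ABSTAIN = 0
NEW_ABSTAIN = -1

def remap_labels(L):
    """Shift labels so abstain becomes -1 and classes 1..k become 0..k-1."""
    return [[NEW_ABSTAIN if v == OLD_ABSTAIN else v - 1 for v in row] for row in L]

def lf_coverage(L, abstain):
    """Fraction of candidates each LF (column) labels, given the abstain value."""
    n = len(L)
    return [sum(row[j] != abstain for row in L) / n for j in range(len(L[0]))]

# Hypothetical label matrix in the issue's convention (0 = abstain).
L_dev = [
    [0, 3, 0],
    [0, 0, 11],
    [2, 0, 0],
    [0, 0, 11],
]

L_remapped = remap_labels(L_dev)
print(L_remapped)

# Treating the original matrix as if -1 meant abstain counts every 0 as a vote.
coverage_wrong = lf_coverage(L_dev, abstain=-1)
coverage_fixed = lf_coverage(L_remapped, abstain=-1)
print(coverage_wrong)
print(coverage_fixed)
```

After remapping, predicted class indices need to be shifted back by one to recover the original 1-11 numbering.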

Reference

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.