snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

bug in probs_to_preds #1529

Closed: orico closed this issue 4 years ago

orico commented 4 years ago

TL;DR: it assigns 0 when all the probabilities in a row are equal. This behavior is misleading when calculating metrics.

from snorkel.utils import probs_to_preds

probs_dev = label_model.predict_proba(L_dev)
print(probs_dev[:10])
preds_dev = probs_to_preds(probs_dev)
print(preds_dev[:10])

[[0.25       0.25       0.25       0.25      ]
 [0.03142018 0.03296586 0.90272909 0.03288487]
 ...]
[0 2 ...]
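The reported symptom matches what a plain `np.argmax` does: when several entries tie for the maximum, it returns the first one, so a uniform row always maps to class 0. The sketch below is not Snorkel code. `tied_rows` is a hypothetical helper, and its tolerance `tol` is an assumed value. It flags rows whose maximum is shared, so those rows can be marked as abstains (-1) and excluded before computing metrics.

```python
import numpy as np


def tied_rows(probs: np.ndarray, tol: float = 1e-5) -> np.ndarray:
    """Hypothetical helper: True for rows where more than one class shares the max probability."""
    row_max = probs.max(axis=1, keepdims=True)
    # Count entries within tol of the row maximum; more than one means a tie.
    n_at_max = (np.abs(probs - row_max) <= tol).sum(axis=1)
    return n_at_max > 1


# The first two rows from the output above.
probs = np.array([
    [0.25, 0.25, 0.25, 0.25],
    [0.03142018, 0.03296586, 0.90272909, 0.03288487],
])

# Plain argmax silently picks the first maximal index for the tied row.
print(np.argmax(probs, axis=1).tolist())  # [0, 2]

# Mark tied rows as -1 (abstain) so they can be filtered out of metrics.
preds = np.argmax(probs, axis=1)
preds[tied_rows(probs)] = -1
print(preds.tolist())  # [-1, 2]
```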

bhancock8 commented 4 years ago

Thanks for posting! I believe, however, that the default tie-breaking policy in probs_to_preds is "random" (https://github.com/snorkel-team/snorkel/blob/4c361335c43305fd3ba2991f40b243f76b863503/snorkel/utils/core.py#L14), so when you have tied probabilities, it will randomly select among the tied indices. For example:

>>> import numpy as np
>>> from snorkel.utils import probs_to_preds
>>> probs = np.ones((5, 4)) * 0.25
>>> preds = probs_to_preds(probs)
>>> preds
array([0, 3, 0, 3, 2])
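One way to implement the random tie-breaking described above is to collect the indices that tie for each row's maximum and sample one of them. The sketch below is a hypothetical stand-alone function, not Snorkel's source. The `tol` tolerance and the seeded `rng` are assumptions, chosen so that results are reproducible.

```python
import numpy as np


def argmax_random_tie_break(probs: np.ndarray, tol: float = 1e-5, seed: int = 0) -> np.ndarray:
    """Hypothetical sketch: per-row argmax that picks uniformly at random among tied maxima."""
    rng = np.random.default_rng(seed)
    preds = np.empty(probs.shape[0], dtype=int)
    for i, row in enumerate(probs):
        # Every column within tol of the row maximum counts as tied.
        candidates = np.flatnonzero(np.abs(row - row.max()) <= tol)
        preds[i] = rng.choice(candidates)
    return preds


probs = np.ones((5, 4)) * 0.25
preds = argmax_random_tie_break(probs)
print(preds)  # each entry is one of 0, 1, 2, 3
```

Seeding the generator keeps repeated runs identical. An unseeded generator would give different labels for the same tied rows on each call. A row with a unique maximum always returns that index, because it is the only candidate.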

Let us know if you see behavior otherwise!