RicSpd closed this issue 4 years ago
Hi @RicSpd, thanks for the great questions!
Re: the first one: any supervised machine learning model that is trained in the standard way (to maximize the expected probability of the training data) can be modified to accept probabilistic labels. Intuitively, the probabilistic label just sets how much weight each candidate class gets in the training objective. As examples, we support logistic regression as a default model in the repo, and many others have been used in the Snorkel OSS community!
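To make the re-weighting intuition concrete, here is a minimal sketch in plain Python (not Snorkel's own implementation). The function name `soft_cross_entropy` and the toy numbers are assumptions made up for illustration. It shows that the loss under probabilistic labels is the usual per-class log loss weighted by the label probabilities, and that it reduces to the standard loss when the labels are one-hot.

```python
import math

def soft_cross_entropy(pred_probs, label_probs, eps=1e-12):
    """Mean expected negative log-likelihood under probabilistic labels.

    pred_probs:  per-example class probabilities predicted by the model
    label_probs: per-example class probabilities used as training labels
    """
    total = 0.0
    for p, q in zip(pred_probs, label_probs):
        # q[k] weights the ordinary hard-label loss -log p[k] for class k
        total += -sum(qk * math.log(max(pk, eps)) for pk, qk in zip(p, q))
    return total / len(pred_probs)

# Hypothetical model outputs and probabilistic labels (illustrative values only)
pred_probs = [[0.8, 0.2], [0.3, 0.7]]
label_probs = [[0.9, 0.1], [0.4, 0.6]]
soft_loss = soft_cross_entropy(pred_probs, label_probs)

# With one-hot labels the same function is the standard cross-entropy
hard_loss = soft_cross_entropy(pred_probs, [[1.0, 0.0], [0.0, 1.0]])
```

Any model whose training loss can be written this way (logistic regression, neural networks with a softmax output, and so on) can consume the probabilistic labels directly.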
Re: the second question: yes, you can definitely do that!
Thanks! Alex
Hi, after reading your papers and the first of your tutorials, I'm still not so sure about which models can handle probabilistic labels and which cannot. Until now, I made the following distinction:
predict_proba() method of the LabelModel into binary values.
Is this distinction correct, or can it be integrated/improved?
Moreover, another question regarding this topic. If I need to discretize the predicted probabilities obtained by predict_proba() (for instance, I want to assign label 1 to the observations whose positive-class probability predicted by the LabelModel is larger than a threshold t), does it make sense to use a validation set with gold labels (distinct from the development and the test sets), tune the threshold t to maximize accuracy/F1-score on this validation set, and then apply the optimized threshold to discretize the predicted probabilities of the unlabeled training set too?
I hope I've been clear in presenting my questions; if not, I will edit them.
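The threshold-tuning procedure described above could be sketched roughly as follows. This is a plain-Python illustration under stated assumptions: the helper names `f1_score` and `tune_threshold`, the candidate grid, and all the probability values are hypothetical, and in practice the probabilities would come from the LabelModel's predict_proba() output for the positive class.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}; returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def tune_threshold(pos_probs, y_true, candidates):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        preds = [1 if p > t else 0 for p in pos_probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

# Hypothetical positive-class probabilities and gold labels for a validation set
val_probs = [0.10, 0.35, 0.40, 0.65, 0.80, 0.90]
val_gold = [0, 0, 1, 0, 1, 1]

# Candidate thresholds 0.05, 0.10, ..., 0.95 (an arbitrary grid)
candidates = [i / 20 for i in range(1, 20)]
best_t, best_f1 = tune_threshold(val_probs, val_gold, candidates)

# Apply the tuned threshold to the (unlabeled) training-set probabilities
train_probs = [0.20, 0.55, 0.95]
train_labels = [1 if p > best_t else 0 for p in train_probs]
```

Keeping the validation set separate from the test set matters here, since the threshold is itself a tuned parameter and the test score would otherwise be optimistic.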
P.S. Great job with the Snorkel project, I find the applications very interesting and useful!