ntucllab / libact

Pool-based active learning in Python
http://libact.readthedocs.org/
BSD 2-Clause "Simplified" License

Probabilistic models #117

Closed: ghost closed this issue 7 years ago

ghost commented 7 years ago

I would like to ask which classifiers are considered probabilistic, so that they can be combined with query strategies such as Uncertainty Sampling.

Thanks in advance.

chkoar commented 7 years ago

@stkarlos in terms of the libact API, a probabilistic model is any classifier that inherits from ProbabilisticModel and implements the predict_proba method, which outputs class-membership probabilities. The UncertaintySampling strategy expects a model that derives from either ProbabilisticModel or ContinuousModel and implements predict_proba or predict_real, respectively. Currently, libact provides libact.models.logistic_regression.LogisticRegression, which inherits from ProbabilisticModel and implements predict_proba. Apart from that, you can use any scikit-learn-compatible classifier that implements predict_proba by wrapping it in SklearnProbaAdapter.

Check this code example out.
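To make the interface contract above concrete, here is a minimal pure-Python sketch. ProbabilisticModelSketch, ProbaAdapterSketch and TinyLogistic are hypothetical stand-ins written only for illustration; they are not libact's ProbabilisticModel or SklearnProbaAdapter source. The sketch only shows the idea: an adapter wraps any estimator exposing fit/predict_proba, so a query strategy can rely on predict_proba returning one probability row per sample.

```python
from abc import ABC, abstractmethod
import math


class ProbabilisticModelSketch(ABC):
    """Hypothetical stand-in for a probabilistic-model interface (not libact's class)."""

    @abstractmethod
    def train(self, X, y):
        """Fit the model on feature rows X and labels y."""

    @abstractmethod
    def predict_proba(self, X):
        """Return one list of class-membership probabilities per sample."""


class ProbaAdapterSketch(ProbabilisticModelSketch):
    """Hypothetical adapter: wraps any object exposing fit/predict_proba."""

    def __init__(self, estimator):
        if not callable(getattr(estimator, "predict_proba", None)):
            raise TypeError("estimator must implement predict_proba")
        self._estimator = estimator

    def train(self, X, y):
        self._estimator.fit(X, y)
        return self

    def predict_proba(self, X):
        # Delegate to the wrapped estimator unchanged.
        return self._estimator.predict_proba(X)


class TinyLogistic:
    """Toy scikit-learn-style binary classifier on one feature (illustrative only)."""

    def __init__(self, w=2.0, b=0.0):
        self.w, self.b = w, b

    def fit(self, X, y):
        return self  # fixed weights: this toy does no real training

    def predict_proba(self, X):
        probs = []
        for (x,) in X:
            p1 = 1.0 / (1.0 + math.exp(-(self.w * x + self.b)))
            probs.append([1.0 - p1, p1])
        return probs


model = ProbaAdapterSketch(TinyLogistic()).train([[0.0]], [0])
print(isinstance(model, ProbabilisticModelSketch))  # True
print(model.predict_proba([[0.0]]))  # [[0.5, 0.5]]
```

With libact installed, the analogous step would be wrapping a real scikit-learn estimator in SklearnProbaAdapter and passing it as the model of UncertaintySampling; check the libact documentation for the exact constructor arguments.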

yangarbiter commented 7 years ago

Maybe this article can help you better understand the difference between predict_real and predict_proba.
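As a rough illustration of that difference, the sketch below uses a hypothetical 3-class linear model on one feature (the WEIGHTS values are made up). predict_real here returns unbounded per-class scores, while predict_proba turns them into a distribution with a softmax. This is a generic example, not how any particular libact model computes its outputs.

```python
import math

# Hypothetical (slope, intercept) pairs, one per class, for a 1-feature toy model.
WEIGHTS = [(-2.0, 0.0), (0.0, 1.0), (2.0, 0.0)]


def predict_real(X):
    """Unbounded per-class scores (decision values); larger means more confident."""
    return [[w * x + b for (w, b) in WEIGHTS] for (x,) in X]


def predict_proba(X):
    """Softmax of the scores: non-negative values, each row sums to 1."""
    rows = []
    for scores in predict_real(X):
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


X = [[3.0], [0.2]]
print(predict_real(X))  # [[-6.0, 1.0, 6.0], [-0.4, 1.0, 0.4]]
print(predict_proba(X))
```

Because softmax is monotone, both outputs agree on the predicted class, but only predict_proba yields values that can be read as probabilities; raw scores from other classifiers need calibration before they can be treated that way.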

LogisticRegression is the most natural choice of a probabilistic classifier. Other classifiers can also be used if you wrap them in a calibrator, such as scikit-learn's CalibratedClassifierCV, so that they provide predict_proba.
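To show what a sigmoid calibrator does, here is a simplified Platt-style scaling sketch in pure Python: it fits p(y=1 | s) = 1 / (1 + exp(a*s + b)) to decision values by gradient descent on log loss. The scores and labels are made-up data, and this is not scikit-learn's implementation (CalibratedClassifierCV with method='sigmoid' also fits on cross-validation folds, which this sketch skips).

```python
import math


def fit_sigmoid(scores, labels, lr=0.1, epochs=2000):
    """Fit p(y=1|s) = 1 / (1 + exp(a*s + b)) by gradient descent on log loss.

    Simplified Platt-style scaling (no target smoothing, no held-out folds).
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # With z = a*s + b and p = sigmoid(-z), d(logloss)/dz equals (y - p).
            grad_a += (y - p) * s
            grad_b += y - p
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrated_proba(score, a, b):
    """Map one decision value to [P(class 0), P(class 1)]."""
    p1 = 1.0 / (1.0 + math.exp(a * score + b))
    return [1.0 - p1, p1]


# Hypothetical decision values from a margin-based classifier, with true labels.
scores = [-3.0, -2.0, -1.5, -0.5, 0.3, 0.8, 1.5, 2.5]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_sigmoid(scores, labels)
print(calibrated_proba(2.0, a, b))
```

In practice the scikit-learn calibrator is the safer route; the sketch only illustrates why a calibrated model can then expose predict_proba.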

BTW, @stkarlos, you may also use a ContinuousModel for Uncertainty Sampling, as long as least confidence or smallest margin is used to estimate the uncertainty (set the method parameter to 'lc' or 'sm').
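The reason 'lc' and 'sm' work with real-valued scores is that they only compare scores within a row, whereas entropy treats each row as a probability distribution. Below is a minimal sketch in that spirit; the function names and the real_scores values are hypothetical, and libact's own implementation may differ in details.

```python
import math


def least_confidence(rows):
    """Uncertainty = negative of the top score; higher means less confident."""
    return [-max(r) for r in rows]


def smallest_margin(rows):
    """Uncertainty = negative gap between the two highest scores."""
    out = []
    for r in rows:
        top1, top2 = sorted(r, reverse=True)[:2]
        out.append(-(top1 - top2))
    return out


def entropy(rows):
    """Only meaningful when each row is a probability distribution."""
    return [-sum(p * math.log(p) for p in r if p > 0) for r in rows]


def query(rows, method):
    """Return the index of the most uncertain sample under the chosen measure."""
    measures = {"lc": least_confidence, "sm": smallest_margin, "entropy": entropy}
    uncertainty = measures[method](rows)
    return max(range(len(rows)), key=uncertainty.__getitem__)


# Hypothetical predict_real output for three unlabeled samples (3 classes).
real_scores = [[-6.0, 1.0, 6.0], [-0.4, 1.0, 0.4], [2.0, 2.1, -3.0]]
print(query(real_scores, "lc"))  # 1: its best score (1.0) is the lowest maximum
print(query(real_scores, "sm"))  # 2: its top two scores differ by only 0.1

# Entropy needs probabilities, e.g. predict_proba output:
print(query([[0.9, 0.1], [0.5, 0.5]], "entropy"))  # 1
```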

yangarbiter commented 7 years ago

It seems this issue is solved for now.