suiji / Arborist

Scalable decision tree training and inference.

How to extract a confidence score from `pyborist` #31

Closed mjuarezm closed 7 years ago

mjuarezm commented 7 years ago

It would be very useful to extract a confidence score (such as the vector of posterior probabilities) from the predict function in the pyborist model. Any idea how this could be done? Thanks.

suiji commented 7 years ago

The leaf data structures should be rich enough to do what you need. For example, this is what is used to derive quantile predictions for regression and probabilities for classification. Rich leaf information is enabled by default.
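For illustration only, here is a minimal sketch of the general idea of deriving class probabilities from rich leaf information: average the normalized per-leaf class counts of each tree a sample lands in. The names used (`apply`, `leaf_class_counts`) are assumptions for the sketch, not the actual Arborist/pyborist leaf API.

```python
import numpy as np

def class_probabilities(forest, X, n_classes):
    """Average the normalized class counts of the leaf each sample falls
    into, across all trees, to estimate per-class posterior probabilities."""
    probs = np.zeros((X.shape[0], n_classes))
    for tree in forest:
        leaves = tree.apply(X)                    # hypothetical: leaf index per sample
        counts = tree.leaf_class_counts[leaves]   # hypothetical: (n_samples, n_classes) counts
        probs += counts / counts.sum(axis=1, keepdims=True)
    return probs / len(forest)
```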

How are you doing prediction? Are you converting the trained trees to a Python format?

mjuarezm commented 7 years ago

Thanks for the quick reply, @suiji. This is very useful.

I'm using clf.predict(test_feature_vector). Maybe I can use the predict_proba() function as shown in the example in the pyborist README. Is that equivalent to what you suggested?
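A minimal usage sketch of the two calls being compared, assuming the scikit-learn-style interface suggested by the README; the constructor name `PyboristClassifier` and the exact signatures are assumptions, not confirmed API.

```python
import numpy as np
from pyborist import PyboristClassifier  # assumed class name

# toy data for illustration
X_train = np.random.rand(100, 5)
y_train = np.random.randint(0, 2, 100)
test_feature_vector = np.random.rand(10, 5)

clf = PyboristClassifier()
clf.fit(X_train, y_train)

labels = clf.predict(test_feature_vector)       # hard class labels
probs = clf.predict_proba(test_feature_vector)  # per-class posterior scores
```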

suiji commented 7 years ago

Probably (so to speak): predict_proba() looks like a generic Py-lang method more-or-less equivalent to R's own generic predict(..., prob = TRUE). For classification, the R method computes a distribution of the various scores associated with a given leaf. For regression, the situation is analogous but somewhat more complicated because the scores/predictions/outcomes are not discrete, so they have to be ordered instead of merely grouped. I don't know which Py-lang generics do this sort of thing, although H2O, Dato and some of the other big kids do have their own quantile estimation tools. Rborist simply does the brute-force ordering using, essentially, the same leaf information as for classification.
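A hedged sketch of the brute-force ordering idea described above: pool the training responses held by the leaves a sample falls into, order them, and read off empirical quantiles. The attribute names (`apply`, `leaf_responses`) are hypothetical and do not reflect Rborist/pyborist internals.

```python
import numpy as np

def quantile_predict(forest, X, quantiles=(0.25, 0.5, 0.75)):
    """Estimate regression quantiles by ordering the pooled training
    responses from the leaves each sample reaches."""
    out = np.empty((X.shape[0], len(quantiles)))
    for i, x in enumerate(X):
        pooled = []
        for tree in forest:
            leaf = tree.apply(x[None, :])[0]          # hypothetical: leaf index for this sample
            pooled.extend(tree.leaf_responses[leaf])  # hypothetical: training y values in that leaf
        out[i] = np.quantile(np.sort(pooled), quantiles)
    return out
```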

suiji commented 7 years ago

Are you able to use the prediction methods to your advantage? If so, I would like to close this thread and the other open thread from @fyears, then open a new thread on the state of the Py-lang port.

mjuarezm commented 7 years ago

Hi @suiji. Yes, we are. Thanks for your help!