Closed PSSF23 closed 2 months ago
This is a bug using the API defined by scikit-learn. Though it thankfully doesn't get hit when we use MIGHT because we take a different API approach (i.e. predict_proba_per_tree).
If you put up a fix, then a simple proposal would be have an if/else clause within predict_proba()
# XXX: an inline comment that states why this is necessary, and why it may be redundant in the future if
# we either enforce non-degenerate leaves during construction, or via pruning
if self.honest_prior == 'ignore':
# call predict_proba_per_tree and then combine to get the predicted probabilities
else:
# do what is currently in the codebase
You can add a simple unit-test using the empirical_prior='ignore' for a low sample low # of trees setting to check if nans occur and it should fail on main
branch.
When using the "ignore" option for empty leaf, the sum of posterior estimates would be
nan
if even only one tree encounters an empty leaf. When the number of trees is high, this will result in allnan
's forpredict_proba()
https://github.com/neurodata/scikit-tree/blob/98b67fcbaf028ee8a9bd92407089ab4a34008e9b/sktree/ensemble/_honest_forest.py#L824-L829