BUG predict_proba() not compatible with ignore setting for empty leaf

neurodata / treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn


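This is a bug in the API defined by scikit-learn, i.e. predict_proba(): with honest_prior='ignore', a leaf that receives no honest samples has no posterior, so predict_proba() can return NaNs. Thankfully, we don't hit it when we use MIGHT, because MIGHT takes a different API approach (i.e. predict_proba_per_tree()).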
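To see how a single empty leaf breaks the forest-level prediction, here is a minimal NumPy sketch. It assumes that, under honest_prior='ignore', a tree whose leaf received no honest samples reports NaN for that sample, and that the forest combines trees with a plain mean, the way scikit-learn forests aggregate per-tree probabilities. The per-tree array is made-up illustrative data, not treeple output.

```python
import numpy as np

# Illustrative per-tree posteriors, shape (n_trees, n_samples, n_classes).
# Assumption: with honest_prior="ignore", a leaf with no honest samples yields
# a NaN posterior, as for sample 0 in the second tree.
per_tree = np.array([
    [[0.8, 0.2], [0.3, 0.7]],
    [[np.nan, np.nan], [0.1, 0.9]],
    [[0.6, 0.4], [0.5, 0.5]],
])

# Plain mean over trees: one NaN tree poisons that sample's forest posterior,
# while samples that never hit an empty leaf are unaffected.
proba = per_tree.mean(axis=0)
print(proba)
```

Sample 0 comes out as NaN for every class even though two of the three trees gave it a valid posterior.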

If you put up a fix, a simple approach would be to add an if/else clause within predict_proba():

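# XXX: This branch is necessary because, with honest_prior="ignore", leaves that
# receive no honest samples have no posterior, so naively averaging trees yields NaNs.
# It may be redundant in the future if we either enforce non-degenerate leaves
# during construction, or remove them via pruning.
if self.honest_prior == "ignore":
    # Call predict_proba_per_tree() (assumed shape: n_estimators, n_samples, n_classes)
    # and combine the per-tree posteriors, here with a NaN-skipping mean over trees.
    proba = np.nanmean(self.predict_proba_per_tree(X), axis=0)
else:
    # Do what is currently in the codebase.
    ...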
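The if/else proposal can be sketched as a standalone function so the combination step is easy to test in isolation. This is a minimal sketch under stated assumptions: `combine_tree_posteriors` is a hypothetical helper name (not part of treeple), its input is assumed to have the shape (n_trees, n_samples, n_classes) that predict_proba_per_tree() is assumed to return, and the NaN-skipping mean is one possible way to "combine", not necessarily the one the maintainers would pick.

```python
import numpy as np


def combine_tree_posteriors(per_tree_proba, honest_prior):
    """Hypothetical helper: aggregate per-tree posteriors into forest posteriors.

    per_tree_proba: array of shape (n_trees, n_samples, n_classes), the shape
    assumed for the output of predict_proba_per_tree().
    """
    per_tree_proba = np.asarray(per_tree_proba, dtype=float)
    if honest_prior == "ignore":
        # Empty leaves carry NaN; skip them so each sample is averaged only
        # over the trees whose leaf received honest samples.
        return np.nanmean(per_tree_proba, axis=0)
    # Assumption: the other priors fill empty leaves with a prior, so a plain
    # mean over trees is enough.
    return per_tree_proba.mean(axis=0)


# Usage with illustrative data: sample 0 falls in an empty leaf in tree 1.
per_tree = np.array([
    [[0.8, 0.2], [0.3, 0.7]],
    [[np.nan, np.nan], [0.1, 0.9]],
    [[0.6, 0.4], [0.5, 0.5]],
])
print(combine_tree_posteriors(per_tree, "ignore"))
```

One caveat of this design: a sample that lands in an empty leaf in every tree still gets NaN, and np.nanmean emits a RuntimeWarning for that all-NaN slice.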

You can add a simple unit test that uses honest_prior='ignore' in a setting with few samples and a small number of trees, and checks whether NaNs occur; it should fail on the main branch.

BUG predict_proba() not compatible with ignore setting for empty leaf #290