scikit-learn-contrib / forest-confidence-interval

Confidence intervals for scikit-learn forest algorithms
http://contrib.scikit-learn.org/forest-confidence-interval/
MIT License
284 stars 48 forks

amount of trees needed to work #92

Closed: joachimder closed this issue 4 months ago

joachimder commented 4 years ago

When I use this on my random forest model, it only works if I set 200 or more trees in the parameters:

n_trees = 200
forest = RandomForestRegressor(n_estimators=n_trees, random_state=42)

My optimal number of trees is 29, but with that setting (and not with 200 trees) I get a warning:

RuntimeWarning: invalid value encountered in true_divide
  g_eta_main = g_eta_raw / sum(g_eta_raw)

Because of this, I also can't get a confidence interval around my predicted points.
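For readers unfamiliar with this kind of warning: NumPy emits "invalid value encountered" when a floating-point operation such as 0/0 produces NaN. The sketch below is a generic illustration and not forestci's code. The helper name `normalize` and the all-zero input are assumptions made only to reproduce the same class of warning; the actual cause inside the library's calibration step may differ.

```python
import warnings

import numpy as np


def normalize(weights):
    """Divide weights by their sum (hypothetical helper for illustration)."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum()


# A well-behaved input normalizes to values that sum to 1.
good = normalize([1.0, 3.0])

# If every weight is zero, the division is 0/0: NumPy warns and returns NaN.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bad = normalize([0.0, 0.0, 0.0])

print(good)
print(bad)
print([str(w.message) for w in caught])
```

Once NaN appears in such a normalization, it propagates through every later computation, which would be consistent with no usable interval coming out at the end.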

Am I missing something? Why does it need such a large number of trees?

Edit: minimum amount of trees needed to make it work

danieleongari commented 4 months ago

If you have too few trees, try not using calibration, although the result will be inaccurate. Studies on how the number of trees affects the precision of the CI estimate are welcome, but it is hard to derive general rules that are not problem-dependent.