Open samuelefiorini opened 1 year ago

I use Greykite to forecast hourly time series with years of historical data, and `fit_algorithm=gradient_boosting` is very slow. According to the scikit-learn documentation, `sklearn.ensemble.HistGradientBoostingRegressor` is much faster on large datasets. Have you considered adding support for this estimator? It looks straightforward from here, but I may be wrong.
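For reference, this is roughly how the estimator gets selected today (a minimal sketch based on the silverkite template's configuration names; my actual setup differs):

```python
from greykite.framework.templates.autogen.forecast_config import (
    ForecastConfig,
    ModelComponentsParam,
)
from greykite.framework.templates.forecaster import Forecaster

# fit_algorithm_dict selects the underlying sklearn estimator;
# "gradient_boosting" maps to GradientBoostingRegressor.
model_components = ModelComponentsParam(
    custom={"fit_algorithm_dict": {"fit_algorithm": "gradient_boosting"}}
)
config = ForecastConfig(model_components_param=model_components)

# Typical usage (df holds the hourly time series):
# result = Forecaster().run_forecast_config(df=df, config=config)
```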
Thanks for the suggestion! We haven't planned for this yet, but we have taken note of it and will update you if the feature is implemented. In the meantime, please feel free to submit a pull request for this change if you need it. Thanks!
Thanks, I did some experiments (here) and I've been able to make it run (it's far from being a PR though). In my case (hourly forecasts with 2+ years of historical data), `HistGradientBoostingRegressor` is much faster than `GradientBoostingRegressor` (around 4x) while having roughly the same performance in backtests.
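For anyone who wants to reproduce the comparison, here is a minimal synthetic sketch (not my actual experiment; the data shape is an assumption and timings will vary by machine):

```python
from time import perf_counter

from sklearn.datasets import make_regression
# On scikit-learn < 1.0, enable the experimental estimator first:
# from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor

# Synthetic stand-in for ~2 years of hourly data (~17,500 rows).
X, y = make_regression(n_samples=17_500, n_features=20, noise=1.0, random_state=0)

for estimator in (GradientBoostingRegressor(), HistGradientBoostingRegressor()):
    start = perf_counter()
    estimator.fit(X, y)
    print(f"{type(estimator).__name__}: {perf_counter() - start:.2f}s")
```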
However, there are also some points of discussion. For instance, due to its implementation, `HistGradientBoostingRegressor` does not offer a native feature importance measure, while both `GradientBoostingRegressor` and `RandomForestRegressor` do (via `feature_importances_`).
A possible approach would be to rely on something like `sklearn.inspection.permutation_importance`, but this of course comes with a higher computational cost, so it's probably not ideal; see the sketch below. Otherwise, a dummy empty array could be returned, perhaps with a warning to inform the user.
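For illustration, a minimal sketch of the permutation-based fallback (standard scikit-learn API; the data here is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=5_000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = HistGradientBoostingRegressor().fit(X_train, y_train)

# Each of the n_repeats passes re-scores the fitted model with one feature
# shuffled at a time (no refitting); this is where the extra cost comes from.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
print(result.importances_mean)
```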
It's been a while, but the issue regarding the addition of `feature_importance` to the `HistGradientBoosting*` estimators is still open on scikit-learn: 15132. I'm adding this here for future reference.