rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Implement GradientBoostingRegressor from sklearn.ensemble #5374

Closed · FerdinandoR closed this issue 1 year ago

FerdinandoR commented 1 year ago

I use sklearn.ensemble.GradientBoostingRegressor often in my work and would like to be able to use it in cuML too.

I wish there were a cuml.ensemble.GradientBoostingRegressor class to perform gradient boosting regression on my data, just like the one in sklearn.

hcho3 commented 1 year ago

Are you able to use XGBoost? What is the use case you have in mind when proposing GradientBoostingRegressor in cuML?

FerdinandoR commented 1 year ago

Thanks for the quick reply!

I recently joined a company where we apply a number of models to a dataset and select the best one for each data regime. We use GradientBoostingRegressor as one such model among many. Because training each model on our large dataset is painfully slow, I'm looking for ways to train the models more quickly. Some of the other models are implemented in cuML, which is great, but gradient boosting is one of the slowest to train and is not implemented in cuML, which is why I'd like it.

Does that help? Also, what do you mean by "are you able to use XGBoost"? How would I use it if not through sklearn.ensemble.GradientBoostingRegressor, and why should I?

EDIT: I should mention that we later use sklearn.model_selection.GridSearchCV or RandomizedSearchCV to optimise the hyperparameters, so whatever model implementation we use should be amenable to that kind of search.

hcho3 commented 1 year ago

XGBoost supports training with GPUs. Have you tried using XGBoost with GPU acceleration?
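
For reference, here is a minimal sketch of what GPU-accelerated training with xgboost.XGBRegressor could look like. It assumes XGBoost >= 2.0 (older releases use tree_method="gpu_hist" instead of device="cuda") and a CUDA-capable GPU; the synthetic data and parameter values are purely illustrative.

```python
# Minimal sketch: GPU-accelerated gradient boosting with xgboost.XGBRegressor.
# Assumes XGBoost >= 2.0 and a CUDA-capable GPU; data and parameters are illustrative.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(17_200, 20))              # synthetic features
y = 2.0 * X[:, 0] + rng.normal(size=17_200)    # synthetic target

model = xgb.XGBRegressor(
    n_estimators=500,
    learning_rate=0.1,
    max_depth=6,
    tree_method="hist",   # histogram-based tree construction
    device="cuda",        # train on the GPU (XGBoost >= 2.0)
)
model.fit(X, y)
preds = model.predict(X)
```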

beckernick commented 1 year ago

> I recently joined a company where we apply a number of models to a dataset and select the best one for each data regime.

In addition to trying XGBoost on GPUs, you may also be interested in exploring PyCaret. PyCaret is a low-code machine learning library designed essentially for this use case, and it supports GPU-accelerated training for a variety of models through cuML and XGBoost.
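
If it helps, a minimal sketch of the PyCaret workflow with GPU training enabled might look like the following. It assumes PyCaret is installed alongside cuML and a GPU-enabled XGBoost; the CSV path and target column name are hypothetical placeholders.

```python
# Minimal sketch: PyCaret's low-code regression workflow with GPU training enabled.
# Assumes PyCaret with cuML / GPU-enabled XGBoost available; "train.csv" and the
# "target" column are hypothetical placeholders.
import pandas as pd
from pycaret.regression import setup, compare_models

df = pd.read_csv("train.csv")                    # hypothetical dataset
setup(data=df, target="target", use_gpu=True)    # enable GPU-backed estimators
best_model = compare_models()                    # train and rank many regressors
```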

FerdinandoR commented 1 year ago

Thanks, I didn’t realise that I could use xgboost.XGBRegressor on GPU by merely setting a flag. Will try it out and come back if that doesn’t work.
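
For what it's worth, a hedged sketch of that trial: the GPU-enabled XGBRegressor is a scikit-learn-compatible estimator, so it should drop straight into GridSearchCV or RandomizedSearchCV. The parameter grid and synthetic data are illustrative, and device="cuda" assumes XGBoost >= 2.0.

```python
# Sketch: hyperparameter search over a GPU-enabled XGBRegressor.
# Grid values and synthetic data are illustrative only; assumes XGBoost >= 2.0.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(17_200, 20))
y = X[:, 0] + rng.normal(size=17_200)

search = GridSearchCV(
    estimator=xgb.XGBRegressor(tree_method="hist", device="cuda"),
    param_grid={"n_estimators": [200, 500], "max_depth": [4, 6, 8]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```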

FerdinandoR commented 1 year ago

I timed xgboost against the sklearn implementation and realised that sklearn is faster for my data size (up to 17200 rows). Therefore I won't replace the sklearn implementation at this time. Thanks everyone for the help!

hcho3 commented 1 year ago

No problem.

> sklearn is faster for my data size (up to 17200 rows).

It makes sense. Feel free to try the GPU algorithm in XGBoost once you have more data.