rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] BaggingRegressor #2379

Open aerdem4 opened 4 years ago

aerdem4 commented 4 years ago

Is your feature request related to a problem? Please describe.
Bagging regression models with randomization (sampling rows and columns) is a common technique for making models more robust. A Kaggle user was using RAPIDS for his solution, but for BaggingRegressor he had to move back to sklearn: https://www.kaggle.com/c/trends-assessment-prediction/discussion/156725 Since bagging is highly parallelizable, a RAPIDS implementation that runs on the GPU could give a significant speedup.

Describe the solution you'd like
I would like to have BaggingRegressor implemented in RAPIDS. Here is the sklearn equivalent: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html

Describe alternatives you've considered
I could also wrap my models in a Python function and do the bagging myself, but this may not utilize the GPU as well as a native algorithm would.

beckernick commented 3 years ago

People interested in this feature may find this RAPIDS blog post useful. You can now pass cuML estimators to scikit-learn's BaggingRegressor.
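For readers who want to try the approach described above, here is a minimal sketch of wrapping a cuML estimator in scikit-learn's BaggingRegressor. It assumes cuML is installed and a GPU is available; the choice of Ridge, the synthetic data, and the hyperparameter values are illustrative assumptions, not taken from the blog post. The import falls back to scikit-learn's Ridge so the sketch still runs on a CPU-only machine.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

try:
    # Assumption: cuML is installed and a CUDA-capable GPU is present.
    from cuml.linear_model import Ridge
except ImportError:
    # CPU fallback so the sketch remains runnable without RAPIDS.
    from sklearn.linear_model import Ridge

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)).astype(np.float32)
true_w = rng.normal(size=10).astype(np.float32)
y = (X @ true_w + 0.1 * rng.normal(size=500)).astype(np.float32)

# The base estimator is passed positionally, which works across
# scikit-learn versions that name this argument differently.
bag = BaggingRegressor(
    Ridge(alpha=1.0),
    n_estimators=10,
    max_samples=0.8,   # fraction of rows drawn for each base model
    max_features=0.8,  # fraction of columns drawn for each base model
    random_state=0,
)
bag.fit(X, y)
pred = bag.predict(X)
```

Note that in this setup scikit-learn drives the ensemble loop on the CPU and only each individual fit runs on the GPU, so the base models are not trained in parallel on the device.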

aerdem4 commented 3 years ago

@beckernick I think the opposite would actually be more interesting: running sklearn models in parallel on the GPU with random row and feature subsampling. Otherwise, running cuML models sequentially with subsampling is just several lines of code for the user.
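As a rough illustration of the "several lines of code" mentioned above, the following sketch bags models sequentially with row and column subsampling. The names `bagged_predict` and `LeastSquares` are hypothetical helpers invented for this example; `LeastSquares` is a NumPy-only stand-in so the code runs anywhere, and any estimator class with `fit`/`predict` (for instance a cuML regressor) could presumably be passed instead. It does not address the parallel-on-GPU case requested in this issue.

```python
import numpy as np


class LeastSquares:
    """Hypothetical stand-in estimator; any object with fit/predict works."""

    def fit(self, X, y):
        # Append a column of ones to model the intercept.
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


def bagged_predict(make_model, X, y, X_test, n_estimators=10,
                   row_frac=0.8, col_frac=0.8, seed=0):
    """Fit models one after another on random row/column subsets
    and return the average of their predictions on X_test."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    k_rows = max(1, int(row_frac * n_rows))
    k_cols = max(1, int(col_frac * n_cols))
    preds = []
    for _ in range(n_estimators):
        rows = rng.choice(n_rows, size=k_rows, replace=True)   # bootstrap rows
        cols = rng.choice(n_cols, size=k_cols, replace=False)  # feature subset
        model = make_model()
        model.fit(X[rows][:, cols], y[rows])
        preds.append(model.predict(X_test[:, cols]))
    return np.mean(preds, axis=0)


# Illustrative synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X @ np.arange(1.0, 7.0) + 0.01 * rng.normal(size=200)
y_hat = bagged_predict(LeastSquares, X, y, X)
```

Each iteration trains on its own subset independently of the others, which is why the loop is a natural candidate for parallel execution.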