aerdem4 opened this issue 4 years ago
People interested in this feature may find this RAPIDS blog post useful. You can now pass cuML estimators to scikit-learn's BaggingRegressor.
@beckernick I think the opposite would actually be more interesting: running the models in parallel on the GPU with random row and feature subsampling. Otherwise, running cuML models sequentially with subsampling is just a few lines of code for the user.
Is your feature request related to a problem? Please describe. Bagging regression models via randomization (sampling rows and columns) is a common technique that makes models more robust. A Kaggle user was using RAPIDS for his solution, but for BaggingRegressor he had to fall back to sklearn: https://www.kaggle.com/c/trends-assessment-prediction/discussion/156725 Since bagging is highly parallelizable, a RAPIDS implementation running on the GPU could give a significant speed boost.
Describe the solution you'd like I would like to have BaggingRegressor implemented in Rapids. Here is the sklearn equivalent: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html
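For reference, a minimal sketch of the sklearn API the request mirrors (the data and parameter values here are illustrative; the default base estimator, a decision tree, is used so the snippet stays version-agnostic):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

# Bag 10 base models, each fit on 80% of rows and 60% of features.
bag = BaggingRegressor(
    n_estimators=10,
    max_samples=0.8,
    max_features=0.6,
    random_state=0,
).fit(X, y)

pred = bag.predict(X)
```

A cuML version would expose the same `fit`/`predict` interface but build and apply the ensemble members on the GPU.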
Describe alternatives you've considered I can also wrap my models in a Python function and do the bagging myself, but this may not utilize the GPU as fully as a native implementation would.
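The manual alternative described above can be sketched in a few lines. This is a hedged illustration, not the proposed implementation: it uses a plain NumPy least-squares fit as the base model so it is self-contained, but any estimator with `fit`/`predict` (e.g. a cuML model) could be substituted, and each iteration would still run sequentially rather than in parallel on the GPU:

```python
import numpy as np

def bagged_fit_predict(X, y, X_test, n_estimators=10,
                       sample_frac=0.8, feature_frac=0.6, seed=0):
    """Fit n_estimators least-squares models on random row/column
    subsamples and average their predictions (manual bagging)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    preds = []
    for _ in range(n_estimators):
        # Row subsample with replacement, feature subsample without.
        rows = rng.choice(n, size=int(sample_frac * n), replace=True)
        cols = rng.choice(p, size=max(1, int(feature_frac * p)), replace=False)
        # Base model: ordinary least squares on the subsample.
        coef, *_ = np.linalg.lstsq(X[np.ix_(rows, cols)], y[rows], rcond=None)
        preds.append(X_test[:, cols] @ coef)
    # Ensemble prediction is the mean over the bagged models.
    return np.mean(preds, axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = X @ rng.normal(size=6)
pred = bagged_fit_predict(X, y, X)
```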