rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Implement warm_start parameter in Random Forest (Classifier and Regressor) #2101

Open miguelusque opened 4 years ago

miguelusque commented 4 years ago

Is your feature request related to a problem? Please describe.

When training on a very large dataset, the size of the model a user can train is constrained, because the same GPU must hold both the model being trained and the training data (or a subset of the data, when using the MNMG version of Random Forest).

It might be useful to let the user train in batches by implementing the warm_start parameter. I think it would help squeeze the most out of the available GPU memory.

Describe the solution you'd like

Implement the warm_start parameter in Random Forest (classifier and regressor).

It would be great if the random_state parameter could be changed between batches. Some papers mention that changing the random_state when training on randomly selected subsets of a dataset improves the accuracy of the resulting model.
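For reference, here is a minimal sketch of the behaviour being requested, written against scikit-learn's existing `warm_start` parameter on the CPU. This is scikit-learn, not cuML; the synthetic data, the batch split, and `trees_per_batch` are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a dataset that is too large to load at once.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
batches = np.array_split(np.arange(len(X)), 3)  # three index batches

trees_per_batch = 10  # illustrative value, not a recommendation
clf = RandomForestClassifier(
    n_estimators=trees_per_batch, warm_start=True, random_state=0
)

for i, idx in enumerate(batches):
    if i > 0:
        # With warm_start=True, fit() keeps the trees already built and only
        # trains the additional ones, so n_estimators must grow on each call.
        # random_state is changed per batch, as suggested in this request.
        clf.set_params(
            n_estimators=clf.n_estimators + trees_per_batch, random_state=i
        )
    clf.fit(X[idx], y[idx])

print(len(clf.estimators_))  # total number of trees across all batches
```

Each group of trees only ever sees its own batch, so every batch should contain all target classes; otherwise the trees would disagree on the label set.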

Describe alternatives you've considered

I am trying to manually merge two RF models trained on two subsets of the original dataset.
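To illustrate what such a manual merge can look like, here is a rough sketch using scikit-learn forests on the CPU. It is not a cuML API. `merge_forests` is a hypothetical helper, and concatenating the fitted `estimators_` lists is a workaround rather than an officially supported operation.

```python
import copy

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_a, y_a = X[:1000], y[:1000]  # first subset
X_b, y_b = X[1000:], y[1000:]  # second subset

# Train one forest per subset, each with a different random_state.
rf_a = RandomForestClassifier(n_estimators=10, random_state=1).fit(X_a, y_a)
rf_b = RandomForestClassifier(n_estimators=10, random_state=2).fit(X_b, y_b)


def merge_forests(first, second):
    """Hypothetical helper: return a new forest holding the trees of both."""
    if list(first.classes_) != list(second.classes_):
        raise ValueError("both forests must be trained on the same classes")
    merged = copy.deepcopy(first)
    # Prediction averages over whatever is in estimators_, so extending
    # the list is enough for the combined trees to vote together.
    merged.estimators_ = merged.estimators_ + copy.deepcopy(second.estimators_)
    merged.n_estimators = len(merged.estimators_)
    return merged


merged = merge_forests(rf_a, rf_b)
print(len(merged.estimators_), merged.score(X, y))
```

The merged model behaves like a forest whose trees were each trained on one of the two subsets, which is the same effect the warm_start approach would produce without the manual step.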

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

miguelusque commented 3 years ago

I think this feature request is still relevant.

beckernick commented 2 years ago

cc @mlgill for visibility on warm_start request

mlgill commented 2 years ago

Thanks, @beckernick. I should have mentioned that I'm already subscribed. :)

viclafargue commented 2 years ago

Bringing this to the attention of @hcho3. Might be interesting to discuss the feasibility of this feature in cuML.