Open miguelusque opened 4 years ago
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
I think this feature request is still relevant.
cc @mlgill for visibility on warm_start request
Thanks, @beckernick. I should have mentioned that I'm already subscribed. :)
Bringing this to the attention of @hcho3. Might be interesting to discuss the feasibility of this feature in cuML.
Is your feature request related to a problem? Please describe.
When a user wants to train on a very large dataset, the size of the model they can train is constrained, because the same GPU has to hold not only the model being trained but also the dataset used for training (or a subset of the dataset, when using the MNMG version of Random Forest).
It might be useful to let the user train in batches by implementing the warm_start parameter. I think it could help squeeze the most out of the available GPU memory.
Describe the solution you'd like
Implement the warm_start parameter in Random Forest (classifier and regressor).
It would be great if the random_state parameter could be changed between batches. Some papers mention that changing the random_state when training on randomly selected subsets of a dataset improves the accuracy of the resulting model.
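To make the request concrete, here is a minimal pure-Python sketch of the semantics being asked for. It is not cuML code, and cuML does not currently expose this behavior. It loosely mirrors how scikit-learn documents warm_start: n_estimators is the target total, and a later fit call keeps the existing trees and only adds the missing ones. ToyForest and Stump are hypothetical stand-ins (a one-feature threshold stump instead of a real decision tree), and the batches are synthetic.

```python
import random
from collections import Counter


class Stump:
    """One-feature threshold classifier; a hypothetical stand-in for a real tree."""

    def fit(self, X, y, rng):
        # Pick a random feature and use a random sample's value as the threshold.
        self.feature = rng.randrange(len(X[0]))
        self.threshold = X[rng.randrange(len(X))][self.feature]
        left = [lab for row, lab in zip(X, y) if row[self.feature] <= self.threshold]
        right = [lab for row, lab in zip(X, y) if row[self.feature] > self.threshold]
        fallback = Counter(y).most_common(1)[0][0]
        self.left = Counter(left).most_common(1)[0][0] if left else fallback
        self.right = Counter(right).most_common(1)[0][0] if right else fallback
        return self

    def predict_one(self, row):
        return self.left if row[self.feature] <= self.threshold else self.right


class ToyForest:
    """Hypothetical forest illustrating warm_start; not the cuML API."""

    def __init__(self, n_estimators=10, warm_start=False, random_state=0):
        self.n_estimators = n_estimators
        self.warm_start = warm_start
        self.random_state = random_state
        self.estimators_ = []

    def fit(self, X, y):
        if not self.warm_start:
            self.estimators_ = []  # cold start: discard earlier trees
        rng = random.Random(self.random_state)
        # With warm_start, only the missing trees are trained, on the current batch.
        for _ in range(self.n_estimators - len(self.estimators_)):
            idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
            tree = Stump().fit([X[i] for i in idx], [y[i] for i in idx], rng)
            self.estimators_.append(tree)
        return self

    def predict(self, X):
        # Majority vote across all trees.
        return [
            Counter(t.predict_one(row) for t in self.estimators_).most_common(1)[0][0]
            for row in X
        ]


def make_batch(seed, n=50):
    """Synthetic batch: label is 1 when the first feature exceeds 0.5."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    return X, [int(row[0] > 0.5) for row in X]


# Only one batch is needed in memory at a time.
forest = ToyForest(n_estimators=0, warm_start=True)
for seed in range(2):
    X_batch, y_batch = make_batch(seed)
    forest.n_estimators += 5      # ask for 5 more trees
    forest.random_state = seed    # new seed per batch, as requested above
    forest.fit(X_batch, y_batch)

predictions = forest.predict([[0.9, 0.1], [0.1, 0.9]])
```

The design point is that each batch contributes its own trees, so the full dataset never has to sit on the device together with the model.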
Describe alternatives you've considered
I am trying to manually merge two RF models trained on two subsets of the original dataset.
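As a rough illustration of what such a manual merge amounts to: a forest that predicts by majority vote is a collection of independent trees, so two forests can in principle be combined by concatenating their trees. The sketch below is pure Python with hand-written stand-in trees; merge_forests and predict are hypothetical helpers, not cuML functions, and this is not necessarily the approach being tried here.

```python
from collections import Counter


def merge_forests(*forests):
    """Concatenate the tree lists of independently trained forests."""
    merged = []
    for forest in forests:
        merged.extend(forest)
    return merged


def predict(forest, row):
    """Majority vote over all trees; every tree carries equal weight."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


# Stand-in "trees": plain functions mapping a feature row to a class label.
# In practice these would be trees trained on two different data subsets.
forest_a = [lambda r: int(r[0] > 0.5), lambda r: int(r[0] > 0.4)]
forest_b = [
    lambda r: int(r[1] > 0.5),
    lambda r: int(r[0] > 0.6),
    lambda r: int(r[0] > 0.3),
]

merged = merge_forests(forest_a, forest_b)
label = predict(merged, [0.55, 0.2])
```

Because every tree votes with equal weight, a forest with more trees has more influence on the merged prediction, and both forests need to have been trained on the same feature layout and the same set of class labels.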