Open Santyk1993 opened 3 years ago
@hcho3 Any thoughts on this?
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
+1, this would be a very impactful feature. It's a real pain to have to oversample highly imbalanced datasets, especially since training on an artificially bloated dataset on the GPU is so VRAM hungry. Without class weighting, the cuml RF is not a workable substitute for sklearn's.
I am trying to train a cuml Random Forest on an imbalanced dataset, but there is no built-in balancing parameter similar to the "class_weight" parameter of sklearn's RF. I would like to see a class-balancing parameter that I can use in cuml RF.
The solution could be similar to the sklearn RF "class_weight" parameter, or a different method of dealing with imbalance between classes. Are there any recommended standard balancing techniques that can be applied before training a cuml RF?
This is the only other supporting issue I have found: https://github.com/rapidsai/cuml/issues/1279
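On the question of balancing techniques to apply before training: one common workaround is to undersample the majority class rather than oversample the minority, which avoids inflating the dataset in GPU memory. The sketch below is only an illustration, not a cuml feature. The helper names `balanced_class_weights` and `undersample_indices` are hypothetical, and the weight formula is the `n_samples / (n_classes * count)` heuristic that sklearn uses for `class_weight="balanced"`.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Per-class weight n_samples / (n_classes * count).

    Hypothetical helper; mirrors the heuristic behind sklearn's
    class_weight="balanced". It only shows how skewed the data is.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def undersample_indices(labels, seed=0):
    """Row indices that shrink every class to the size of the rarest one.

    Hypothetical helper; a plain random undersampler.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, cls in enumerate(labels):
        by_class[cls].append(i)
    target = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        # Sample without replacement so no row is duplicated.
        keep.extend(rng.sample(idx, target))
    return sorted(keep)


# Toy labels: 90 samples of class 0, 10 samples of class 1.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
keep = undersample_indices(labels)
```

The returned index list can be used to subset the feature and label arrays before fitting. The trade-off is that undersampling discards majority-class rows, so it is a stopgap and not a replacement for real class weighting inside the estimator.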