rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Balancing class weights in imbalanced datasets. #3498

Open Santyk1993 opened 3 years ago

Santyk1993 commented 3 years ago

I am trying to train a cuML Random Forest on an imbalanced dataset, but cuML RF has no built-in balancing parameter like scikit-learn RF's `class_weight`. I would like cuML RF to offer a parameter for class balancing.

The solution could mirror sklearn RF's `class_weight` parameter, or use a different method for handling class imbalance. Are there any recommended standard balancing techniques to apply before training a cuML RF?
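For reference, scikit-learn's `class_weight="balanced"` mode weights each class inversely to its frequency, as `n_samples / (n_classes * n_c)`, where `n_c` is the number of rows in class `c`. The minimal NumPy sketch below computes those weights. The helper name `balanced_class_weights` is hypothetical and belongs to neither cuML nor scikit-learn.

```python
import numpy as np

def balanced_class_weights(y):
    """Hypothetical helper: {label: n_samples / (n_classes * count_of_label)}."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    # Rare classes get weights above 1, common classes get weights below 1.
    weights = len(y) / (len(classes) * counts)
    return {c.item(): float(w) for c, w in zip(classes, weights)}

# Example with a 3:1 imbalance: class 0 gets ~0.667, class 1 gets 2.0.
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)
```

With these weights, every class contributes the same total weight (count times weight). Since cuML RF does not currently take a `class_weight`-style argument, applying them would require an indirect route such as resampling the training data.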

The only related issue I have found is https://github.com/rapidsai/cuml/issues/1279.

drobison00 commented 3 years ago

@hcho3 Any thoughts on this?

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

SlapDrone commented 11 months ago

+1. This would be a very impactful feature. Having to oversample highly imbalanced datasets is a real pain, especially given how much VRAM it takes to train on an artificially inflated dataset on the GPU. Without class weighting, cuML RF is not a workable substitute for sklearn's.
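Until such a parameter exists, one workaround that avoids the memory growth described above is random undersampling. It keeps every row of the smallest class and draws only a capped subset of the larger classes, so the training set shrinks instead of growing. The minimal NumPy sketch below assumes the labels fit in host memory. The helper `undersample_indices` and its `max_ratio` parameter are hypothetical. The returned indices would be used to slice `X` and `y` before calling cuML's `RandomForestClassifier.fit`. Note that undersampling discards majority-class data, which can cost accuracy.

```python
import numpy as np

def undersample_indices(y, max_ratio=1.0, seed=0):
    """Hypothetical helper: return sorted row indices such that no class keeps
    more than ceil(max_ratio * smallest_class_count) rows."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    cap = int(np.ceil(max_ratio * counts.min()))
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if len(idx) > cap:
            # Draw a random subset of this over-represented class, without replacement.
            idx = rng.choice(idx, size=cap, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))

# Example: 90 rows of class 0 and 10 rows of class 1, capped at a 2:1 ratio.
y = np.array([0] * 90 + [1] * 10)
idx = undersample_indices(y, max_ratio=2.0)
print(np.bincount(y[idx]))  # [20 10]
```

The `max_ratio` knob trades balance against retained data. A value of `1.0` gives fully balanced classes, and larger values keep more majority-class rows.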