rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Weighted feature sampling in Random Forest #1279

Open vishalmehta1991 opened 5 years ago

vishalmehta1991 commented 5 years ago

Support weighted feature sampling in cuML RF (Random Forest), similar to sklearn.

e.g.: https://github.com/scikit-learn/scikit-learn/blob/1495f69242646d239d89a5713982946b8ffcf9d9/sklearn/ensemble/forest.py#L217

sample_weight : array-like, shape = [n_samples] or None, passed as an argument to the fit function.
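For reference, this is how the sklearn argument is used. It is a minimal sketch on synthetic data; the dataset and the 2x weight for class 1 are arbitrary choices for illustration. Note that the weight array has one entry per row (sample), not per feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data: 200 samples, 4 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# One non-negative weight per *sample* (row), not per feature.
# Here rows of class 1 count twice as much as rows of class 0.
sample_weight = np.where(y == 1, 2.0, 1.0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)  # sample_weight has shape (n_samples,)
pred = clf.predict(X)
```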

dankal444 commented 4 years ago

I am not sure, but I think it should be named "Weighted sampling" instead of "Weighted feature sampling". The linked sklearn function and its description suggest it has nothing to do with features or their weighting.
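To illustrate that sample_weight acts on rows: in sklearn's forest code, a private helper (`_parallel_build_trees`) draws a bootstrap of rows uniformly with replacement and then multiplies the user's sample_weight by how often each row was drawn. The sketch below mimics that idea in plain NumPy, assuming bootstrap=True; `effective_tree_weights` is a hypothetical name, not an sklearn or cuML function.

```python
import numpy as np

def effective_tree_weights(sample_weight, rng):
    """Hypothetical helper: per-row weights seen by one tree.

    Rows are drawn uniformly with replacement (the bootstrap), and the
    user-supplied sample_weight is multiplied by the number of times each
    row was drawn. Features play no part in this step.
    """
    n_samples = sample_weight.shape[0]
    indices = rng.integers(0, n_samples, size=n_samples)  # uniform bootstrap draw
    counts = np.bincount(indices, minlength=n_samples)    # times each row was drawn
    return sample_weight * counts

rng = np.random.default_rng(42)
w = np.array([1.0, 1.0, 5.0, 1.0])
print(effective_tree_weights(w, rng))
```

Feature subsampling (max_features) is a separate, unweighted step in sklearn, which is why the "feature" part of the title seems to be a misnomer.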

Nevertheless, I should add that this (weighted sampling) would be a very useful feature to have. For now, it is hard to train a Random Forest when the training set has imbalanced classes. Sklearn also has a class_weight parameter that could serve as an alternative in such a case, but it is not available here either.
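For comparison, a minimal sklearn sketch of the two options mentioned: class_weight="balanced" on the estimator, or explicit per-row weights from compute_sample_weight. The 90/10 split is an arbitrary toy example. As far as sklearn's forest code goes, both routes should end up with the same per-row weights.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Imbalanced toy labels: 90 rows of class 0, 10 rows of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.random.default_rng(0).normal(size=(100, 3))

# "balanced" weight for a row of class c: n_samples / (n_classes * count_c)
# class 0 -> 100 / (2 * 90) ~= 0.556, class 1 -> 100 / (2 * 10) = 5.0
w = compute_sample_weight(class_weight="balanced", y=y)

# Option A: let the estimator compute the class weights itself.
clf_a = RandomForestClassifier(
    n_estimators=20, class_weight="balanced", random_state=0
).fit(X, y)

# Option B: pass the equivalent per-row weights to fit().
clf_b = RandomForestClassifier(n_estimators=20, random_state=0).fit(
    X, y, sample_weight=w
)
```

With these weights each class contributes the same total weight (50 here), which is what makes the option useful for imbalanced data.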

Sorry for going slightly off-topic, but how do you address class imbalance in cuML RandomForest? Just by balancing the classes yourself?
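One way to balance the classes yourself, without any weighting support in the estimator, is random oversampling before calling fit. This is a library-agnostic NumPy sketch; `oversample_to_balance` is a hypothetical helper, not part of cuML or sklearn.

```python
import numpy as np

def oversample_to_balance(X, y, seed=0):
    """Hypothetical helper: randomly duplicate rows of the smaller classes
    until every class has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        # keep every original row, then draw the missing rows with replacement
        extra = rng.choice(c_idx, size=target - c_idx.size, replace=True)
        idx.append(np.concatenate([c_idx, extra]))
    idx = rng.permutation(np.concatenate(idx))  # shuffle so classes are mixed
    return X[idx], y[idx]

# Toy usage: 8 rows of class 0, 2 rows of class 1.
X = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = oversample_to_balance(X, y)
```

The balanced arrays can then go to the usual fit(X, y) call of cuml.ensemble.RandomForestClassifier. Duplicating a row k times is roughly what an integer sample weight of k would do, but it enlarges the training set; undersampling the majority class is the cheaper alternative when data is plentiful.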