rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Add support for weighted feature subsampling in RF #3525

Open teju85 opened 3 years ago

teju85 commented 3 years ago

Is your feature request related to a problem? Please describe. The current RF implementation supports only uniform subsampling of features (as of 0.18). We need to extend it to support weighted subsampling as well.

Describe the solution you'd like: Ideally, we should expose a feature_weights option in the constructor of both the classifier and the regressor. Its default value is None (i.e. uniform subsampling). If it is not None, it must be a list of weights, one for each feature in the dataset. Then, when max_features is less than 1 (meaning subsampling is enabled), we perform uniform subsampling if feature_weights is None and weighted subsampling otherwise.
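To make the requested behavior concrete, here is a minimal CPU-side sketch in NumPy. It is not cuML code: sample_features is a hypothetical helper, max_features is assumed to be a fraction in (0, 1], and sampling is assumed to be without replacement (the issue does not say either way).

```python
import numpy as np


def sample_features(n_features, max_features, feature_weights=None, rng=None):
    """Pick the feature indices considered for one split.

    Hypothetical helper for illustration only (not part of cuML).
    max_features: fraction in (0, 1]; 1 disables subsampling.
    feature_weights: None for uniform sampling, else one non-negative
    weight per feature.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = max(1, int(max_features * n_features))
    if k >= n_features:
        # Subsampling disabled: every feature is a candidate.
        return np.arange(n_features)

    p = None  # None means uniform probabilities in Generator.choice
    if feature_weights is not None:
        w = np.asarray(feature_weights, dtype=np.float64)
        if w.shape != (n_features,):
            raise ValueError("feature_weights needs one weight per feature")
        if np.any(w < 0) or w.sum() <= 0:
            raise ValueError("weights must be non-negative with a positive sum")
        p = w / w.sum()

    # Assumption: features are drawn without replacement.
    return np.sort(rng.choice(n_features, size=k, replace=False, p=p))


rng = np.random.default_rng(0)
uniform_idx = sample_features(8, 0.5, rng=rng)
weighted_idx = sample_features(4, 0.5, feature_weights=[0, 0, 1, 1], rng=rng)
```

With weights [0, 0, 1, 1] and two features requested, only the last two features can ever be selected, which is the kind of control the option is meant to give.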

Additional context: JFYI, sklearn does NOT support such an option.

teju85 commented 3 years ago

After discussions with @vinaydes, it appears that, unfortunately, our current approach of generating the uniform feature sample without any temporary memory does NOT work with weighted sampling! :(

vinaydes commented 3 years ago

It can be made to work, but the computational cost would be too high, as we would be trading extra compute for zero memory.
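One generic way to picture this kind of compute-versus-memory trade-off, for a single weighted draw, is sketched below. This is an assumption-laden illustration and not cuML's kernel: both function names are hypothetical. The first variant keeps no scratch buffer and rescans the weights on every draw, while the second caches a cumulative-sum array and answers each draw with a binary search.

```python
import bisect
import itertools


def draw_no_scratch(weights, u):
    """Map u in [0, 1) to a feature index using O(1) extra memory.

    Every call rescans the weights, so each draw costs O(n) time.
    """
    target = u * sum(weights)
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if target < running:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def draw_with_cdf(cdf, u):
    """Same mapping, but using a cached cumulative sum (O(n) memory).

    Each draw is a binary search, so it costs O(log n) time.
    """
    i = bisect.bisect_right(cdf, u * cdf[-1])
    return min(i, len(cdf) - 1)


weights = [1, 0, 3]
cdf = list(itertools.accumulate(weights))  # the temporary buffer
us = [0.0, 0.1, 0.25, 0.5, 0.99]
picks_no_scratch = [draw_no_scratch(weights, u) for u in us]
picks_with_cdf = [draw_with_cdf(cdf, u) for u in us]
```

Both variants return the same indices for the same u; the difference is only where the cost lands. Drawing several features without replacement would add further work on top of either variant.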

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
