rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] add sklearn's "out-of-bag" and "feature importance" scores to cuML's Random Forest #3361

Open BenWynne-Morris opened 3 years ago

BenWynne-Morris commented 3 years ago

I've been impressed with the speed at which I can train a cuML random forest, which I've been able to get working with WSL2.

However, I've noticed that a couple of fairly standard random forest features appear to be missing:

  1. out-of-bag scores (https://scikit-learn.org/stable/auto_examples/ensemble/plot_ensemble_oob.html)
  2. feature importance scores (https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html)

I think the latter would be especially useful to give you a "rapid" assessment of feature importance as a precursor to exploring other candidate models.
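For reference, here is a minimal sketch of the two scikit-learn features being requested. The synthetic dataset and parameter values are illustrative assumptions, and this shows the scikit-learn API only, not anything cuML exposes:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, purely for illustration
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, random_state=0
)

# oob_score=True relies on bootstrap sampling, which is the default
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf.fit(X, y)

# Accuracy estimated from the samples each tree did not see during training
oob = clf.oob_score_

# Impurity-based importances, one value per feature, normalized to sum to 1
importances = clf.feature_importances_

print(f"OOB score: {oob:.3f}")
print("Feature importances:", importances.round(3))
```

Both attributes are available immediately after `fit`, with no separate validation split, which is what makes them handy for a quick first look at a dataset.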

taureandyernv commented 3 years ago

@BenWynne-Morris can you rename the feature request to something like [FEA] add sklearn's "out-of-bag" and "feature importance" scores to cuML's Random Forest? Making the title more descriptive will help us with tracking and whatnot.

BenWynne-Morris commented 3 years ago

No problem, thanks, Taurean.

github-actions[bot] commented 3 years ago

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

BenWynne-Morris commented 3 years ago

These features would still be useful.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

maltekuehl commented 3 years ago

I second this feature request. feature_importances_ is a very basic and commonly used feature of the sklearn RandomForestClassifier class and something that is best implemented at the library level as it can't easily be added by the user after the fact.
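As a stopgap, a model-agnostic permutation importance can be approximated on the user side. This is a different technique from the impurity-based feature_importances_ in sklearn, and the sketch below makes several assumptions: the `permutation_importance` helper and the toy `predict` function are hypothetical, and the data are NumPy arrays on the host, so GPU arrays would need adapting.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its relationship with the labels
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - np.mean(predict(X_perm) == y)
    return drops / n_repeats

# Toy stand-in for a fitted model: only feature 0 affects the prediction
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

def predict(data):
    return (data[:, 0] > 0).astype(int)

scores = permutation_importance(predict, X, y)
print(scores)
```

In practice `predict` would be the `predict` method of a fitted model. The cost is one extra prediction pass per feature per repeat, which is part of why a built-in attribute computed during training would still be preferable.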

yankikalfa commented 2 years ago

This is still an important issue.

Wulin-Tan commented 2 years ago

It is an important issue worth a look.

indalaterre commented 8 months ago

Is this still under evaluation?