rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.21k stars 530 forks source link

[BUG] Random Forest Cython API and related refactoring #3089

Open vinaydes opened 4 years ago

vinaydes commented 4 years ago

Describe the bug We should consider following refactorings in the RF Cython and related C++ code.

  1. Break RF parameters in to two categories:
    • Strictly RF parameters: Parameters that are consistent with scikit-learn RF parameters. Ex max_features, max_depth etc.
    • Implementation/tuning parameters: Parameters that tweak implementation, ex. split_algo, max_batch_size. A user should be required to only specify RF parameters. The implementation should try to choose best possible values for the tuning parameters for given problem. However if advanced user want to try different things by tweaking implementation parameters, there would still be a way. Typical code with this change look like
...
clf =  cuml.ensemble.RandomForestClassifier(... RF parameters ...)
# Optional
clf.set_impl_params(... Implmenetation parameters ...)
clf.fit(X, y)
...
  1. Too many APIs for setting RF and decision tree parameters:

    • set_rf_params
    • set_all_rf_params
    • set_rf_class_obj
    • set_tree_params Ideally only two should be enough one for RF and other for decision tree.
  2. The DecisionTreeParams struct in randomforest_shared.pxd link is incomplete and unused. May be it can be removed?

  3. The _params_names list is incomplete link. Is it possible to add a automatic change that would capture?

  4. Code duplication: Almost everything is duplicated between randomforest_classifier.pyx and randomforest_regressor.pyx

Anything more that might come up after we take a relook at Cython code for RF.

vinaydes commented 4 years ago

@venkywonka Would be taking look at the issues reported here.

github-actions[bot] commented 3 years ago

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

venkywonka commented 3 years ago

Work items yet to be done: