Closed beckernick closed 4 years ago
Small oversight in my initial scoping: the original list of issues will only unlock some of the meta-estimators. We'll also need to add a self.classes_ attribute to unlock the remaining meta-estimators for classification (regression meta-estimators don't need it). AdaBoostClassifier, however, will only be unlocked estimator by estimator, since it requires the cuML base estimator of choice to support sample weights.
Filing an issue for the classes_ attribute
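For context, here is a minimal sketch of the scikit-learn convention in question: a classifier sets `classes_` (the sorted unique training labels) during `fit`. `MajorityClassifier` is a hypothetical toy class written only for illustration, not a cuML or scikit-learn estimator, and the sketch uses NumPy on CPU.

```python
import numpy as np


class MajorityClassifier:
    """Hypothetical toy classifier: always predicts the most frequent training label."""

    def fit(self, X, y):
        y = np.asarray(y)
        # scikit-learn convention: expose the sorted unique labels seen during fit.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        # One prediction per input row, always the majority label.
        return np.full(len(X), self.majority_)


clf = MajorityClassifier().fit([[0], [1], [2], [3]], ["b", "a", "b", "b"])
print(clf.classes_)
print(clf.predict([[5], [6]]))
```

Code that inspects a fitted classifier, which can include meta-estimators, may read `classes_` to map probability columns or votes back to labels, so an estimator that omits the attribute can fail there.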
With the merging of #2487, we now essentially support these meta-estimators. AdaBoostClassifier support will come estimator by estimator as each one gains sample weight support.
Closing.
Meta-estimators allow users to combine multiple models into a single model for potentially enhanced predictive power. A minimal example of this concept is a Voting Classifier, in which the predictions of one or more models are collected and a vote on which label to assign is made based on those predictions. Another powerful approach is stacking, in which the results of one or more models are fed into another model to make a final prediction. Scikit-learn provides an API for this as well.
While we will likely want to implement analogous APIs for cuML to support end-to-end GPU inputs, the recent work standardizing estimators on an input/output type contract makes it almost possible to use scikit-learn meta-estimators with cuML on CPU arrays. Essentially, this just means using cuML estimators as drop-in replacements for sklearn estimators in the meta-estimator constructor. @dantegd and I have explored this a bit locally, and we are seeing large training speedups.
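As a rough sketch of the pattern (not the exact code we ran locally), the snippet below builds a voting and a stacking meta-estimator from a shared list of base estimators on NumPy arrays. It uses scikit-learn base estimators as stand-ins so it runs without a GPU; the assumption is that cuML classifiers would be passed in the same `estimators` list once the linked issues are resolved. The dataset and hyperparameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# CPU (NumPy) arrays, matching the "cuML on CPU arrays" scenario.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in base estimators. Assumption: on a GPU machine these entries
# would be replaced by cuML classifiers with the same fit/predict interface.
base_estimators = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Hard voting: each base model predicts a label and the majority wins.
voter = VotingClassifier(estimators=base_estimators, voting="hard")
voter.fit(X_train, y_train)
vote_preds = voter.predict(X_test)

# Stacking: base model outputs become features for a final estimator.
stacker = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(),
)
stacker.fit(X_train, y_train)
stack_preds = stacker.predict(X_test)
```

Both meta-estimators clone the base estimators before fitting, so the same list can be reused for each.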
The following issues cover the changes needed to use cuML models with VotingClassifier and StackingClassifier, with others potentially possible as well.
#2400
#2398
#2396
#2395
#2393
Scikit-learn examples: