yandex / rep

Machine Learning toolbox for Humans
http://yandex.github.io/rep/

stacking and blending in REP - question #99

Open mglowacki100 opened 7 years ago

mglowacki100 commented 7 years ago

I'm looking for blending/stacking support in REP, in a similar fashion to caretEnsemble (https://cran.r-project.org/web/packages/caretEnsemble/vignettes/caretEnsemble-intro.html). As far as I understand, the factory in REP is quite similar to caretList. I've tried to find something about it in the documentation... I'd be grateful for a short example. If REP doesn't support stacking/blending out of the box, do you plan to add it?
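For readers landing here, the following is a minimal blending sketch written with scikit-learn and NumPy rather than REP, so it assumes nothing about REP's API. Part of the training set is held out; the base models are fit on the rest, and a logistic-regression "blender" is fit on the base models' predicted probabilities for the held-out rows. The dataset is synthetic, and the helper `meta_features` is a hypothetical name used only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for real features.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Blending: keep part of the training set aside for the meta-model only.
X_base, X_hold, y_base, y_hold = train_test_split(
    X_train, y_train, test_size=0.3, random_state=0
)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# 1) Fit every base model on the base split.
for model in base_models:
    model.fit(X_base, y_base)


def meta_features(models, X):
    """Hypothetical helper: one column per model holding P(class 1)."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])


# 2) Fit the blender on predictions for rows the base models never saw.
blender = LogisticRegression()
blender.fit(meta_features(base_models, X_hold), y_hold)

# 3) Predict: base models -> meta features -> blender.
blend_proba = blender.predict_proba(meta_features(base_models, X_test))[:, 1]
print("blended test accuracy:", np.mean((blend_proba > 0.5) == y_test))
```

Blending trades some training data for simplicity: the meta-model never sees in-sample base predictions, at the cost of the base models training on fewer rows.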

jonas-eschle commented 7 years ago

Hi all,

Just to add to the request: I was also wondering whether you could implement this. There is already a package that basically does it, but it does not support weights: https://github.com/dustinstansbury/stacked_generalization

What I had in mind is a meta-classifier just like BaggingClassifier, KFoldClassifier, etc. I would propose the following behavior:

Instance creation: takes several (unfitted) classifiers as the base classifiers, plus one stacking classifier, e.g.
    stacking_clf = StackingClassifier(base_clf=[rdf_clf, xgb_clf1, xgb_clf2, nn_clf, ...], stacking_clf=logit_clf, ...)

Fitting: fits the base classifiers, lets them predict on the training data (in the normal way, not k-folded; anyone who wants that could simply pass KFoldClassifier(my_base_clf) as a base classifier) and trains the stacking classifier on the base classifiers' predictions.

Prediction: the base classifiers predict on the data, and the stacking classifier then uses these predictions to produce the final predictions.
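The three steps above can be sketched as a small class. This is a hypothetical illustration built on scikit-learn, not a REP API: the class name and the `base_clf` / `stacking_clf` arguments simply mirror the proposal, and base classifiers are assumed to be scikit-learn-style estimators with `predict_proba` and a `sample_weight` argument to `fit`, which is how weights (missing from the package linked above) are forwarded.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


class StackingClassifier:
    """Hypothetical meta-classifier following the proposal (binary case)."""

    def __init__(self, base_clf, stacking_clf):
        self.base_clf = base_clf          # unfitted prototypes
        self.stacking_clf = stacking_clf  # unfitted meta-model

    def _meta_features(self, X):
        # One column per base classifier: its probability for class 1.
        return np.column_stack([c.predict_proba(X)[:, 1] for c in self.base_clfs_])

    def fit(self, X, y, sample_weight=None):
        # Fit clones so the prototypes passed in stay unfitted.
        self.base_clfs_ = [clone(c) for c in self.base_clf]
        for clf in self.base_clfs_:
            clf.fit(X, y, sample_weight=sample_weight)
        # Train the meta-model on in-sample base predictions (not k-folded).
        self.stacking_clf_ = clone(self.stacking_clf)
        self.stacking_clf_.fit(self._meta_features(X), y, sample_weight=sample_weight)
        self.classes_ = self.stacking_clf_.classes_
        return self

    def predict_proba(self, X):
        return self.stacking_clf_.predict_proba(self._meta_features(X))

    def predict(self, X):
        return self.stacking_clf_.predict(self._meta_features(X))


# Usage on synthetic data with illustrative per-sample weights.
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
weights = np.where(y == 1, 2.0, 1.0)
stack = StackingClassifier(
    base_clf=[RandomForestClassifier(n_estimators=50, random_state=1),
              LogisticRegression(max_iter=1000)],
    stacking_clf=LogisticRegression(),
)
stack.fit(X, y, sample_weight=weights)
proba = stack.predict_proba(X[:5])
```

Because the meta-model sees predictions on rows the base classifiers were trained on, a flexible model such as a random forest looks almost perfect in-sample and can be over-trusted. This is the reason the proposal mentions wrapping base classifiers in a k-fold meta-classifier when that matters.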

Possible options to add (at instantiation?):
1) also feed one or several columns from the original data to the stacking classifier;
2) (copy and train each base classifier n times).
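As a point of comparison (not a claim about REP), scikit-learn 0.22 and later ships `sklearn.ensemble.StackingClassifier`. Its `passthrough=True` flag covers option 1 in its simplest form, passing all original columns rather than a chosen subset to the final estimator, and its `cv` parameter trains the final estimator on out-of-fold base predictions, i.e. the k-folded variant the proposal leaves to the user. The data below is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("logit", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,              # final estimator is fit on out-of-fold base predictions
    passthrough=True,  # original features are appended to the base predictions
)
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))
```

For binary problems each base estimator contributes one probability column, so with `passthrough=True` the final estimator here sees 2 + 10 = 12 inputs.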

I think this would round out your repository so that it covers essentially every popular meta-learning technique available so far. Using the same style as for bagging, k-folding, etc. would allow seamless integration into your library. What do you think?