rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

Stacking output #258

Closed rspadim closed 6 years ago

rspadim commented 6 years ago

Hi guys, I want to get the output of the "base level" classifiers and save it. Is that possible? For example:

sclf2 = StackingClassifier(
    classifiers=[xgb_model[0], xgb_model[1], xgb_model[2], xgb_model[3],
                 xgb_model[4], xgb_model[5], xgb_model[6], xgb_model[7],
                 knn_model[0], knn_model[1], knn_model[2], knn_model[3],
                 nb_model,
                 rf_model[0], rf_model[1], rf_model[2], rf_model[3],
                 rf_model[4], rf_model[5], rf_model[6], rf_model[7],
                 nn_model, vc],
    meta_classifier=sxgb_model,
    use_probas=False,  # False / True
    average_probas=True,
    use_features_in_secondary=False)

I want to get the output of all the classifiers and save it; afterwards I will swap the meta-classifier and pick the "best" meta_classifier. Any idea? One other doubt: when using stacking, it is common to 'fold' the data, fit the models on the folded data, and predict the 'unfolded' data. Is this what the mlxtend stacking classifier does? Thanks

rasbt commented 6 years ago

Hi there,

sclf2 = StackingClassifier(classifiers=[xgb_model[0] ... I want to get the output of the "base level" classifiers and save it. Is that possible?

I am not entirely sure what you mean; are you referring to the predictions made by the first-level classifiers? In that case, you could do

predictions = [c.predict(X) for c in sclf.classifiers]

after fitting the stacking classifier.
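For illustration, here is a minimal, self-contained sketch of that idea (the toy dataset, the base/meta models, and the output file name are placeholders; depending on the mlxtend version, the fitted base models may be stored in the clfs_ attribute rather than classifiers):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from mlxtend.classifier import StackingClassifier

X, y = make_classification(n_samples=500, random_state=0)

sclf = StackingClassifier(
    classifiers=[KNeighborsClassifier(), DecisionTreeClassifier()],
    meta_classifier=LogisticRegression())
sclf.fit(X, y)

# Depending on the mlxtend version, the fitted base models may live in
# clfs_ (fitted clones) rather than in the classifiers attribute.
fitted = getattr(sclf, "clfs_", sclf.classifiers)

# Collect the base-level predictions as columns and save them, so different
# meta-classifiers can later be tried on the same base-level outputs.
base_preds = np.column_stack([c.predict(X) for c in fitted])
np.save("base_level_predictions.npy", base_preds)  # placeholder file name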

when using stacking, it is common to 'fold' the data, fit the models on the folded data, and predict the 'unfolded' data. Is this what the mlxtend stacking classifier does? Thanks

Are you referring to the k-fold cross-validation scheme, i.e., how the training and test folds are used? In this case, the exact algorithm can be found in the documentation at http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier

Let me know if anything about that pseudo-code or figure is unclear!

EDIT:

and predict the 'unfolded' data,

On second thought, do you mean the held-out / test fold? That would be what the StackingCVClassifier does: http://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/
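Here is a minimal usage sketch of the StackingCVClassifier (the toy data and model choices are just placeholders; see the user guide linked above for the full parameter list):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from mlxtend.classifier import StackingCVClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each first-level model is fit on the training folds and predicts the held-out
# fold; those held-out predictions become the meta-classifier's training data.
sclf_cv = StackingCVClassifier(
    classifiers=[RandomForestClassifier(n_estimators=50, random_state=0),
                 GaussianNB()],
    meta_classifier=LogisticRegression(),
    use_probas=True,
    cv=5)
sclf_cv.fit(X, y)
print(sclf_cv.predict(X[:5]))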

rspadim commented 6 years ago

Nice! I was reading about the StackingCVClassifier. The StackingClassifier is something like (many models -> predict) -> (stacked model -> predict), while the StackingCVClassifier is something like (k-fold: many models -> predict the held-out parts) -> (stacked model -> predict).

My doubt now is... I was reading the source code: https://github.com/rasbt/mlxtend/blob/master/mlxtend/classifier/stacking_cv_classification.py#L218

Why refit the first-level models? Shouldn't they be fitted only on the k-folds and then be used again, k-fold times, to predict the prediction data?

For example... a single XGBoostClassifier + 5 k-folds + LogisticRegression:

We should train XGBoost on each fold (5 models in this case), get the probabilities predicted by each k-fold-fitted XGBoost model, and use those probabilities as the LogisticRegression input; that will be our 'fitted model'.

Now, to predict: use each XGBoost model (5, one per k-fold) to predict on the prediction X, feed the output of each model into the logistic regression, and output the logistic regression result.
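Something like this rough sketch (toy data; I'm averaging the five fold models' test probabilities so the meta-features have the same shape at fit and predict time, which is an assumption about the missing detail):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_models = []
oof_probas = np.zeros(len(X_train))

# Fit one XGBoost model per fold and keep it; its predictions on the
# held-out fold become the out-of-fold meta-features.
for train_idx, held_out_idx in kf.split(X_train):
    model = XGBClassifier(n_estimators=50)
    model.fit(X_train[train_idx], y_train[train_idx])
    oof_probas[held_out_idx] = model.predict_proba(X_train[held_out_idx])[:, 1]
    fold_models.append(model)

# The meta-classifier is trained on the out-of-fold probabilities only.
meta = LogisticRegression().fit(oof_probas.reshape(-1, 1), y_train)

# At prediction time the five fold models are reused (no refit on the full
# training set); their probabilities are averaged to form the meta input.
test_probas = np.mean([m.predict_proba(X_test)[:, 1] for m in fold_models], axis=0)
print(meta.predict(test_probas.reshape(-1, 1))[:10])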

I'm considering Kazanova's idea: http://blog.kaggle.com/2017/06/15/stacking-made-easy-an-introduction-to-stacknet-by-competitions-grandmaster-marios-michailidis-kazanova/

I'm not sure if it's correct, but that is what I understood the stacking CV classifier to be.

rasbt commented 6 years ago

Why refit the first-level models? Shouldn't they be fitted only on the k-folds and then be used again, k-fold times, to predict the prediction data?

There are many different flavors of stacking. This implementation is, afaik, how stacking with k-fold cross-validation is commonly implemented (e.g., see the resource I mentioned in http://rasbt.github.io/mlxtend/user_guide/classifier/StackingCVClassifier/)

Also, I'd say that it's quite intuitive to refit the 1st level classifiers on the whole dataset as it lowers the (pessimistic) bias and variance of those models (i.e., benefitting from additional data).

I think both approaches may yield very similar outcomes, but it might be interesting to see how the two (refitting vs. not refitting) compare empirically.

Btw., I found that it was not entirely clear from the article you linked how exactly the stacking was implemented. Two articles linked in that article, though, do refit the first-level classifiers as well:

Let’s say you want to do 2-fold stacking:

  1. Split the train set in 2 parts: train_a and train_b
  2. Fit a first-stage model on train_a and create predictions for train_b
  3. Fit the same model on train_b and create predictions for train_a
  4. Finally fit the model on the entire train set and create predictions for the test set.
  5. Now train a second-stage stacker model on the probabilities from the first-stage model(s).

(In the excerpt above, it's not clear whether the predictions from step 4 are also used for fitting in step 5, though.)
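Roughly, that recipe could look like the following sketch (assuming the stacker in step 5 is fit only on the cross-predictions from steps 2-3; the data and models are just placeholders):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Split the train set in two parts: train_a and train_b
X_a, X_b, y_a, y_b = train_test_split(X_train, y_train, test_size=0.5, random_state=0)

# 2./3. Fit the first-stage model on one half and create predictions for the other half
proba_b = GradientBoostingClassifier().fit(X_a, y_a).predict_proba(X_b)[:, 1]
proba_a = GradientBoostingClassifier().fit(X_b, y_b).predict_proba(X_a)[:, 1]

# 4. Refit on the entire train set and create predictions for the test set
proba_test = GradientBoostingClassifier().fit(X_train, y_train).predict_proba(X_test)[:, 1]

# 5. Train the second-stage stacker on the first-stage probabilities
stack_X = np.concatenate([proba_a, proba_b]).reshape(-1, 1)
stack_y = np.concatenate([y_a, y_b])
stacker = LogisticRegression().fit(stack_X, stack_y)

# Final predictions for the test set
print(stacker.predict(proba_test.reshape(-1, 1))[:10])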

From the other post:

3.2 For each base model
    M1: K-Nearest Neighbors (k = 1)
    M2: Support Vector Machine (type = 4, cost = 1000)

3.2.1 Fit the base model to the training fold and make predictions on the test fold. Store these predictions in train_meta to be used as features for the stacking model

  1. Fit each base model to the full training dataset and make predictions on the test dataset. Store these predictions inside test_meta
  2. Fit a new model, S (i.e., the stacking model) to train_meta, using M1 and M2 as features. Optionally, include other features from the original training dataset or engineered features
  3. Use the stacked model S to make final predictions on test_meta
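That second recipe might be sketched as follows (placeholder data; cross_val_predict builds the out-of-fold columns that make up train_meta):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [KNeighborsClassifier(n_neighbors=1), SVC(C=1000)]  # M1, M2

# 3.2 / 3.2.1: out-of-fold predictions on the training folds -> train_meta
train_meta = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=5) for m in base_models])

# 1. Fit each base model to the full training set, predict the test set -> test_meta
test_meta = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in base_models])

# 2. Fit the stacking model S to train_meta; 3. predict on test_meta
S = LogisticRegression().fit(train_meta, y_train)
print(S.score(test_meta, y_test))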

In any case, your suggestion is a good alternative, I guess; it's probably less common, I'd say, and it might perform a bit worse, but who knows. Probably the only way to find out would be to do an empirical comparison, if no one's done that already :)

rasbt commented 6 years ago

I think this has been addressed, so I am closing this issue.