A Python machine learning library built on object-oriented design principles. The goal is to let users quickly explore data and search for the top machine learning algorithm candidates for a given dataset.
Note: the flow assumes that each base model has already been cross-validated to choose its best hyper-parameters, so, at least in v1, the stacker will not search for parameters. Nested cross-validation could be added in the future.
1. Partition the training data into five test folds (note: the number of folds, 5, could be refactored into a parameter)
2. Create a dataset called train_meta with the same row IDs and fold IDs as the training dataset, with empty
columns M1 and M2.
Similarly, create a dataset called test_meta with the same row IDs as the test dataset and empty columns
M1 and M2 (NOTE: this will happen in the `_predict` function)
3. For each test fold
3.1 Combine the other four folds to be used as a training fold
3.2 For each base model (with its chosen hyper-params)
3.2.1 Fit the base model to the training fold and make predictions on the test fold.
Store these predictions in train_meta to be used as features for the stacking model
NOTE: the model-specific transformations will also have to be applied here
4. Fit each base model to the full training dataset and make predictions on the test dataset.
Store these predictions inside test_meta
NOTE: these predictions will be made as part of the `_predict` function
5. Fit a new model S (i.e. the stacking model) to train_meta, using M1 and M2 as features.
Optionally, include other features from the original training dataset or engineered features.
6. Use the stacking model S to make final predictions on test_meta
NOTE: this will happen in `_predict`
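The steps above can be sketched end-to-end with scikit-learn. This is a minimal illustration, not the library's implementation: the toy dataset, the two base models standing in for M1 and M2, and the use of `predict_proba` outputs as meta-features are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Toy stand-ins for the real data (assumption: a test set is withheld)
X, y = make_classification(n_samples=200, random_state=42)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

# Two pre-tuned base models, producing meta-features M1 and M2
base_models = [LogisticRegression(max_iter=1000),
               RandomForestClassifier(random_state=42)]

# Steps 1-3: out-of-fold predictions fill train_meta
kf = KFold(n_splits=5, shuffle=True, random_state=42)
train_meta = np.zeros((len(X_train), len(base_models)))
for train_idx, test_idx in kf.split(X_train):
    for m, model in enumerate(base_models):
        model.fit(X_train[train_idx], y_train[train_idx])
        train_meta[test_idx, m] = model.predict_proba(X_train[test_idx])[:, 1]

# Step 4: refit each base model on the full training set; predict on the test set
test_meta = np.column_stack(
    [model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
     for model in base_models]
)

# Steps 5-6: fit the stacking model S on train_meta, then predict on test_meta
stacker = LogisticRegression()
stacker.fit(train_meta, y_train)
final_predictions = stacker.predict(test_meta)
print(train_meta.shape, test_meta.shape, final_predictions.shape)
```

Using out-of-fold predictions for train_meta (rather than predictions from models fit on the full training set) is what keeps the stacking model from simply memorizing leaked labels.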
Other Requirements
[x] show resampling (e.g. cross-validation) results for each model, for one or more scores
[x] show the correlation matrix between the predictor columns of train_meta
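The correlation-matrix requirement can be sketched with pandas on synthetic meta-features (the generated M1/M2 values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical out-of-fold predictions for two base models (M1, M2)
rng = np.random.default_rng(0)
m1 = rng.normal(size=100)
m2 = 0.8 * m1 + 0.2 * rng.normal(size=100)  # deliberately correlated with M1
train_meta = pd.DataFrame({"M1": m1, "M2": m2})

# Pearson correlation between the predictor columns of train_meta
corr = train_meta.corr()
print(corr)
```

Highly correlated base-model predictions suggest the models are making similar mistakes, so stacking them is likely to add less value than stacking diverse models.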
Overview
v1 stacking will be simple and based on http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/
(Suppose we use ModelFitter.) The `data_x` passed in is the full training set; we assume a test set is withheld. The steps are adapted from http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/
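One way the flow might map onto a fitter-style class is sketched below. Everything here beyond the `_predict` name is an assumption: the class name `SimpleStacker`, its constructor arguments, and the toy regression data are illustrative only, and base models are assumed pre-tuned (no parameter search), per the note above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold


class SimpleStacker:
    """Illustrative skeleton only; base models are assumed pre-tuned."""

    def __init__(self, base_models, stacking_model, n_folds=5):
        self.base_models = base_models
        self.stacking_model = stacking_model
        self.n_folds = n_folds

    def fit(self, data_x, data_y):
        # Build train_meta from out-of-fold predictions, then fit S on it.
        kf = KFold(n_splits=self.n_folds, shuffle=True, random_state=42)
        train_meta = np.zeros((len(data_x), len(self.base_models)))
        for train_idx, test_idx in kf.split(data_x):
            for m, model in enumerate(self.base_models):
                fitted = clone(model).fit(data_x[train_idx], data_y[train_idx])
                train_meta[test_idx, m] = fitted.predict(data_x[test_idx])
        # Refit each base model on the full training set for use in _predict.
        self.fitted_base_models_ = [clone(m).fit(data_x, data_y)
                                    for m in self.base_models]
        self.stacking_model.fit(train_meta, data_y)
        return self

    def _predict(self, test_x):
        # Build test_meta from the fully fitted base models, then predict with S.
        test_meta = np.column_stack(
            [m.predict(test_x) for m in self.fitted_base_models_]
        )
        return self.stacking_model.predict(test_meta)


# Hypothetical usage on toy regression data
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=80)

stacker = SimpleStacker(
    base_models=[LinearRegression(), DecisionTreeRegressor(random_state=0)],
    stacking_model=LinearRegression(),
)
stacker.fit(X[:60], y[:60])
preds = stacker._predict(X[60:])
```

Keeping meta-feature construction inside `fit` and deferring test_meta to `_predict` matches the split described in the steps above.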