shane-kercheval / oo-learning

Python machine learning library based on Object Oriented design principles; the goal is to allow users to quickly explore data and search for top machine learning algorithm candidates for a given dataset
MIT License

Add Model Stacking #4

Closed shane-kercheval closed 6 years ago

shane-kercheval commented 6 years ago

Overview

v1 stacking will be simple and based on http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/

Note: this flow assumes that each base model has already been cross-validated to choose its best hyper-params, so, at least in v1, the stacker will not search for params. Nested cross-validation could possibly be added in the future.

(Suppose we use ModelFitter.) The data_x passed in is the full training set; we will assume a test set is withheld. The steps below are adapted from http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/

Steps

1. Partition the training data into five test folds (note: 5 could be refactored into a parameter)
2. Create a dataset called train_meta with the same row Ids and fold Ids as the training dataset, with empty
    columns M1 and M2.
    Similarly, create a dataset called test_meta with the same row Ids as the test dataset and empty columns
    M1 and M2 (NOTE: this will be in the `_predict` function).
3. For each test fold
    3.1 Combine the other four folds to be used as a training fold
    3.2 For each base model (with chosen hyper-params)
        3.2.1 Fit the base model to the training fold and make predictions on the test fold.
            Store these predictions in train_meta to be used as features for the stacking model
            NOTE: I will also have to apply the model-specific Transformations

4. Fit each base model to the full training dataset and make predictions on the test dataset.
    Store these predictions inside test_meta
    NOTE: I will make these predictions as part of the `_predict` function

5. Fit a new model, S (i.e., the stacking model), to train_meta, using M1 and M2 as features.
    Optionally, include other features from the original training dataset or engineered features.

6. Use the stacked model S to make final predictions on test_meta
    NOTE: this will be in `_predict`
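
For reference, the steps above can be sketched roughly as follows. This is only a minimal illustration using scikit-learn and NumPy, not the oo-learning implementation: the function names `fit_stacker` and `predict_stacker`, the choice of base models, and the synthetic regression data are all assumptions made for the example, and the model-specific Transformations are left out. The split into two functions mirrors the notes above, where building test_meta and the final predictions happen at predict time.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor


def fit_stacker(base_models, stacking_model, train_x, train_y, n_folds=5, seed=42):
    """Hypothetical helper covering steps 1-3, the fitting half of step 4, and step 5."""
    # Step 1: partition the training data into n_folds test folds.
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    # Step 2: train_meta has one row per training row and one column per base model (M1, M2, ...).
    train_meta = np.zeros((train_x.shape[0], len(base_models)))
    # Step 3: for each test fold, fit every base model on the remaining folds
    # and store its predictions for the held-out fold in train_meta.
    for fit_idx, holdout_idx in folds.split(train_x):
        for col, model in enumerate(base_models):
            fold_model = clone(model).fit(train_x[fit_idx], train_y[fit_idx])
            train_meta[holdout_idx, col] = fold_model.predict(train_x[holdout_idx])
    # Step 4 (fitting half): refit each base model on the full training set.
    fitted_base = [clone(model).fit(train_x, train_y) for model in base_models]
    # Step 5: fit the stacking model S on train_meta.
    fitted_stacker = clone(stacking_model).fit(train_meta, train_y)
    return fitted_base, fitted_stacker, train_meta


def predict_stacker(fitted_base, fitted_stacker, test_x):
    """Hypothetical helper covering the prediction half of step 4 and step 6."""
    # Step 4 (prediction half): base-model predictions on the test set form test_meta.
    test_meta = np.column_stack([model.predict(test_x) for model in fitted_base])
    # Step 6: the stacking model S makes the final predictions from test_meta.
    return fitted_stacker.predict(test_meta)


# Synthetic data stands in for a real dataset; a test set is withheld as assumed above.
x, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.25, random_state=0)

# Base models with hyper-params assumed to have been chosen beforehand.
base_models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4, random_state=0)]
fitted_base, fitted_stacker, train_meta = fit_stacker(
    base_models, LinearRegression(), train_x, train_y
)
predictions = predict_stacker(fitted_base, fitted_stacker, test_x)
```

Using `clone` keeps the base models passed in untouched, so the same hyper-param configuration is refit from scratch for each fold and again for the full training set. For classification, the columns of train_meta would more likely hold predicted probabilities than predicted values, which this sketch does not cover.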

Other Requirements

shane-kercheval commented 6 years ago

Other notes/resources: