Chapter 22 implementation of cv_estimate() has a bug

There is an issue on chapter 22, in the following code block:

from sklearn.model_selection import KFold

def cv_estimate(train_data, n_splits, model, model_params, X, y):
    cv = KFold(n_splits=n_splits)
    m = model(**model_params)

    models = []
    cv_pred = pd.Series(np.nan, index=train_data.index)
    for train, test in cv.split(train_data):
        m.fit(train_data[X].iloc[train], train_data[y].iloc[train])
        cv_pred.iloc[test] = m.predict(train_data[X].iloc[test])
        models += [m]

    return cv_pred, models

The model m is created outside of the for loop, which causes there to be a single model that is fit multiple times. This causes the resulting list of models to be a list of the same model, repeated.

This can be verified by seeing that models[0] == models[1] returns True (which it shouldn't be).

To fix this issue, m = model(**model_params) should be moved to the first line within the for loop.

matheusfacure / python-causality-handbook

Chapter 22 implementation of cv_estimate() has a bug #320