Recursive Strategy Bug - Githubissues

ThiagoCM commented 3 years ago

Hello. First, I would like to thank you for this great example of applying the Direct and Recursive strategies to N-step-ahead forecasting. I was looking at the recursive strategy and have a question about its implementation, where I think there's a bug.

If you take a look at the picture above (you can find the math in this article), the recursive strategy is basically the 1-step-ahead direct strategy with "feedback": the value predicted at each iteration is appended to the target array.
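To make the feedback idea concrete, here is a minimal sketch in plain Python. It is not the tutorial's code: `toy_model` and `recursive_forecast` are hypothetical names, and the "model" is just the mean of the lagged values, standing in for any fitted one-step-ahead predictor.

```python
from typing import Callable, List, Sequence


def toy_model(lag_values: Sequence[float]) -> float:
    """Hypothetical one-step-ahead model: the mean of the lagged values."""
    return sum(lag_values) / len(lag_values)


def recursive_forecast(
    history: Sequence[float],
    model: Callable[[Sequence[float]], float],
    lags: Sequence[int],
    n_steps: int,
) -> List[float]:
    """Forecast n_steps ahead, feeding each prediction back as a future lag."""
    series = list(history)  # work on a copy; predictions are appended here
    forecasts = []
    for _ in range(n_steps):
        # lag k is the value k steps before the point being predicted
        features = [series[-k] for k in lags]
        y_hat = model(features)
        series.append(y_hat)  # feedback: this prediction is lag_1 at the next step
        forecasts.append(y_hat)
    return forecasts


history = [1.0, 2.0, 3.0, 4.0]
result = recursive_forecast(history, toy_model, lags=[1, 2], n_steps=3)
print(result)  # [3.5, 3.75, 3.625]
```

Each forecast is computed first and only then appended, so no placeholder value ever ends up in the series.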

Consider this piece of code:

new_point = fcasted_values[-1] if len(fcasted_values) > 0 else 0.0
target = target.append(pd.Series(index=[date], data=new_point))

Here, the first prediction (N=1) of the recursive strategy is inserted into the target as 0.0 instead of the actual predicted value. This affects the lags used in the feature matrix, since every later prediction step will contain a lag with this incorrect value.
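The effect on the lag features can be reproduced with a few lines of pandas. This is a minimal sketch, assuming lag_1 is built with a one-row shift of the target (the tutorial's create_lag_features is not shown in this issue, so that is an assumption); the numbers are taken from the tables in this issue.

```python
import pandas as pd

step = pd.Timedelta(hours=1)

# Last four observed target values from the example, hourly.
idx = pd.date_range("2020-01-12 20:00", periods=4, freq=step)
target = pd.Series([34.952133, 33.911217, 33.244377, 33.390755], index=idx)

# Step N=1: the 0.0 placeholder is appended as if it were the forecast.
target = pd.concat([target, pd.Series([0.0], index=[target.index[-1] + step])])

# Step N=2: the next point is appended (its own value does not matter for lag_1).
target = pd.concat([target, pd.Series([34.342800], index=[target.index[-1] + step])])

# Assumed lag construction: lag_1 at time t is the target at t-1.
lag_1 = target.shift(1)
print(lag_1.index[-1], lag_1.iloc[-1])  # the 01:00 row has lag_1 == 0.0
```

The 0.0 now sits in the history permanently, so it will also show up as lag_8 and lag_25 once the horizon is long enough.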

Below you can see the target and feature values for 3 iterations after inserting 0.0 as the first prediction.

Iteration 1

Features
                     hour  weekday  dayofyear  ...      lag_1      lag_8     lag_25
2020-01-12 20:00:00    20        6         12  ...  65.919495  80.427320  72.718000
2020-01-12 21:00:00    21        6         12  ...  34.952133  57.917430  33.341960
2020-01-12 22:00:00    22        6         12  ...  33.911217  56.941563  33.081734
2020-01-12 23:00:00    23        6         12  ...  33.244377  56.193405  33.683514
2020-01-13 00:00:00     0        0         13  ...  33.390755  53.786278  33.244377

Target
2020-01-12 20:00:00    34.952133
2020-01-12 21:00:00    33.911217
2020-01-12 22:00:00    33.244377
2020-01-12 23:00:00    33.390755
2020-01-13 00:00:00     0.000000

Iteration 2

Features
                     hour  weekday  dayofyear  ...      lag_1      lag_8     lag_25
2020-01-12 21:00:00    21        6         12  ...  34.952133  57.917430  33.341960
2020-01-12 22:00:00    22        6         12  ...  33.911217  56.941563  33.081734
2020-01-12 23:00:00    23        6         12  ...  33.244377  56.193405  33.683514
2020-01-13 00:00:00     0        0         13  ...  33.390755  53.786278  33.244377
2020-01-13 01:00:00     1        0         13  ...   0.000000  59.202316  33.407020

Target
2020-01-12 21:00:00    33.911217
2020-01-12 22:00:00    33.244377
2020-01-12 23:00:00    33.390755
2020-01-13 00:00:00     0.000000
2020-01-13 01:00:00    34.342800

Iteration 3

Features
                     hour  weekday  dayofyear  ...      lag_1      lag_8     lag_25
2020-01-12 22:00:00    22        6         12  ...  33.911217  56.941563  33.081734
2020-01-12 23:00:00    23        6         12  ...  33.244377  56.193405  33.683514
2020-01-13 00:00:00     0        0         13  ...  33.390755  53.786278  33.244377
2020-01-13 01:00:00     1        0         13  ...   0.000000  59.202316  33.407020
2020-01-13 02:00:00     2        0         13  ...  34.342800  68.944670  32.057076

Target
2020-01-12 22:00:00    33.244377
2020-01-12 23:00:00    33.390755
2020-01-13 00:00:00     0.000000
2020-01-13 01:00:00    34.342800
2020-01-13 02:00:00     2.395295

Also, I didn't understand why the recursive strategy uses the trained model (which is returned by either the linear_model or xgboost_model function) instead of the 1-step-ahead model (which is used in the Direct Strategy).

Does this make sense, or have I misunderstood something?

JamesLarkinWhite commented 2 years ago

I just found this tutorial and had the same thought regarding the implementation of the recursive forecast.

What I wrote before seems like nonsense to me now...

I guess you would have to make a prediction before entering the loop and, for the first prediction (N=1), append the last value of the resulting array instead of 0.0.

At least I have seen this in a few entries for the M4 competition?

Edit: I tried to implement this idea. The two variables initial_features and initial_prediction are not really needed, but I thought they might help to explain my general idea. It would be really nice if somebody could give me feedback on whether this is a viable solution:

def forecast_multi_recursive_fix(y, model, lags, n_steps=FCAST_STEPS, step="1H"):

    """Multi-step recursive forecasting using the input time 
    series data and a pre-trained machine learning model

    Parameters
    ----------
    y: pd.Series holding the input time-series to forecast
    model: an already trained machine learning model implementing the scikit-learn interface
    lags: list of lags used for training the model
    n_steps: number of time periods in the forecasting horizon
    step: forecasting time period given as Pandas time series frequencies

    Returns
    -------
    fcast_values: pd.Series with forecasted values indexed by forecast horizon dates 
    """

    def create_recursive_features(target, lags):
        rec_target = target.copy()
        # forecast: create ts features
        ts_features = create_ts_features(rec_target)
        # forecast: create lag features
        if len(lags) > 0:
            lags_features = create_lag_features(rec_target, lags=lags)
            rec_features = ts_features.join(lags_features, how="outer").dropna()
        else:
            rec_features = ts_features

        return rec_features

    # get the dates to forecast
    last_date = y.index[-1] + pd.Timedelta(hours=1)
    fcast_range = pd.date_range(last_date, periods=n_steps, freq=step)

    fcasted_values = []
    target = y.copy()

    # initial Prediction for first step:
    initial_features = create_recursive_features(target, lags)
    initial_prediction = model.predict(initial_features)  # take value from original target array

    for date in fcast_range:

        new_point = fcasted_values[-1] if len(fcasted_values) > 0 else initial_prediction[-1]

        # Series.append was removed in pandas 2.0; pd.concat is the replacement
        target = pd.concat([target, pd.Series(index=[date], data=new_point)])
        # forecast: create recursive features
        features = create_recursive_features(target, lags)

        # forecast: Predict
        predictions = model.predict(features)
        # forecast: append predictions to fcasted_values List
        fcasted_values.append(predictions[-1])

    return pd.Series(index=fcast_range, data=fcasted_values)
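As far as I can tell from reading the loop above, the value appended at each new date is still the forecast for the previous timestamp (or, at the first step, an in-sample prediction), so the lags stay shifted by one step. One possible alternative, sketched below, is to append a placeholder only so the feature row can be built and then overwrite it with the forecast for that same date. Everything here is hypothetical and self-contained: `LastLagModel` stands in for a fitted model, `make_lag_features` stands in for the tutorial's feature helpers (which are not shown in this issue), and `forecast_recursive_writeback` is not part of the repository.

```python
import pandas as pd


class LastLagModel:
    """Hypothetical stand-in for a fitted model: predicts lag_1 + 1."""

    def predict(self, features: pd.DataFrame):
        return (features["lag_1"] + 1.0).to_numpy()


def make_lag_features(target: pd.Series, lags) -> pd.DataFrame:
    """Hypothetical feature builder: one column per lag, incomplete rows dropped."""
    cols = {f"lag_{k}": target.shift(k) for k in lags}
    return pd.DataFrame(cols).dropna()


def forecast_recursive_writeback(y, model, lags, n_steps, step=pd.Timedelta(hours=1)):
    target = y.copy()
    forecasts = []
    for _ in range(n_steps):
        date = target.index[-1] + step
        # Placeholder row so features for `date` exist; the lags for `date`
        # only read earlier rows, so the placeholder value itself is unused.
        target = pd.concat([target, pd.Series([0.0], index=[date])])
        features = make_lag_features(target, lags)
        y_hat = float(model.predict(features.iloc[[-1]])[0])
        # Overwrite the placeholder so the next step's lag_1 is this forecast.
        target.loc[date] = y_hat
        forecasts.append(y_hat)
    return pd.Series(forecasts, index=target.index[-n_steps:])


y = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2020-01-01", periods=3, freq=pd.Timedelta(hours=1)),
)
out = forecast_recursive_writeback(y, LastLagModel(), lags=[1, 2], n_steps=3)
print(out.tolist())  # [4.0, 5.0, 6.0]
```

With this toy model, each forecast is the previous value plus one, which only works out to 4, 5, 6 if every forecast is written back at its own timestamp; a leftover 0.0 placeholder would give 1.0 at the second step instead.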

JamesLarkinWhite commented 1 year ago

It would be nice if you could review this issue.

madagra / energy-ts-analysis

Recursive Strategy Bug #4