timeseriesAI / tsai

State-of-the-art Deep Learning library for Time Series and Sequences in Pytorch / fastai
https://timeseriesai.github.io/tsai/
Apache License 2.0

Examples for forecasting with Transformer model #159

Closed diegoquintanav closed 3 years ago

diegoquintanav commented 3 years ago

Hi! From #125 it's not yet clear to me what a forecasting problem looks like. I've noticed there is a TSForecasting class at the implementation level, which is the same as TSRegression (both are set to ToFloat), but it breaks other parts of the API.

See https://github.com/timeseriesAI/tsai/blob/aa8b32a50d52692355214c35cb140f586600db66/tsai/data/core.py#L112

How does inference work in a forecasting example? From the examples I use learner.get_preds(ds_idx=1) but how is this working internally? It uses fastai.learner.Learner.get_preds which I have trouble following :sweat_smile:.

In other words, consider the following gist


from tsai.all import *
print('tsai       :', tsai.__version__)
print('fastai     :', fastai.__version__)
print('fastcore   :', fastcore.__version__)
print('torch      :', torch.__version__)

# tsai       : 0.2.18
# fastai     : 2.4.1
# fastcore   : 1.3.20
# torch      : 1.9.0

# df = some df I loaded
X, y = SlidingWindow(window_length, horizon=horizon, get_x=feature_columns, get_y='my_target_var')(df)
splits = get_splits(y, valid_size=.3, stratify=True, random_state=43, shuffle=False)

# TSRegression = ToFloat
# https://github.com/timeseriesAI/tsai/blob/aa8b32a50d52692355214c35cb140f586600db66/tsai/data/core.py#L111
tfms  = [None, [TSRegression()]]
# tfms  = [None, [TSForecasting()]] # breaks other methods
batch_tfms = TSStandardize(by_sample=True, by_var=True, verbose=True)
dls_inc = get_ts_dls(X, y, splits=splits, tfms=tfms, batch_tfms=batch_tfms, bs=128, device="cpu")
learn_inc = ts_learner(dls_inc, InceptionTime, metrics=[mae, mape, mse, rmse], cbs=ShowGraph())
learn_inc.fit_one_cycle(50, 1e-2)

# pred
valid_preds_inc, valid_targets_inc = learn_inc.get_preds(ds_idx=1)
valid_preds_inc.flatten().data, valid_targets_inc.data

plt.plot(valid_preds_inc.flatten().data)
plt.plot(valid_targets_inc.data);
print(valid_targets_inc.shape)

And the output of check_data(X, y, splits)

X      - shape: [1089 samples x 136 features x 7 timesteps]  type: ndarray  dtype:float64  isnan: 0
y      - shape: (1089,)  type: ndarray  dtype:float64  isnan: 0
splits - n_splits: 2 shape: [872, 217]  overlap: [False]

Questions:

  1. What is get_preds(ds_idx=1) doing?
  2. How do I change this problem to a forecasting problem? I'm thinking of a single-step and a multi-step forecasting problem on the test set

Thanks!

oguiza commented 3 years ago

Hi @diegoquintanav,

I've built a quick gist with a dummy dataset to demonstrate how both single-step and multi-step forecasting problems could be implemented with tsai.

As you'll see the key is to build the target with the expected shape. tsai will recognize the shape of the target and will create an output of the same shape.
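
To illustrate what "building the target with the expected shape" means, here is a minimal NumPy sketch. The helper make_windows is hypothetical and is not tsai's SlidingWindow; it only shows the shapes a univariate single-step and multi-step target would have.

```python
import numpy as np

def make_windows(series, window_length, horizon):
    """Hypothetical helper (not part of tsai): slice a 1-D series
    into (input window, target) pairs."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window_length - horizon + 1
    X = np.stack([series[i : i + window_length] for i in range(n_samples)])
    y = np.stack([series[i + window_length : i + window_length + horizon]
                  for i in range(n_samples)])
    X = X[:, None, :]      # [n_samples x 1 variable x window_length]
    if horizon == 1:
        y = y[:, 0]        # single-step target: [n_samples]
    return X, y            # multi-step target: [n_samples x horizon]

series = np.arange(20)
X1, y1 = make_windows(series, window_length=7, horizon=1)
X3, y3 = make_windows(series, window_length=7, horizon=3)
print(X1.shape, y1.shape)  # single-step
print(X3.shape, y3.shape)  # multi-step
```

Only the shape of y changes between the two cases; the input windows are built the same way.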

Questions:

I'm not sure what you mean when you say "but it breaks other parts of the API". Could you please elaborate on that?

I'm in the process of creating 1 or 2 more detailed examples to demonstrate how to do forecasting with tsai.

diegoquintanav commented 3 years ago

Hi @oguiza, and thanks for the reply!

About the API breaking, I can't reproduce the issue right now, but some methods implemented in the learner were not working if I used TSForecasting instead of TSRegression. Don't mind that for now. I will open another issue if I can reproduce the problem again.

Thanks again for your time and for the library too!

edit: I understood the meaning of splits[1]

diegoquintanav commented 3 years ago

So I think an autoregressive forecast would look something like this

# name aliasing for local referencing
seq_in_len = window_length
seq_out_len = horizon

# empty placeholder
preds = np.repeat(np.nan, len(splits[1]) + seq_in_len + seq_out_len - 1)

# seed first values
preds[:seq_in_len] = X[splits[1]][0].flatten()

for ix in range(len(splits[1])):
    new_x = preds[: ix + seq_in_len] # alternatively
    _, _, _valid_decoded_preds_inc = learn_inc.get_X_preds(new_x)
    # get size of last output
    _h = _valid_decoded_preds_inc.flatten().shape[0]
    # replace values in placeholder array
    preds[ix + seq_in_len : ix + seq_in_len + _h] = _valid_decoded_preds_inc.flatten()

If I do this with the InceptionTime model, I get something like

fig, ax = plt.subplots(figsize=(15, 7))
ax.plot(preds, label="decoded_preds (AR)")
ax.plot(X[splits[1]][0, 0, :].tolist(), marker="o", label="first window")
ax.plot([np.nan]*window_length + valid_decoded_preds_inc[:, 0].tolist(), label="decoded_preds (Window ahead)")
ax.plot([np.nan]*window_length + y[splits[1]][:, 0].tolist(), label="targets")
fig.legend()

image

Which looks more like an autoregressive forecast (values are not relevant). Tell me what you think :+1:!

oguiza commented 3 years ago

Hola Diego,

I believe it's correct, but I don't fully understand everything in your code. For example, with this code:

for ix in range(len(splits[1])):
    new_x = preds[: ix + seq_in_len]

new_x grows on every iteration. That doesn't make much sense to me if you have trained the model with inputs of a fixed length.

Having said that, in my experience, results are better with a multiple output forecast (that is, creating the forecast for the entire horizon simultaneously). It'd also be easier to create in tsai. You just need to pass a target with the desired horizon length.
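
As a rough sketch of the multiple output idea (one model call produces the whole horizon), the snippet below fits a plain least-squares linear model with NumPy. The linear model is only a stand-in for a neural network and is an assumption made for illustration; it is not how tsai builds its models.

```python
import numpy as np

# Toy series: a noisy sine wave (made-up data for illustration only).
rng = np.random.default_rng(0)
t = np.arange(300)
series = np.sin(2 * np.pi * t / 25) + 0.05 * rng.standard_normal(len(t))

window_length, horizon = 20, 5
n_samples = len(series) - window_length - horizon + 1

# Each row of X is one input window; each row of Y is the full horizon after it.
X = np.stack([series[i : i + window_length] for i in range(n_samples)])
Y = np.stack([series[i + window_length : i + window_length + horizon]
              for i in range(n_samples)])

# Append a bias column and solve for all horizon steps at once.
A = np.hstack([X, np.ones((n_samples, 1))])
W, *_ = np.linalg.lstsq(A, Y, rcond=None)   # W: [window_length + 1 x horizon]

# A single matrix product yields the forecast for the entire horizon.
last_window = np.append(series[-window_length:], 1.0)
forecast = last_window @ W
print(forecast.shape)
```

The point is that the target Y already has one column per future step, so no prediction is ever fed back into the model.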

diegoquintanav commented 3 years ago

Right, considering I'm doing a forecast at time t[i] = t_i, I will produce t[i+1:i+horizon]. The model was trained using t[i-window_length:i], so a better way would be something like

new_x = preds[ix: ix + seq_in_len]

that fixes the input dimensionality to window_length. I'm not sure which one is better though. It is true that it is not the way the model was trained, and it produces a totally different output that does not damp over time.

image
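
The fixed-length variant above can be sketched independently of tsai. Here predict_one_step is a hypothetical callable standing in for the trained model (in the thread that role is played by the learner), and the toy model is made up for illustration.

```python
import numpy as np

def recursive_forecast(predict_one_step, first_window, n_steps):
    """Roll a fixed-length window forward, feeding each prediction back in.

    predict_one_step is a hypothetical callable: window -> next value.
    """
    window = np.asarray(first_window, dtype=float).copy()
    preds = []
    for _ in range(n_steps):
        next_value = float(predict_one_step(window))
        preds.append(next_value)
        # Drop the oldest value and append the newest prediction,
        # so the input length always matches what the model was trained on.
        window = np.append(window[1:], next_value)
    return np.array(preds)

def toy_model(window):
    # Stand-in model: the mean of the last two values in the window.
    return window[-2:].mean()

preds = recursive_forecast(toy_model, first_window=[1.0, 2.0, 3.0, 4.0], n_steps=3)
print(preds)  # [3.5   3.75  3.625]
```

Because the window slides instead of growing, every call sees an input of the same length.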

About the multiple output forecast (or multi-step forecast), I believe that the case I'm proposing is the recursive multistep forecast (2)

prediction(t+1) = model(obs(t-1), obs(t-2), ..., obs(t-n))
prediction(t+2) = model(prediction(t+1), obs(t-1), ..., obs(t-n))

and what you suggest is number (4), by setting a large enough number in the horizon argument. In this case, I have many questions about the model itself (I have had trouble understanding the underlying lightning API for training :sweat_smile: )

  1. For the case of the TransformerModel, does this mean that the encoder is fed window_length data and the decoder is fed horizon during training? What happens during inference?
  2. I'd still need to know the meaning of the outputs, as I posted before: the meaning of each element in valid_decoded_preds and what does decoded mean?
  3. Transformer complexity is O(L^2), so I wonder how good an idea it is to set a large horizon.

oguiza commented 3 years ago

I'll try to answer your questions. But before I have a few comments:

As to your questions:

  1. TransformerModel doesn't have a decoder. It only has an encoder. You only need to ensure X (input) and y (output) have the desired shape. X: [n_samples x n_variables x history] and y: [n_samples x horizon] for univariate and [n_samples x n_variables x horizon] for multivariate. tsai will automatically create a head that will generate the expected output shape. This applies to all models, not just to TransformerModel. You can also, for example, try this approach with InceptionTime or TST.
  2. In the case of a regression task, the first value is the prediction. The second is the target (if you pass a y to get_X_preds), otherwise None. The 3rd is the decoded prediction. That is, if you pass a reversible Transform to the target, it will be reversed. If you don't apply any, the 1st and 3rd terms will be the same (they usually are).
  3. The horizon defines the length of the target. And the target is not passed through the model. So no need to worry about that. You only need to worry about the amount of history used. If that is too long, you may have a memory issue as you rightly say. In that case, I'd recommend you to use a CNN like InceptionTime.
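
To make the difference between the raw and the decoded prediction in point 2 concrete, here is a small sketch. TargetScaler is a hypothetical reversible transform written for this example; it is not a tsai or fastai class.

```python
import numpy as np

class TargetScaler:
    """Hypothetical reversible target transform (not a tsai/fastai class)."""

    def __init__(self, y):
        self.mean = float(np.mean(y))
        self.std = float(np.std(y))

    def encode(self, y):
        # Applied to the target before training.
        return (np.asarray(y, dtype=float) - self.mean) / self.std

    def decode(self, y_scaled):
        # Reverses encode, bringing values back to the original units.
        return np.asarray(y_scaled, dtype=float) * self.std + self.mean

y_train = np.array([10.0, 20.0, 30.0])
scaler = TargetScaler(y_train)

# A model trained on scaled targets outputs values in the scaled space.
raw_pred = np.array([0.0, 1.0])
decoded_pred = scaler.decode(raw_pred)
print(raw_pred, decoded_pred)
```

With no transform on the target there is nothing to reverse, so the raw and decoded predictions coincide.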

diegoquintanav commented 3 years ago

Hey, thanks for answering! Everything is more clear now. I will close the issue.