How is past_covariates sliced and paired with target series?

unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.

https://unit8co.github.io/darts/

Apache License 2.0


How is past_covariates sliced and paired with target series? #2019

Closed sebros-sandvik closed 11 months ago

sebros-sandvik commented 11 months ago

Hi,

Say I have this:

target series, freq = 'MS':
2019-01-01  1
2019-02-01  2
2019-03-01  3

and past_covariates, freq = 'MS':
2019-01-01  1
2019-02-01  2
2019-03-01  3

Say we use model NBEATS(). In this call: model.historical_forecasts(series,past_covariates=past_covs)

Is the training data built like this:

Date, target, past_cov
2019-01-01, 1, 1
2019-02-01, 2, 2
2019-03-01, 3, 3

Or:

Date, target, past_cov
2019-01-01, 1, NaN
2019-02-01, 2, 1
2019-03-01, 3, 2

From my testing, it seems to be the second. Can someone clarify?

Much appreciated,

Sebastian

dennisbader commented 11 months ago

Hi @sebros-sandvik, this guide explains how past and future covariates are used for our TorchForecastingModels (all neural network based models).

If you have NBEATSModel(input_chunk_length=2, output_chunk_length=1), then your first example is correct.

Can I ask how you tested this so that it looks like it's the second option? That would be a bug.

sebros-sandvik commented 11 months ago

Hi,

just compare:

1.)
series = TimeSeries.from_dataframe(df, "Date", " Sales", freq="MS")
past_covs = TimeSeries.from_dataframe(df, "Date", " Sales", freq="MS")  # i.e. = series
model.historical_forecasts(series, past_covariates=past_covs, start=.5, forecast_horizon=1)

to

2.)
series = TimeSeries.from_dataframe(df, "Date", " Sales", freq="MS")
df['Date'] = df['Date'] - pd.DateOffset(months=1)  # e.g. 2019-01-01 -> 2018-12-01
past_covs = TimeSeries.from_dataframe(df, "Date", " Sales", freq="MS")
model.historical_forecasts(series, past_covariates=past_covs, start=.5, forecast_horizon=1)

(just pick some model with past_covariates capabilities). According to your answer, 1.) should give a perfect fit, since the target is part of past_covariates. But it does not. However, 2.) does give a perfect fit.

Could be something on my end as well, of course. I also really appreciate the answer, and job well done on the package :)

dennisbader commented 11 months ago

Hi @sebros-sandvik, actually both 1) and 2) show expected behavior. 1) is not expected to give a perfect fit, since the future of the target is not part of the past covariates.

Let's say the input chunk (past) of your target is [0,1,2] and the output chunk (future) of the target is [4]. The model takes the past covariates at the same dates as the target's input chunk, so the past covariate values will be [0,1,2], and they carry no information about the future value [4].

Only if you shift the values by 1 as you do in 2), you add the future values of target into the past covariate. That is why you get a perfect fit.

So the values for your past covariates in the input chunk will be [1,2,4] and contain the future target [4].
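The alignment described above can be sketched in plain Python. This is only an illustration of the slicing logic, not darts internals: make_sample is a hypothetical helper, and the lists stand in for TimeSeries values that share the same dates.

```python
# Illustrative only: plain-Python slicing, not how darts is implemented.
def make_sample(target, past_cov, start, input_chunk_length, output_chunk_length):
    """Return (past_target, past_cov_window, future_target) for one sample."""
    split = start + input_chunk_length
    past_target = target[start:split]
    # Past covariates are read at the same positions as the target's input chunk.
    past_cov_window = past_cov[start:split]
    future_target = target[split:split + output_chunk_length]
    return past_target, past_cov_window, future_target


target = [0, 1, 2, 4]

# Case 1: the covariate is a copy of the target on the same dates.
same = make_sample(target, target, 0, 3, 1)

# Case 2: the covariate is shifted one step earlier, so position t holds target[t + 1].
shifted_cov = target[1:] + [None]
shifted = make_sample(target, shifted_cov, 0, 3, 1)

print(same)     # ([0, 1, 2], [0, 1, 2], [4])
print(shifted)  # ([0, 1, 2], [1, 2, 4], [4])
```

In case 2 the covariate window already contains the value the model is asked to predict, which is why the fit looks perfect.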

sebros-sandvik commented 11 months ago

Thank you for your explanation. Now, I understand.

So for instance, for NBEATS, which only supports past covariates, if I want to add, say, a dummy variable indicating whether the month is January, I need to shift that covariate series so that yyyy-12-01 = 1.
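A minimal sketch of that shift, using only the standard library and assuming a one-step forecast horizon. The helper month_starts and the variable names are hypothetical; with darts you would build the TimeSeries from such values yourself.

```python
from datetime import date


def month_starts(year, month, n):
    """Return n consecutive month-start dates beginning at (year, month)."""
    out = []
    for _ in range(n):
        out.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


dates = month_starts(2018, 11, 4)  # 2018-11-01 .. 2019-02-01

# Unshifted dummy: 1 on the January timestamp itself.
is_january = [1 if d.month == 1 else 0 for d in dates]

# Shifted dummy for use as a past covariate: 1 on the December timestamp,
# so the flag sits on the last input step when January is the step to forecast.
next_is_january = [1 if d.month == 12 else 0 for d in dates]

for d, a, b in zip(dates, is_january, next_is_january):
    print(d.isoformat(), a, b)
```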

Basically, the tabular representation fed to ML models would be

(y, x) = (observation = y_{t+1}; features = y_t, ..., y_{t-i}, past_cov_t, ..., past_cov_{t-i}), where i = input_chunk_length.

Am I understanding this correctly?

Best,

Seb

dennisbader commented 11 months ago

Yes, that's correct 👍 And for models that support future covariates, you wouldn't have to shift if you passed this covariate as future_covariates. This would add (future_cov_{t+1}, future_cov_t, ..., future_cov_{t-i}) to the features.
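To make the tabular view concrete, here is a small sketch that follows the notation used in this thread literally (lags t down to t-i, one-step horizon). tabular_row is a hypothetical helper and the numbers are made up for illustration; it is not how darts builds its samples internally.

```python
# Illustrative only: builds one (label, features) row in the thread's notation.
def tabular_row(target, past_cov, future_cov, t, i):
    """Return (y_{t+1}, features) using lags t, t-1, ..., t-i."""
    lags = range(t, t - i - 1, -1)  # t, t-1, ..., t-i
    features = [target[k] for k in lags]
    features += [past_cov[k] for k in lags]
    # Future covariates additionally include the forecast step t + 1 itself.
    features += [future_cov[k] for k in range(t + 1, t - i - 1, -1)]
    return target[t + 1], features


target = [10, 11, 12, 13]
past_cov = [0.1, 0.2, 0.3, 0.4]
future_cov = [0, 0, 0, 1]  # e.g. an unshifted "is January" flag

label, features = tabular_row(target, past_cov, future_cov, t=2, i=1)
print(label)     # 13
print(features)  # [12, 11, 0.3, 0.2, 1, 0, 0]
```

The flag for the forecast step appears in the features only through future_cov, which is why no shifting is needed in that case.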