unit8co / darts

A Python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[Question] Explanation needed on input and output_chunk_length #2513

Closed PranayMehta closed 3 days ago

PranayMehta commented 2 weeks ago

About I am trying to implement time series forecasting models using darts. My task is to first train univariate forecasting models and then try to establish a baseline. Once done, then I want to train my models using past co-variates and see if the error can improve. I want to predict next 90 days of energy demand using historicals and then using historicals + past co-variates.

Question While going through the Darts documentation, I am slightly confused about how some of the parameters are used when training these models.

For example, when using linear regression, there is a parameter called output_chunk_length. How does it affect the output or the training?

If we consider linear regression with univariate models, then the equation is, for example, Yt = b1*Yt-1 + b2*Yt-7, etc.

How does the model get fit in each case?
Case A: I pass output_chunk_length = 7
Case B: I pass output_chunk_length = 90

Can someone explain in layman's terms how the model outputs are affected during training when this parameter changes? Does something similar apply to input_chunk_length too?

eschibli commented 2 weeks ago

If multi_models=True (the default), output_chunk_length separate, independent linear models are trained, one for each horizon from 1 up to output_chunk_length.
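In code, the idea can be sketched roughly like this (a toy tabularization using plain NumPy least squares; an illustration of the concept, not darts' actual implementation):

```python
import numpy as np

def fit_multi_models(series, lags, ocl):
    """Fit one least-squares linear model per forecast horizon.

    For each horizon h = 1..ocl, build (lagged inputs -> value at
    t + h - 1) training pairs and fit an independent set of
    coefficients [intercept, *slopes].
    """
    max_lag = max(lags)
    models = []
    for h in range(1, ocl + 1):
        X, y = [], []
        # t indexes the first step of the forecast window
        for t in range(max_lag, len(series) - h + 1):
            X.append([series[t - l] for l in lags])
            y.append(series[t + h - 1])
        A = np.column_stack([np.ones(len(X)), np.array(X)])
        coef, *_ = np.linalg.lstsq(A, np.array(y), rcond=None)
        models.append(coef)
    return models

# On a linear trend with lags=[1] and ocl=2, the h=1 model learns
# y_t = y_{t-1} + 1 and the h=2 model learns y_{t+1} = y_{t-1} + 2.
models = fit_multi_models(np.arange(20.0), lags=[1], ocl=2)
print(models[0], models[1])
```

Each horizon gets its own coefficients, which is why the lags never need to be shifted in this mode.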

madtoinou commented 2 weeks ago

Hi @PranayMehta, I would recommend going through the Quickstart Notebook and the RegressionModel Example Notebook to familiarize yourself with the Darts terminology and features. The impact of input_chunk_length and output_chunk_length is explained in detail :)

PranayMehta commented 2 weeks ago

Thank you @madtoinou for the link to the RegressionModel example. I now understand how output_chunk_length works with the multi_models = True setting: if it is set to True and output_chunk_length = 7, then 7 different models are trained on the inputs.

However, I still have a few questions for multi_models = False

  1. During the training phase, the documentation says that a single model is used to predict only the last point in output_chunk_length. So if output_chunk_length = 7, does it mean that we would be training for Yt+7 using the inputs?
  2. During the predict phase, I was trying to go through the Visualization section that you have pieced together. For multi_models = False and output_chunk_length = 2, when calling predict(n=4):
image

2.A) Why does the model not use the value at t-1 to make forecast 1?
2.B) The documentation states that a single model is used to predict only the last point in output_chunk_length. So the predict(n=4) process should start by predicting t+1, shouldn't it? But the visual shows a forecast being made at t0 too, so I am a bit confused there.

madtoinou commented 1 week ago
  1. When multi_models=False, the single model is still responsible for predicting all the steps in output_chunk_length, not only the last one. The time series in the dataset are tabularized as pictured in the images in order to obtain the desired outcome (it is not just a sliding window).
  2. Since there is only one set of coefficients (when multi_models=False, assuming the model is a linear regression for the sake of the example), the inputs need to be different for each step, or the model would predict the same value for all the steps in output_chunk_length. Hence the lags are shifted by output_chunk_length - n when forecasting the n-th step, which is why the forecast of t0 does not rely on t-1 (the shift is output_chunk_length - n = 2 - 1 = 1).
  3. By convention, t0 is always the first step of the forecast horizon.
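To make the shift in point 2 concrete, here is a toy index calculation (my own sketch of the arithmetic described above, not darts internals), using output_chunk_length = 2 and a single lag of -1:

```python
def input_index(t0, lag, ocl, n):
    """Index of the observation feeding the n-th forecast step.

    t0 is the index of the first forecast step; the lags are
    shifted back by (ocl - n) for the n-th step (n = 1..ocl).
    """
    return t0 + lag - (ocl - n)

ocl = 2
t0 = 10  # first step of the forecast horizon
# forecasting t0 (n=1) with lag -1 uses index t0 - 2, not t0 - 1:
print(input_index(t0, -1, ocl, 1))  # 8, i.e. t-2
# forecasting t+1 (n=2) uses index t0 - 1:
print(input_index(t0, -1, ocl, 2))  # 9, i.e. t-1
```

Only the last step (n = ocl, shift 0) uses the lags at their nominal positions.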

I hope that clarifies things.

wehrlik commented 4 days ago

Hi! Thank you @madtoinou for the clarification on the tabularization with multi_models=False. Reading the documentation, I was also not sure if I had understood it correctly.

I still think it is unfortunate that the forecast of t0 does not use the most recent information from t-1. Especially with longer lead times, the forecast of t0 can depend on rather old data. Furthermore, as time progresses, it turns out to be useless to update the forecast within the horizon of the output chunk length: if, one time step after the initial forecast, t+1 becomes t0, the input values remain the same, and the forecast made for t0 will be the same as it was before for t+1. I tried to show this in the following figure, where time progresses by one time step from each table to the next below:

image
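The same index arithmetic can be used to check this staleness (a toy sketch of the shifted-lag bookkeeping, not darts code): when the forecast origin advances by one step, the forecast for a given timestamp reuses exactly the same input, so the single model must output the same value.

```python
def input_index(t0, lag, ocl, n):
    """Index of the observation feeding the n-th forecast step,
    with the lags shifted back by (ocl - n)."""
    return t0 + lag - (ocl - n)

ocl, lag = 2, -1
t0 = 10                                   # original forecast origin
old = input_index(t0, lag, ocl, 2)        # input for step t+1
new = input_index(t0 + 1, lag, ocl, 1)    # one step later, t+1 is the new t0
assert old == new  # same input -> same forecast from the single model
print(old, new)
```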

Related to this, do you think it would be possible to include forecasts from earlier lead times as inputs for later lead times? So, for example, t0 as input for t+1, then t0 and t+1 as input for t+2, and so on. I am not sure, but this would probably only work with multi_models=True; it would still ensure some dependence and consistency within the forecasted time series, while also making it possible to include the latest information and to update at every time step.

madtoinou commented 3 days ago

Hi @wehrlik,

I understand your point; the gap between the most recent lags and the forecasted step definitely impacts the quality of the forecasts. However, in order to have a single model ("one set of coefficients/weights") forecast all the steps in output_chunk_length without relying on auto-regression, applying a shift to the lags is a fairly traditional approach. Some alternatives were mentioned in issue #2234; we are reading about them and trying to see how they could be added to Darts.

In the meantime, if you want to avoid the gap between the lags and the forecasted step, you can simply use multi_models=True. And if you want to use the forecast of t0 to forecast t+1, set output_chunk_length=1. Each of these approaches has pros and cons; it ultimately boils down to what you want to achieve.
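For illustration, the output_chunk_length=1 workaround amounts to a plain autoregressive loop: each forecast is fed back as the lag input for the next step. A toy sketch with a hypothetical fitted linear model y_t = a + b*y_{t-1} (made-up coefficients, not darts code):

```python
def predict_autoregressive(last_value, a, b, n):
    """Roll a one-step linear model y_t = a + b*y_{t-1} forward n steps,
    feeding each forecast back in as the next lag input."""
    preds = []
    prev = last_value
    for _ in range(n):
        nxt = a + b * prev   # one-step-ahead forecast
        preds.append(nxt)
        prev = nxt           # the forecast of t0 feeds t+1, and so on
    return preds

print(predict_autoregressive(5.0, 1.0, 1.0, 3))  # [6.0, 7.0, 8.0]
```

The trade-off is that forecast errors compound through the feedback loop, which is exactly what the multi-step output_chunk_length modes avoid.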

Since the original question has been answered and the discussion is drifting toward another topic, I am going to close this issue. Feel free to open a new one if anything remains unclear.