unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/

[Question] multi_models argument affects lags? #2234

Open pfwnicks opened 4 months ago

pfwnicks commented 4 months ago

Describe the bug
What is the functionality of the multi_models argument in regression models?

https://unit8co.github.io/darts/generated_api/darts.models.forecasting.regression_model.html#darts.models.forecasting.regression_model.RegressionModel

According to this documentation, the lags should be indexed from the first predicted time step. However, we noticed some abnormal behavior in our regression models when switching between multi_models=True and multi_models=False. After testing a variety of things and snooping around in the documentation, I/we discovered a section of the guide on regression models:

https://unit8co.github.io/darts/examples/20-RegressionModel-examples.html#Visualization

where it describes and visualizes what happens when switching between multi_models modes.

Expected behavior
If this is the intended functionality, then the documentation should at least be updated to reflect that the indexing of the lags is affected by the multi_models argument. However, the more natural understanding when reading the documentation would be that the multi_models argument does not affect the indexing of the lags.

madtoinou commented 4 months ago

Hi @pfwnicks,

Thank you for reporting this. We'll make sure to include a brief sentence in the docstring of the multi_models parameter in all the models to make its impact on the lags more transparent in one of the upcoming PRs.

The lags must be shifted, otherwise the regression model would generate the exact same output for a given array of features since its coefficients are "unique". This shift is in the direction of the past to avoid a dependency on model forecasts (auto-regression) when n <= output_chunk_length.
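To illustrate with a toy example (a minimal sketch of the idea, not the actual darts code; the numbers assume lags=[-2, -1] and output_chunk_length=2):

    # toy series where the value at index t is simply t
    series = list(range(10))
    lags = [-2, -1]
    ocl = 2            # output_chunk_length
    t0 = 6             # index of the first forecasted step

    # multi_models=True: ocl sub-models, all fed the SAME features indexed
    # from the first forecasted step; sub-model k predicts series[t0 + k]
    feats_true = [series[t0 + lag] for lag in lags]        # [4, 5]

    # multi_models=False: one model, lags shifted by (ocl - 1) into the past,
    # so each forecasted position k gets a distinct, fully observed window
    shift = ocl - 1
    feats_false = [[series[t0 + k - shift + lag] for lag in lags]
                   for k in range(ocl)]                    # [[3, 4], [4, 5]]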

pfwnicks commented 4 months ago

Hi @madtoinou

Thanks for the reply. I'm not quite sure I understand the reasoning, but from what I can gather, perhaps this lies in part with the way that covariates are ingested by the models.

I assumed that when forecasting, for example at horizon=n, the future covariates would be indexed from that step, in a similar way to how a normal forecasting model would work (e.g. raw sklearn).

But after reading the documentation again, it seems that the future covariates are the same for all forecast time steps if n < output_chunk_length. It is only when n > output_chunk_length that the ingested future covariates are updated for predicting.
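A toy sketch of my reading (illustration only, not darts code; assume multi_models=True, lags_future_covariates=[0] and output_chunk_length=4):

    # future covariates covering the 4 forecasted steps
    fut_cov = [10, 11, 12, 13]

    # every forecast position gets the covariate indexed from the FIRST
    # forecasted step, so all four positions share the same feature value
    features_per_position = [fut_cov[0] for _ in range(4)]   # [10, 10, 10, 10]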

While I can see the motivation for handling the observations and past covariates in this way, I do not quite understand the motivation for enforcing this type of indexing on the future covariates (which are inherently forecasted values). But perhaps this can be made into a feature request where it is possible to use the future covariates from n time steps in the future.

madtoinou commented 4 months ago

The lags are indeed indexed from the first position in the forecasted period when multi_models=True. When multi_models=False and the underlying model does not support multi-output prediction, the features array needs to be somehow different for each forecasted position in output_chunk_length, or the model would have to fit several labels for a given set of features (lags), which does not make sense.

output_chunk_length might be more intuitive in the context of neural networks, where an arbitrary number of values can be generated from one forward pass on a single array of features (depending on the model architecture). This concept is also applied to regression models here, and it behaves in an intuitive way when multi_models=True, as different models are responsible for different positions and hence receive the exact same features array but forecast a different label for each position.
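For concreteness, here is a rough sketch of that scheme with plain sklearn (my own toy illustration, not the darts source; assume lags=[-2, -1] and output_chunk_length=3):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    y = np.arange(50, dtype=float)          # toy target series
    lags, ocl = [-2, -1], 3

    # shared feature matrix: one row per first-forecast index t
    ts = range(2, len(y) - ocl + 1)
    X = np.array([[y[t + lag] for lag in lags] for t in ts])

    # one estimator per forecast position: same X, labels offset by k
    sub_models = []
    for k in range(ocl):
        y_k = np.array([y[t + k] for t in ts])
        sub_models.append(LinearRegression().fit(X, y_k))

    # forecasting the next ocl steps from the end of the series
    x_last = np.array([[y[len(y) + lag] for lag in lags]])
    forecast = [m.predict(x_last)[0] for m in sub_models]   # ~[50., 51., 52.]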

I hope that this different way of explaining the reasoning is clearer; let me know if something is still unclear.

TLDR: to get something as close as possible to "raw sklearn", you need to use output_chunk_length=1.
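For example (a usage sketch; the toy series and parameter values are mine, not from the original discussion):

    from sklearn.linear_model import LinearRegression
    from darts.models import RegressionModel
    from darts.utils.timeseries_generation import linear_timeseries

    series = linear_timeseries(length=100)  # toy series

    model = RegressionModel(
        lags=12,                 # 12 autoregressive lags
        output_chunk_length=1,   # one-step model: forecasts beyond 1 step
                                 # are produced auto-regressively
        model=LinearRegression(),
    )
    model.fit(series)
    pred = model.predict(24)     # 24 steps via repeated one-step forecasts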

pfwnicks commented 4 months ago

Hi @madtoinou

Thanks for the more detailed explanation; the reference to the neural network context makes sense.

I guess the only case I am still missing/confused about is where I want to, e.g., forecast 24 hours with 24 different models, but not overload the models with more extra covariates (especially future covariates) than necessary.

In my mind, for example, it would still be nice to have 24 different models (i.e. multi_models=True, output_chunk_length=24) and be able to restrict the future_covariates to the specific covariates for that specific forecast horizon.

As far as I understand, and also keeping in mind the reference to NNs, all of my 24 models would then need to be given all of the covariates (i.e. future_covariates_lags=24).

My concern is that not all regression models handle this extremely well, especially in scenarios with high complexity and low data.

Here it would be nice if at least the future covariates could be indexed based on the current forecast horizon (i.e. if I gave future_covariate_lags=[-1, 0, 1], then for each forecasting step each model would receive its "own" future covariate plus its neighbouring future covariates).
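For illustration, a hypothetical version of this relative indexing could look like the following (NOT current darts functionality; future_covariate_lags and the helper are made-up names for the requested option):

    # toy future covariates covering a 24-step horizon
    fut_cov = list(range(100, 124))
    rel_lags = [-1, 0, 1]   # hypothetical per-horizon relative lags

    def relative_features(k):
        # clamp at the horizon edges for simplicity
        return [fut_cov[min(max(k + lag, 0), len(fut_cov) - 1)]
                for lag in rel_lags]

    per_model_feats = [relative_features(k) for k in range(24)]
    # each of the 24 sub-models receives 3 covariate features instead of 24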

Again, I'm not sure if this makes sense or if implementing this kind of option would require a lot of extra complexity, but I have in mind the curse of dimensionality, where increasing the number of features without enough observations can impact the accuracy of models.

https://en.wikipedia.org/wiki/Curse_of_dimensionality

pfwnicks commented 3 months ago

@madtoinou

Would it be possible to adjust this section of RegressionModel.predict so that there is an option to treat the future covariates the same way regardless of whether multi_models is True or False?

    covariates = {
        "past": (past_covariates, self.lags.get("past")),
        "future": (future_covariates, self.lags.get("future")),
    }

    # prepare one_shot shift and step
    if self.multi_models:
        # forecast the whole output chunk in one shot, lags unshifted
        shift = 0
        step = self.output_chunk_length
    else:
        # single model: shift the lags into the past and predict step by step
        shift = self.output_chunk_length - 1
        step = 1
Presumably there would also be some changes in RegressionModel.fit etc. It is difficult to see how you have implemented this multi_models True/False switch in the training dataset, but perhaps this could be considered as a future feature request?

davide-burba commented 3 weeks ago

Hi, I was looking into this multi_models=True functionality, and I was also surprised by the underlying implementation. I'm not sure it's the best approach, because it prevents using recent information to predict close horizons. This is especially detrimental in the case of a large output_chunk_length.

I think there could be better solutions to explore, such as: