unit8co / darts

A Python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Adding prediction interval to forecast #288

Closed: whiletruelearn closed this issue 3 years ago

whiletruelearn commented 3 years ago

Is your feature request related to a current problem? Please describe. Most forecasting algorithms also produce a prediction interval alongside the point forecast. This could be implemented in an algorithm-agnostic way as well.

https://otexts.com/fpp2/prediction-intervals.html

Describe proposed solution Add an optional parameter to predict, prediction_interval, which when set to True returns not just the forecast (yhat) but also yhat_upper and yhat_lower based on the prediction interval.

Can use the sample Naive forecast logic mentioned here to implement this.
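As a rough illustration of the request (a minimal numpy sketch, not darts code; the function name and toy numbers are made up), the naive-forecast prediction interval from the fpp2 chapter linked above repeats the last observed value and widens the interval with the forecast horizon as sigma * sqrt(h):

```python
import numpy as np

def naive_forecast_interval(series, horizon, z=1.96):
    """Naive forecast with an approximate 95% prediction interval.

    The naive forecast repeats the last observed value; the standard
    deviation of the h-step-ahead forecast grows as sigma * sqrt(h),
    where sigma is estimated from the one-step residuals (see fpp2).
    """
    series = np.asarray(series, dtype=float)
    residuals = series[1:] - series[:-1]        # one-step naive residuals
    sigma = residuals.std(ddof=1)
    yhat = np.full(horizon, series[-1])         # point forecast (yhat)
    h = np.arange(1, horizon + 1)
    half_width = z * sigma * np.sqrt(h)         # interval widens with horizon
    return yhat, yhat - half_width, yhat + half_width  # yhat, yhat_lower, yhat_upper

yhat, lower, upper = naive_forecast_interval([10.0, 12.0, 11.0, 13.0, 12.0], horizon=3)
```

The same residual-based recipe carries over to other simple baselines; only the point forecast and the horizon scaling change.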

Describe potential alternatives Prophet, auto-ARIMA, etc. provide these prediction interval values.

Additional context This could also be used in plots for the forecast.

hrzn commented 3 years ago

Thanks for the suggestion. We have this (or at least something similar) on our roadmap.

vpozdnyakov commented 3 years ago

@hrzn please consider adding fully probabilistic models, not only prediction intervals. With such models it becomes possible to generate realistic multivariate time series forecasts and then estimate prediction intervals empirically.

For example, the gluon-ts package has a make_evaluation_predictions method that generates a set of sample forecasts, which can then be used to estimate the mean, quantiles, or prediction intervals: https://ts.gluon.ai/api/gluonts/gluonts.evaluation.html?highlight=make_evaluation_predictions#gluonts.evaluation.make_evaluation_predictions

The pytorch-forecasting package also has the probabilistic model DeepAR, which can produce such forecasts: https://pytorch-forecasting.readthedocs.io/en/latest/api/pytorch_forecasting.models.deepar.DeepAR.html

Using Darts' architectures, probabilistic models such as DeepAR (described in https://arxiv.org/abs/1704.04110), VEC-LSTM (described in https://arxiv.org/abs/1910.03002) or DeepTCN-Gaussian (described in https://arxiv.org/abs/1906.04397) could be implemented. I have some experience in deep probabilistic modeling with PyTorch. May I help you with anything?

hrzn commented 3 years ago

Hi @vpozdnyakov thanks a lot for the links. At this stage, we first have to narrow down how we want to represent probabilistic time series in the TimeSeries objects. I see mainly three ways:

  1. We keep some pre-defined quantiles for each marginal distribution (i.e., for each component in a multivariate series). Pros: simple. Cons: there is some loss of information, and it represents only the marginals (not the joint distribution).
  2. We store the parameters of the marginal distributions (e.g. mu and sigma). Pros: no loss of information (on the marginals). Cons: introduces extra complexity and works only for some pre-determined closed-form distributions. It also represents only the marginals (except for some special cases like the Gaussian, where the covariance matrices can be stored).
  3. We store multivariate samples. Pros: quite simple. Cons: will require more memory (to keep a significant number of samples), and there's some information loss.

I'm leaning towards Option 1. First, I have the feeling it's OK to represent only the marginal distributions, because in most cases models like DeepAR (and other probabilistic models) won't be used to infer full joint distributions (except maybe in the Gaussian case with a reasonable number of dimensions). Second, I think we could implement it in a way that leaves users full flexibility to specify which quantiles they're interested in at predict() time.
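For concreteness, here is a small numpy sketch (toy data, not darts code) of what the three representations boil down to, starting from the same set of Monte-Carlo forecast samples:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy forecast: 1000 Monte-Carlo samples, 5 time steps, 2 components
samples = rng.normal(loc=[[1.0, 2.0]], scale=0.5, size=(1000, 5, 2))

# Option 1: keep pre-defined quantiles per component (marginals only)
quantiles = np.quantile(samples, [0.05, 0.5, 0.95], axis=0)  # shape (3, 5, 2)

# Option 2: keep parameters of a closed-form marginal (here: mu, sigma)
mu, sigma = samples.mean(axis=0), samples.std(axis=0)        # shape (5, 2) each

# Option 3: keep the raw multivariate samples themselves
# (the `samples` array as-is; joint structure across components is preserved)
```

Options 1 and 2 discard the per-sample pairing between components, which is exactly the joint-distribution information discussed below; Option 3 keeps it at the cost of memory.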

I would say we first have to nail this part (representing probabilistic series). Once that is done, we'll add support for DeepAR & Co., and then we would be very happy to receive your contributions if you feel like implementing some of the models you mention.

vpozdnyakov commented 3 years ago

@hrzn thanks for your comment. I think that option 3 is better. First, it is the only one that truly represents a joint distribution. Second, it is more general and also applicable to non-parametric models, such as the Transformer Real NVP (https://arxiv.org/abs/2002.06103). Third, it does not require much more memory than option 1. For example, in the M5 Uncertainty competition (https://www.kaggle.com/c/m5-forecasting-uncertainty), competitors need to predict 199 quantiles: 0.005, 0.01, ... 0.995. That means storing 199 predictions per marginal. The default number of multivariate samples could be 100 or 1000, which is comparable with 199.

Also, I think that the joint distribution can be much more useful than independent marginals. For example, forecasting the total sales of two negatively correlated products involves the fact that sales of both products increasing simultaneously is unrealistic, while independent forecasts cannot account for this property.
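That point can be illustrated with a small numpy simulation (made-up numbers, purely for intuition): with a correlation of -0.8 between the two products, the spread of their total is much narrower than what independent marginals would suggest.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Sales of two negatively correlated products (correlation -0.8, unit variances)
cov = [[1.0, -0.8], [-0.8, 1.0]]
sales = rng.multivariate_normal(mean=[10.0, 10.0], cov=cov, size=n)
total = sales.sum(axis=1)

# Joint samples capture the correlation: Var(total) = 1 + 1 + 2*(-0.8) = 0.4
joint_std = total.std()

# Independent marginals overstate the spread: Var(total) = 1 + 1 = 2.0
indep = rng.normal(10.0, 1.0, size=(n, 2)).sum(axis=1)
indep_std = indep.std()
```

Any prediction interval on the total built from the independent marginals would be roughly sqrt(2.0 / 0.4), about 2.2 times, too wide here.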

Whichever option you choose, I will be happy to implement some probabilistic models. For example, here is my draft implementation of DeepTCN-Gaussian based on your TCN block with #329 fixed (https://github.com/vpozdnyakov/probabilistic_forecasting/blob/main/notebooks/gaussian_tcn_shallow_glance.ipynb).

UPD: I have fixed the link to the VEC-LSTM model that I mentioned in the previous message.

hrzn commented 3 years ago

@vpozdnyakov thanks for your comment and the pointer to the ICLR paper. Indeed it seems that there's quite a strong case for capturing joint distributions. Let me see what it would take to go towards Option 3 in our case (we'll have to change the TimeSeries class quite a lot, but it's probably doable). We also have some improvements to make in order to better handle auto-regressive models (we're on that, too). Once we have these in place we would be super happy to get some probabilistic models from you!

hrzn commented 3 years ago

Just an update here: we are working on this and making good progress. These two PRs implement probabilistic TimeSeries and DeepAR (as a first probabilistic model): https://github.com/unit8co/darts/pull/350 https://github.com/unit8co/darts/pull/361 @vpozdnyakov we went with the approach discussed above, storing many samples/trajectories in TimeSeries.

vpozdnyakov commented 3 years ago

@hrzn thanks! May I start working on the DeepTCN model? It is a parametric model with a multivariate Gaussian. Is it OK to use #361 as a template?

hrzn commented 3 years ago

@vpozdnyakov yes definitely, we would be happy to receive your contribution! And yes, it's a good starting point. Perhaps you can also check our current TCN model in case some things can be reused (or generalised).

hrzn commented 3 years ago

This has been released in v0.9.0.

Some models support specifying num_samples to predict(), in which case they will return a "stochastic" TimeSeries containing num_samples samples, which describe the distribution of the time series' values. Some of the neural networks (at the moment RNNModel and TCNModel) are able to produce such stochastic forecasts if they are built specifying a certain likelihood parameter (e.g., darts.utils.likelihood_models.GaussianLikelihoodModel() to train the model with a negative Gaussian log likelihood loss).

We went for such a sampling-based representation (instead of, for instance, returning fixed confidence intervals) because it allows (i) computing arbitrary quantiles (using e.g. TimeSeries.quantiles_df() or TimeSeries.quantile_timeseries()) and (ii) for multivariate series, capturing the joint distribution over all components without assuming a specific parametric form.
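To make the released representation concrete, here is a plain-numpy sketch of working with such a sample-based ("stochastic") forecast. The (time, component, sample) array layout mirrors how darts exposes stochastic series values, but the code below is illustrative and does not use darts itself:

```python
import numpy as np

rng = np.random.default_rng(1)
# A "stochastic" forecast: (time steps, components, samples)
n_time, n_components, n_samples = 12, 2, 500
values = rng.normal(loc=5.0, scale=1.0, size=(n_time, n_components, n_samples))

# Arbitrary quantiles can be computed over the sample axis after the fact,
# without having fixed them when the forecast was produced
q10 = np.quantile(values, 0.10, axis=2)   # shape (12, 2)
q90 = np.quantile(values, 0.90, axis=2)

# The joint behaviour across components is preserved per sample, so derived
# quantities (e.g. the components' sum) get a proper distribution too
total_median = np.quantile(values.sum(axis=1), 0.5, axis=1)  # shape (12,)
```

This is exactly why sample storage supports both (i) and (ii) above: quantiles are just reductions over the sample axis, and cross-component statistics fall out of the per-sample pairing.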