whiletruelearn closed this issue 3 years ago
Thanks for the suggestion. We have this (or at least something similar) on our roadmap.
@hrzn please add fully probabilistic models, not only prediction intervals. With such models, it is possible to generate realistic multivariate time series forecasts and then estimate prediction intervals empirically.
For example, the gluon-ts package has a make_evaluation_predictions function that literally generates a set of forecasts, which can then be used to estimate the mean, quantiles or prediction intervals: https://ts.gluon.ai/api/gluonts/gluonts.evaluation.html?highlight=make_evaluation_predictions#gluonts.evaluation.make_evaluation_predictions
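To make the idea concrete, here is a minimal sketch of "generate a set of forecasts, then estimate statistics empirically". It does not use gluon-ts or Darts: the toy AR(1) simulator, its parameters (phi, sigma) and the helper names simulate_paths, empirical_quantile and summarize are all assumptions for illustration, standing in for whatever probabilistic model produces the sample paths.

```python
import random
import statistics


def simulate_paths(last_value, horizon, num_samples, phi=0.8, sigma=1.0, seed=0):
    """Draw sample trajectories from a toy AR(1) model (a stand-in for any probabilistic forecaster)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(num_samples):
        value, path = last_value, []
        for _ in range(horizon):
            value = phi * value + rng.gauss(0.0, sigma)
            path.append(value)
        paths.append(path)
    return paths


def empirical_quantile(values, q):
    """Quantile of a sample using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def summarize(paths, lower_q=0.05, upper_q=0.95):
    """Per-step mean and empirical prediction interval across all sample paths."""
    summary = []
    for step_values in zip(*paths):  # iterate over forecast steps
        summary.append({
            "mean": statistics.fmean(step_values),
            "lower": empirical_quantile(step_values, lower_q),
            "upper": empirical_quantile(step_values, upper_q),
        })
    return summary


paths = simulate_paths(last_value=10.0, horizon=5, num_samples=1000)
forecast = summarize(paths)
for step, row in enumerate(forecast, start=1):
    print(step, round(row["lower"], 2), round(row["mean"], 2), round(row["upper"], 2))
```

Any quantile level can be read off the same set of paths after the fact, which is the main appeal of keeping the samples rather than a fixed interval.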
The pytorch-forecasting package also has the probabilistic model DeepAR that can produce forecasts: https://pytorch-forecasting.readthedocs.io/en/latest/api/pytorch_forecasting.models.deepar.DeepAR.html
Using Darts' architectures, probabilistic models such as DeepAR (described in https://arxiv.org/abs/1704.04110), VEC-LSTM (described in https://arxiv.org/abs/1910.03002) or DeepTCN-Gaussian (described in https://arxiv.org/abs/1906.04397) can be implemented. I have some experience in deep probabilistic modeling with PyTorch. May I help you with anything?
Hi @vpozdnyakov thanks a lot for the links. At this stage, we first have to narrow down how we want to represent probabilistic time series in the TimeSeries objects. I see mainly three ways:
I'm leaning towards going for Option 1. First, I have the feeling it's OK to represent marginal distributions only, because in most cases models like DeepAR (and other probabilistic models) won't be used to infer full joint distributions (except maybe in the Gaussian case with a reasonable number of dimensions). Second, I think we could implement it in a way that leaves full flexibility to users to specify which quantiles they're interested in at predict() time.
I would say we first have to nail this part (representing probabilistic series). Once this is done, we'll add support for DeepAR & Co. Once this is the case, we would be very happy to receive your contributions if you feel like implementing some of the models you mention.
@hrzn thanks for your comment. I think that Option 3 is better. First, it really is a way to represent a joint distribution. Second, it is more general and also applicable to non-parametric models, such as Transformer Real NVP (https://arxiv.org/abs/2002.06103). Third, it does not require much more memory than Option 1. For example, in the M5 Uncertainty competition (https://www.kaggle.com/c/m5-forecasting-uncertainty), competitors need to predict 199 quantiles: 0.005, 0.01, ... 0.995. That means 199 predictions have to be stored per marginal. The default number of multivariate samples can be 100 or 1000, so it is comparable with 199.
Also, I think that the joint distribution can be much more useful than independent marginals. For example, when forecasting the total sales of two negatively correlated products, a simultaneous increase in the sales of both products is unrealistic, and independent forecasts cannot account for this property.
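The point about negatively correlated products can be checked with a small simulation. This is only a sketch under made-up assumptions: two products with Gaussian sales (mean 100, standard deviation 10) and correlation -0.9. The helper names sample_joint and interval_width are hypothetical. Shuffling one component keeps both marginals intact but destroys the pairing, which mimics forecasting from independent marginals.

```python
import math
import random


def sample_joint(num_samples, rho, seed=0):
    """Sample pairs of sales for two products with correlation rho (toy Gaussian model)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        sales_a = 100.0 + 10.0 * z1
        sales_b = 100.0 + 10.0 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        samples.append((sales_a, sales_b))
    return samples


def interval_width(values, lower_q=0.05, upper_q=0.95):
    """Width of the empirical interval between two quantiles (nearest-rank)."""
    ordered = sorted(values)
    lower = ordered[int(lower_q * (len(ordered) - 1))]
    upper = ordered[int(upper_q * (len(ordered) - 1))]
    return upper - lower


joint = sample_joint(num_samples=5000, rho=-0.9)

# Joint samples: total sales are computed per sampled pair, so the correlation is respected.
joint_totals = [a + b for a, b in joint]

# Independent marginals: same values per product, but the pairing is shuffled away.
shuffled_b = [b for _, b in joint]
random.Random(1).shuffle(shuffled_b)
independent_totals = [a + b for (a, _), b in zip(joint, shuffled_b)]

joint_width = interval_width(joint_totals)
independent_width = interval_width(independent_totals)
print("90% interval width of total sales, joint samples:", round(joint_width, 1))
print("90% interval width of total sales, independent marginals:", round(independent_width, 1))
```

Under these assumptions the interval for the total is much narrower when the joint samples are used, because high sales of one product tend to coincide with low sales of the other.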
Whichever option you choose, I will be happy to implement some probabilistic models. For example, here is my draft implementation of DeepTCN-Gaussian based on your TCN block with #329 fixed (https://github.com/vpozdnyakov/probabilistic_forecasting/blob/main/notebooks/gaussian_tcn_shallow_glance.ipynb).
UPD: I have fixed the link to the VEC-LSTM model that I mentioned in my previous message.
@vpozdnyakov thanks for your comment and the pointer to the ICLR paper. Indeed it seems that there's quite a strong case for capturing joint distributions. Let me see what it would take to go towards Option 3 in our case (we'll have to change the TimeSeries class quite a lot, but it's probably doable). We also have some improvements to do in order to have a better treatment of auto-regressive models (we're also on it). Once we have these in place we would be super happy to get some probabilistic models from you!
Just an update here: we are working on it and making good progress. These two PRs are implementing both probabilistic TimeSeries and DeepAR (as a first probabilistic model) https://github.com/unit8co/darts/pull/350 https://github.com/unit8co/darts/pull/361 @vpozdnyakov We went with the approach discussed of storing many samples/trajectories in TimeSeries.
@hrzn thanks! may I start working on DeepTCN model? it is a parametric model with multivariate gaussian. is it ok to use #361 as a template?
@vpozdnyakov yes definitely, we would be happy to receive your contribution! And yes it's a good starting point. Perhaps you can also check our current TCN model in case some things can be reused (or generalised).
This has been released in v0.9.0.
Some models support specifying num_samples to predict(), in which case they will return a "stochastic" TimeSeries containing num_samples samples, which describe the distribution of the time series' values. Some of the neural networks (at the moment RNNModel and TCNModel) are able to produce such stochastic forecasts if they are built specifying a certain likelihood parameter (e.g., darts.utils.likelihood_models.GaussianLikelihoodModel() to train the model with a negative Gaussian log likelihood loss).
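For readers unfamiliar with the term, the following is a rough sketch of what a negative Gaussian log likelihood loss computes and how samples can be drawn from predicted parameters. It is not the Darts implementation: gaussian_nll, mean_nll and sample_forecast are hypothetical helper names, the predicted means and standard deviations are made-up numbers, and each step is sampled independently for simplicity (an actual autoregressive model would feed each sample back into the next step).

```python
import math
import random


def gaussian_nll(y, mu, sigma):
    """Negative log density of y under a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)


def mean_nll(targets, mus, sigmas):
    """Average negative log likelihood over a sequence; this is the quantity training would minimize."""
    losses = [gaussian_nll(y, m, s) for y, m, s in zip(targets, mus, sigmas)]
    return sum(losses) / len(losses)


def sample_forecast(mus, sigmas, num_samples, seed=0):
    """Draw num_samples trajectories from the predicted per-step Gaussians."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(mus, sigmas)] for _ in range(num_samples)]


targets = [1.0, 2.0, 3.0]
good_fit = mean_nll(targets, mus=[1.1, 1.9, 3.2], sigmas=[0.5, 0.5, 0.5])
bad_fit = mean_nll(targets, mus=[3.0, 0.0, 6.0], sigmas=[0.5, 0.5, 0.5])
print("loss with means close to the targets:", round(good_fit, 3))
print("loss with means far from the targets:", round(bad_fit, 3))

samples = sample_forecast(mus=[1.1, 1.9, 3.2], sigmas=[0.5, 0.5, 0.5], num_samples=100)
print("number of sampled trajectories:", len(samples))
```

The loss rewards predicted distributions that put high density on the observed values, so the network learns both a location and a spread instead of a single point forecast.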
We went for such a sampling-based representation (instead, for instance, of returning fixed confidence intervals) because (i) it allows computing arbitrary quantiles (using e.g. TimeSeries.quantiles_df() or TimeSeries.quantile_timeseries()), and (ii) for multivariate series it captures the joint distribution over all components without assuming a specific parametric form.
Is your feature request related to a current problem? Please describe. Most algorithms also have a prediction interval associated with the forecast. This could also be implemented in an algorithm-agnostic way.
https://otexts.com/fpp2/prediction-intervals.html
Describe proposed solution In predict, have an optional parameter, prediction_interval, which when set to True returns not just the forecast (yhat) but also yhat_upper and yhat_lower based on the prediction interval. Can use the sample Naive forecast logic mentioned here to implement this.
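As an illustration of the proposal, here is a minimal sketch of a naive forecast with a prediction interval, following the textbook rule that the h-step standard deviation of a naive forecast is the residual standard deviation times sqrt(h), under a normality assumption. The function name naive_forecast_with_interval and the coverage parameter are hypothetical, the residual standard deviation is taken as the root mean squared one-step difference, and none of this is an existing Darts API.

```python
import math
from statistics import NormalDist


def naive_forecast_with_interval(history, horizon, coverage=0.95):
    """Naive forecast (repeat the last value) with a normal-theory prediction interval."""
    # One-step residuals of the naive method are the first differences of the series.
    residuals = [b - a for a, b in zip(history, history[1:])]
    sigma = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    # Two-sided multiplier, e.g. about 1.96 for 95% coverage.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    last = history[-1]
    forecast = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * math.sqrt(h)  # uncertainty grows with the horizon
        forecast.append({
            "yhat": last,
            "yhat_lower": last - half_width,
            "yhat_upper": last + half_width,
        })
    return forecast


result = naive_forecast_with_interval([10.0, 12.0, 11.0, 13.0, 12.0], horizon=4)
for h, row in enumerate(result, start=1):
    print(h, round(row["yhat_lower"], 2), row["yhat"], round(row["yhat_upper"], 2))
```

The same residual-based logic could in principle wrap other point forecasters, although the sqrt(h) scaling is specific to the naive method.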
Describe potential alternatives Prophet, auto ARIMA, etc. provide these prediction interval values.
Additional context This could also be used in plots for the forecast.