vanderschaarlab / synthcity

A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
https://www.vanderschaar-lab.com/
Apache License 2.0
393 stars, 51 forks

[Time Series] Add Autoregressive Model #110

Open bcebere opened 1 year ago

tztsai commented 1 year ago

For probabilistic autoregressive time series models, DeepAR from Amazon may be a good candidate. For deterministic AR models, there are classical models like (S)ARIMA(X) and deep models like AR-Net and NeuralProphet (which makes use of AR-Net). I can add a plugin using any of these models if you would like.

ZhaozhiQIAN commented 1 year ago

Hi, thanks for that! Here are some requirements to help you choose which one to integrate:

  1. Dependency and license compatibility
  2. Can the method generate trend or seasonality? (The classical models (S)ARIMA(X) assume stationarity and do not model trend or seasonality.)
  3. Can the method only make deterministic predictions (i.e. predict the mean)? We need to handle data generation and model the full distribution.

Anyway, you might want to experiment with these libraries first to understand their capability.

And a heads up:

tztsai commented 1 year ago

Hi Zhaozhi, thanks for your reply! I'd like to provide some additional information regarding the libraries and their capabilities:

  1. All the mentioned models (SARIMA, DeepAR, NeuralProphet) have implementations in GitHub repositories with an MIT license, ensuring dependency and license compatibility.

  2. SARIMA, DeepAR, and (Neural)Prophet are all capable of modeling trend and seasonality. While ARMA is limited to stationary time series, ARIMA and SARIMA can tackle nonstationarity through the integration process, with SARIMA specifically addressing the seasonal component of the series. In NeuralProphet, the time series is decomposed into trend, seasonal, autoregressive (modeled by an AR-Net), error components (modeled as a Gaussian random variable), etc., making it highly effective for time series with strong seasonality.

  3. Although (Neural)Prophet produces a point prediction (mean response), it also estimates the confidence interval of its prediction. For (S)ARIMA(X), however, even the confidence interval is not available. DeepAR is the most flexible one, since it allows time series to be modeled with other distributions, such as a Beta or a Gaussian mixture. However, DeepAR may require multiple related time series in the training set for optimal performance. In practice, (Neural)Prophet generally demonstrates better forecasting performance.
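As a rough illustration of the "integration" idea from point 2, here is a toy sketch in plain Python of how differencing removes a linear trend and a fixed seasonal pattern. The helper `difference` and the example series are made up for illustration; this is not the API of any ARIMA library.

```python
# Toy illustration of the "I" (integration) step in (S)ARIMA: differencing.
# All names here are hypothetical; this is not any library's API.

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# A series with a linear trend (slope 0.5) plus a period-4 seasonal pattern.
season = [0.0, 2.0, 0.0, -2.0]
series = [0.5 * t + season[t % 4] for t in range(16)]

# Seasonal differencing (lag equal to the period) cancels the repeating
# pattern and turns the linear trend into a constant offset.
seasonal_diff = difference(series, lag=4)

# A further first difference removes that constant offset as well.
stationary = difference(seasonal_diff, lag=1)

print(seasonal_diff)
print(stationary)
```

Real series are noisy, so the differenced series would fluctuate around a constant instead of being exactly flat; an ARMA model is then fitted to that differenced series.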

If it suffices to model the noise or error component of a time series with a simple Gaussian distribution, NeuralProphet seems to be the most suitable option. It has good performance and is clearly interpretable thanks to its additive decomposition into trend, seasonality, autoregression, etc. Let me know if you have any other questions or if you would like me to explore more options.
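To make the additive decomposition concrete, here is a minimal sketch that generates a series as trend + seasonality + an autoregressive component driven by Gaussian noise. It is a simplification under my own assumptions: the function name, parameters, and defaults are hypothetical, and the AR part is a plain AR(1) on the residual, whereas NeuralProphet's AR-Net regresses on past observed values.

```python
import math
import random

def simulate_additive_series(n_steps, period=7, slope=0.1, amplitude=2.0,
                             ar_coef=0.6, noise_std=0.5, seed=0):
    """Generate trend + seasonality + AR(1) residual (hypothetical helper)."""
    rng = random.Random(seed)  # local generator so results are reproducible
    series = []
    ar_state = 0.0
    for t in range(n_steps):
        trend = slope * t
        seasonal = amplitude * math.sin(2 * math.pi * t / period)
        # AR(1) residual: decays toward zero and is driven by Gaussian noise.
        ar_state = ar_coef * ar_state + rng.gauss(0.0, noise_std)
        series.append(trend + seasonal + ar_state)
    return series

series = simulate_additive_series(28)
print([round(value, 2) for value in series[:7]])
```

Because each component is a separate term, it can be inspected or switched off on its own (for example `noise_std=0.0` leaves only trend and seasonality), which is the interpretability argument made above.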

ZhaozhiQIAN commented 1 year ago

Hi Tianzhang, thanks for the detailed reply!

Given the information, it seems that NeuralProphet and DeepAR are both strong methods. I have two further points:

  1. In many settings (such as EHR), there exist multiple time series (e.g. one for each patient) -- can NeuralProphet handle this scenario, or can it only handle one time series?

  2. Furthermore, the noise distribution is often not Gaussian (e.g. when considering one-hot encoded categorical features over time). Do these methods support categorical features (possibly with one-hot encoding)?
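To make the concern in point 2 concrete, here is a toy sketch of a one-hot encoded categorical feature over time. The category names and the patient sequence are invented for illustration, and `one_hot_encode` is a hypothetical helper, not a synthcity function.

```python
def one_hot_encode(sequence, categories):
    """Map each categorical value to a 0/1 indicator vector."""
    index = {category: i for i, category in enumerate(categories)}
    return [
        [1 if index[value] == i else 0 for i in range(len(categories))]
        for value in sequence
    ]

# Hypothetical patient state recorded at five consecutive time steps.
categories = ["ward", "icu", "discharged"]
patient_states = ["ward", "ward", "icu", "ward", "discharged"]

encoded = one_hot_encode(patient_states, categories)
for step, row in enumerate(encoded):
    print(step, row)
```

Every row contains exactly one 1, so each column only ever takes the values 0 and 1 and the columns are mutually exclusive. Independent Gaussian noise on each column would put probability mass on values that are not valid encodings.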

You can also check the papers associated with TimeGAN (already in synthcity) to better understand the exact setting.

tztsai commented 1 year ago

Hi Zhaozhi! I would like to address your two points below:

  1. NeuralProphet can handle multiple time series by fitting a global model with a dataset composed of many time series. You can find an example of this in the NeuralProphet tutorial on global modeling.
  2. For the second point, a later release of NeuralProphet supports Conformal Quantile Regression (CQR), an uncertainty quantification method that produces calibrated prediction intervals without any distributional assumptions. You can find more information about probabilistic forecasting using NeuralProphet with CQR in this article.
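For reference, the core of CQR can be sketched in a few lines. This follows the generic split-conformal recipe as I understand it, not NeuralProphet's own implementation or API; the function names and the toy calibration numbers are hypothetical.

```python
import math

def cqr_correction(lower_preds, upper_preds, targets, alpha=0.1):
    """Compute the amount by which quantile intervals should be widened.

    lower_preds / upper_preds are the model's lower and upper quantile
    predictions on a held-out calibration set; targets are the true values.
    """
    # Conformity score: how far each target falls outside its interval
    # (negative when the target lies inside the interval).
    scores = sorted(
        max(lo - y, y - hi)
        for lo, hi, y in zip(lower_preds, upper_preds, targets)
    )
    n = len(scores)
    # Finite-sample-corrected rank of the (1 - alpha) empirical quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[rank - 1]

def cqr_interval(lower_pred, upper_pred, correction):
    """Widen (or shrink, if correction < 0) a predicted interval."""
    return lower_pred - correction, upper_pred + correction

# Toy calibration set: the raw interval [0, 1] misses two of three targets.
correction = cqr_correction([0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
                            [0.5, 1.5, 3.0], alpha=0.5)
print(cqr_interval(0.0, 1.0, correction))
```

The correction is computed from residuals on held-out data only, which is why no assumption about the noise distribution is needed. It still yields an interval per time step, not a sampler for whole series.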

Additionally, for multivariate time series, e.g. a categorical time series after one-hot encoding, DeepAR has a multivariate version, DeepVAR, and NeuralProphet can handle multivariate inputs by adding multiple regressors.

Finally, I have a question: what is the motivation for requiring the AR model to be probabilistic? Is it to provide a confidence interval, or to synthesize time series by random sampling? Both models can do the former, but if the goal is to synthesize data, the model should be able to generate a new series with the same timestamps as the training data. However, all of these models can only generate future values by forecasting.

That is, according to the TimeGAN paper, these models only optimize the ML objective, but do not learn a distribution of the full time series by addressing the GAN objective.

[image]

Could you help me clarify this point?