Time Series Forecasting With Deep Learning: A Survey

0. Paper

Journal/Conference: Royal Society 2020
Title: Time Series Forecasting With Deep Learning: A Survey
Authors: Bryan Lim, Stefan Zohren
URL: https://arxiv.org/abs/2004.13408

1. What is it?

An excellent survey of time series forecasting models.

2. What makes it better than prior work?

It gives a thorough summary of recent research on time series forecasting.

3. What is the key technique or method?

4. How was it validated?

5. Is there any discussion?

6. Which papers should be read next?

See the notes (there are many).

Notes

The survey focuses on the development of hybrid deep learning models.

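1 Introduction
・Introduces techniques such as multi-horizon forecasting and uncertainty estimation
・Analyses hybrid models that combine quantitative models with deep models
・Future outlook: continuous-time models / hierarchical models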

2 Deep Learning Architectures for Time Series Forecasting
Covers one-step-ahead forecasting.

a. Basic Building Blocks
Fig. 1: examples of NN models

i. CNN
WaveNet and similar models are representative.
van den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, et al. WaveNet: A Generative Model for Raw Audio. arXiv e-prints. 2016 Sep;p. arXiv:1609.03499.
Bai S, Zico Kolter J, Koltun V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv e-prints. 2018;p. arXiv:1803.01271.
Borovykh A, Bohte S, Oosterlee CW. Conditional Time Series Forecasting with Convolutional Neural Networks. arXiv e-prints. 2017;p. arXiv:1703.04691.
Characteristics
・Assumes time invariance (the same filter weights are applied at every time step)
・Only inputs inside the lookback window are considered
・The kernel size k is hard to tune (the kernel plays a role comparable to an AR model)

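Dilated Convolutions: an extension of CNNs used in WaveNet and similar models. A layer-specific dilation rate d_l is defined so that each layer takes in information at a different resolution (that is, each layer covers a different span of time).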
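To make the role of the layer-specific dilation rate d_l concrete, here is a minimal pure-Python sketch of a dilated causal convolution. The function names, the zero padding before the start of the series, and the fixed number of kernel taps are assumptions made for illustration, not details taken from the survey.

```python
def dilated_causal_conv(x, weights, dilation):
    """Causal 1-D convolution with a layer-specific dilation rate.

    out[t] = sum over tau of weights[tau] * x[t - dilation * tau].
    Positions before the start of the series are treated as zero (assumption).
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for tau, w in enumerate(weights):
            idx = t - dilation * tau
            if idx >= 0:  # causal: only the present and the past are used
                total += w * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of time steps one output can see after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
layer1 = dilated_causal_conv(series, [1.0, 1.0], dilation=1)  # fine resolution
layer2 = dilated_causal_conv(layer1, [1.0, 1.0], dilation=2)  # coarser resolution
```

With a kernel of two taps and dilation rates 1, 2, 4, the stacked layers cover 8 time steps, which is why increasing d_l with depth lets higher layers cover a longer time span without a larger kernel.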

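ⅱ: RNN
An unbounded lookback window easily causes problems such as exploding gradients → this led to the development of GRU, LSTM, and similar architectures.
Similarity to Bayesian filters: both maintain a latent state that is updated recursively over time, and both update the statistics held in that latent state through a state transition step followed by a correction step.
Lim B, Zohren S, Roberts S. Recurrent Neural Filters: Learning Independent Bayesian Filtering Steps for Time Series Prediction. In: International Joint Conference on Neural Networks (IJCNN); 2020.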
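The transition-plus-correction structure noted above is the one a Kalman filter uses. Below is a minimal sketch for a scalar linear Gaussian model. The function name and the parameters a, q, r are illustrative assumptions, and this is a classical filter shown for comparison, not the Recurrent Neural Filter itself.

```python
def kalman_step(mean, var, obs, a=1.0, q=0.1, r=0.5):
    """One recursive update of a scalar Kalman filter.

    mean, var: statistics of the latent state before seeing `obs`
    a: state transition coefficient, q: process noise variance,
    r: observation noise variance (all hypothetical values).
    """
    # State transition step: propagate the latent state forward in time.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Correction step: adjust the prediction using the new observation.
    gain = pred_var / (pred_var + r)
    new_mean = pred_mean + gain * (obs - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


# Recursively update the latent state over a short series.
state = (0.0, 1.0)
for y in [0.9, 1.1, 1.0, 1.2]:
    state = kalman_step(state[0], state[1], y)
```

An RNN cell has the same shape: it carries a state from one step to the next and revises it with each new input, except that the update is learned and not derived from a fixed model.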

ⅲ: Attention Mechanisms
Fan C, Zhang Y, Pan Y, Li X, Zhang C, Yuan R, et al. Multi-Horizon Time Series Forecasting with Temporal Attention Learning. In: Proceedings of the ACM SIGKDD international conference on Knowledge discovery and data mining (KDD); 2019.
Li S, Jin X, Xuan Y, Zhou X, Chen W, Wang YX, et al. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting. In: Advances in Neural Information Processing Systems (NeurIPS); 2019.
Lim B, Arik SO, Loeff N, Pfister T. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. arXiv e-prints. 2019;p. arXiv:1912.09363.
Advantages
・Can attend directly to important past events (e.g. holiday effects)
・Can it learn regime-specific temporal dynamics? (see Temporal Fusion Transformers)
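As a rough illustration of how attention lets a model weight individual past time steps, here is a minimal sketch of scaled dot-product attention in pure Python. The function names and the toy keys, values, and query are assumptions for illustration. The cited models use learned projections and more elaborate architectures.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query, keys, values):
    """Weight each past time step by its similarity to the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)  # one weight per past time step, summing to 1
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Three past time steps; the first and third resemble the current query.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
query = [1.0, 0.0]
context, weights = attend(query, keys, values)
```

The weights can be read off directly per time step, which is also what makes attention useful for the interpretability discussion later in these notes.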

ⅳ: Outputs and Loss Functions
Outputs for point forecasts and for probabilistic forecasts.

・Point forecasts
Mean squared error is the usual loss.

・Probabilistic outputs
The network generates the parameters of a known distribution.
Wen R, Torkkola K. Deep Generative Quantile-Copula Models for Probabilistic Forecasting. In: ICML Time Series Workshop; 2019.
Salinas D, Flunkert V, Gasthaus J. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. arXiv e-prints. 2017;p. arXiv:1704.04110.
Rangapuram SS, Seeger MW, Gasthaus J, Stella L, Wang Y, Januschowski T. Deep State Space Models for Time Series Forecasting. In: Advances in Neural Information Processing Systems (NIPS); 2018.
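The two kinds of output can be contrasted with a small sketch: mean squared error for point forecasts, and a Gaussian negative log-likelihood when the network emits distribution parameters. The Gaussian choice, the softplus used to keep the standard deviation positive, and all function names are assumptions made for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error for point forecasts."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def softplus(x):
    """Maps any real number to a positive value."""
    return math.log1p(math.exp(x))


def gaussian_nll(y, mu, raw_sigma):
    """Negative log-likelihood of y under N(mu, sigma^2).

    mu and raw_sigma stand for two raw network outputs (hypothetical);
    raw_sigma is passed through softplus so that sigma > 0.
    """
    sigma = softplus(raw_sigma)
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)


point_loss = mse([1.0, 2.0], [1.0, 4.0])
prob_loss = gaussian_nll(y=0.3, mu=0.0, raw_sigma=0.0)
```

Minimising the second loss trains the network to report both a central estimate and an uncertainty, where the first trains only a central estimate.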

b. Multi-horizon forecasting models
ⅰ: Iterative methods
Forecasts are generated with Monte Carlo estimation. A drawback is that external information cannot be used well.
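The iterative approach can be sketched as follows: a one-step-ahead model is sampled, the sample is fed back in as the next input, and many such paths are averaged. The Gaussian AR(1) one-step model and its parameters phi and sigma are stand-ins chosen for illustration. In the surveyed methods a trained network plays this role.

```python
import random


def one_step_sample(y_prev, rng, phi=0.8, sigma=0.1):
    """Hypothetical one-step-ahead model: y_t ~ N(phi * y_{t-1}, sigma^2)."""
    return rng.gauss(phi * y_prev, sigma)


def iterative_forecast(y_last, horizon, n_paths, seed=0):
    """Monte Carlo multi-horizon forecast by recursively feeding samples back in."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        y = y_last
        path = []
        for _ in range(horizon):
            y = one_step_sample(y, rng)  # the sample becomes the next input
            path.append(y)
        paths.append(path)
    # Average across paths to estimate the expected value at each horizon.
    means = [sum(p[h] for p in paths) / n_paths for h in range(horizon)]
    return means, paths


means, paths = iterative_forecast(y_last=1.0, horizon=3, n_paths=2000)
```

Because only the target is fed back recursively, any other input would have to be known or forecast separately for future steps, which is one way to read the drawback noted above.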

ⅱ: Direct methods
Use architectures such as encoder-decoders.
Fan C, Zhang Y, Pan Y, Li X, Zhang C, Yuan R, et al. Multi-Horizon Time Series Forecasting with Temporal Attention Learning. In: Proceedings of the ACM SIGKDD international conference on Knowledge discovery and data mining (KDD); 2019.
Lim B, Arik SO, Loeff N, Pfister T. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. arXiv e-prints. 2019;p. arXiv:1912.09363.
Wen R, et al. A Multi-Horizon Quantile Recurrent Forecaster. In: NIPS 2017 Time Series Workshop; 2017.

3 Incorporating Domain Knowledge with Hybrid Models !!!!
Problems with deep models
・They overfit easily
・How the inputs are pre-processed matters
Makridakis S, Spiliotis E, Assimakopoulos V. Statistical and Machine Learning forecasting methods: Concerns and ways forward. PLOS ONE. 2018 03;13(3):1–26.
→ Hybrid methods that combine deep models with classical models have been proposed
→ They are especially useful for small datasets (Rangapuram SS, Seeger MW, Gasthaus J, Stella L, Wang Y, Januschowski T. Deep State Space Models for Time Series Forecasting. In: Advances in Neural Information Processing Systems (NIPS); 2018.)

Example: the winner of the M4 competition
Smyl S. A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. International Journal of Forecasting. 2020;36(1):75–85. M4 Competition.

Hybrids are built mainly from two perspectives.
・To encode the time-varying parameters of a non-probabilistic parametric model
Smyl S. A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. International Journal of Forecasting. 2020;36(1):75–85. M4 Competition.
Combines Holt-Winters exponential smoothing with a deep learning model: the seasonality component and the multiplicative level are combined with the deep learning model's output.

Lim B, Zohren S, Roberts S. Enhancing Time-Series Momentum Strategies Using Deep Neural Networks. The Journal of Financial Data Science. 2019.
Binkowski M, Marti G, Donnat P. Autoregressive Convolutional Neural Networks for Asynchronous Time Series. In: Proceedings of the International Conference on Machine Learning (ICML); 2018.

・To generate the distribution parameters of a probabilistic model
Encode the time-varying parameters of a linear state space model and perform inference via the Kalman filtering equations: Rangapuram SS, Seeger MW, Gasthaus J, Stella L, Wang Y, Januschowski T. Deep State Space Models for Time Series Forecasting. In: Advances in Neural Information Processing Systems (NIPS); 2018.
Gaussian processes: Wang Y, Smola A, Maddix D, Gasthaus J, Foster D, Januschowski T. Deep Factors for Forecasting. In: Proceedings of the International Conference on Machine Learning (ICML); 2019.
Grover A, Kapoor A, Horvitz E. A Deep Hybrid Model for Weather Forecasting. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD); 2015.
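For the first perspective, the Holt-Winters hybrid can be sketched as multiplicative level and seasonality updates whose results scale the network's output. This is a simplified illustration. The function names, the smoothing coefficients alpha and gamma, and the exponential applied to the network output are assumptions, not a reproduction of Smyl's implementation.

```python
import math


def es_update(y, level_prev, season, alpha, gamma):
    """One multiplicative Holt-Winters update (no trend term, for brevity).

    y: new observation, level_prev: previous level,
    season: seasonality factor for this position in the cycle.
    Returns the new level and the updated seasonality factor.
    """
    level = alpha * y / season + (1.0 - alpha) * level_prev
    new_season = gamma * y / level + (1.0 - gamma) * season
    return level, new_season


def hybrid_forecast(network_output, level, season_future):
    """Scale the (hypothetical) network output by level and seasonality."""
    return math.exp(network_output) * level * season_future


level, season = es_update(y=12.0, level_prev=10.0, season=1.2, alpha=0.5, gamma=0.5)
forecast = hybrid_forecast(network_output=0.0, level=level, season_future=season)
```

The classical components absorb scale and seasonality, so the network only has to model what remains, which is one reason such hybrids can help on small datasets.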

4 Facilitating Decision Support Using DNN
Two directions in which DNNs have been extended to facilitate decision support.
a. Interpretability with time series data
For interpreting why a model makes a given prediction.

・Techniques for Post-hoc Interpretability
Insert an interpretable surrogate model between the inputs and outputs (surrogate model?)
e.g. LIME: identifies relevant features by fitting instance-specific linear models?: Ribeiro M, Singh S, Guestrin C. "Why Should I Trust You?" Explaining the Predictions of Any Classifier. In: KDD; 2016.
SHAP: uses techniques from cooperative game theory to identify important features: Lundberg S, Lee SI. A Unified Approach to Interpreting Model Predictions. In: Advances in Neural Information Processing Systems (NIPS); 2017.

Gradient-based methods: which input features influence the loss function?
Saliency maps: Simonyan K, Vedaldi A, Zisserman A. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv e-prints. 2013;p. arXiv:1312.6034.
Siddiqui SA, Mercier D, Munir M, Dengel A, Ahmed S. TSViz: Demystification of Deep Learning Models for Time-Series Analysis. IEEE Access. 2019;7:67027–67040.
Influence functions: Koh PW, Liang P. Understanding Black-box Predictions via Influence Functions. In: Proceedings of the International Conference on Machine Learning (ICML); 2017.
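The idea behind gradient-based attribution can be sketched without a deep learning framework by approximating the gradient numerically. Note the swap: central finite differences stand in for backpropagation here, and `toy_loss` is a hypothetical scalar function, not a trained forecasting model.

```python
def saliency(loss_fn, x, eps=1e-4):
    """Absolute gradient of loss_fn with respect to each input feature.

    The gradient is approximated with central finite differences,
    a stand-in for the backpropagated gradients used in practice.
    """
    scores = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad = (loss_fn(up) - loss_fn(down)) / (2.0 * eps)
        scores.append(abs(grad))
    return scores


def toy_loss(x):
    """Hypothetical loss: depends weakly on x[0] and strongly on x[2]."""
    return 0.1 * x[0] + 0.5 * x[1] * x[1] + 2.0 * x[2]


scores = saliency(toy_loss, [1.0, 1.0, 1.0])
most_influential = scores.index(max(scores))
```

Features with larger scores are those whose small perturbations change the loss the most at this particular input.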

・Inherent Interpretability with Attention Weights
Lim B, Arik SO, Loeff N, Pfister T. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. arXiv e-prints. 2019;p. arXiv:1912.09363.

b. Counterfactual prediction & causal inference over time
On time series combined with confounding effects.
Yoon J, Jordon J, van der Schaar M. GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets. In: International Conference on Learning Representations (ICLR); 2018.
Hartford J, Lewis G, Leyton-Brown K, Taddy M. Deep IV: A Flexible Approach for Counterfactual Prediction. In: Proceedings of the 34th International Conference on Machine Learning (ICML); 2017.
Alaa AM, Weisz M, van der Schaar M. Deep Counterfactual Networks with Propensity Dropout. In: Proceedings of the 34th International Conference on Machine Learning (ICML); 2017.

Train DL models that adjust for time-dependent confounders through the design of the loss function?
Lim B, Alaa A, van der Schaar M. Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks. In: NeurIPS; 2018.
Li R, Shahn Z, Li J, Lu M, Chakraborty P, Sow D, et al. G-Net: A Deep Learning Approach to G-computation for Counterfactual Outcome Prediction Under Dynamic Treatment Regimes. arXiv e-prints. 2020;p. arXiv:2003.10551.
Adopts domain adversarial training to learn balanced representations of patient history: Bica I, Alaa AM, Jordon J, van der Schaar M. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. In: International Conference on Learning Representations (ICLR); 2020.

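Conclusion
Current limitations of time series forecasting
・Time series must be discretised at regular intervals, so datasets in which observations are missing or arrive at random intervals are hard to forecast
・Many time series have a hierarchical structure, e.g. product sales in the same region may be affected by common trends → architectures that explicitly exploit the hierarchy have not yet been developed much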
