The Effectiveness of Discretization in Forecasting: An Empirical Study on Neural Time Series Models

0. Paper

Journal/Conference: arXiv 2020
Title: The Effectiveness of Discretization in Forecasting: An Empirical Study on Neural Time Series Models
Authors: Stephan Rabanser, Tim Januschowski, Valentin Flunkert, David Salinas, Jan Gasthaus
URL: https://arxiv.org/abs/2005.10111

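1. What is it?

Many time series models have been proposed. This paper examines how much forecast accuracy changes depending on the transformations applied to a model's inputs and outputs, with a particular focus on the effect of binning.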

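2. What is novel compared with prior work?

It examines in detail the effect of binning, which has recently been reported to be useful as pre- and post-processing for time series models.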

3. What is the key to the technique or method?

The thoroughness of the experiments and the comprehensive survey of related work.

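4. How was it validated?

With the output representation held fixed, what effect does varying the input preprocessing have (number of bins, number of dimensions, etc.)? Likewise, what happens when the input is held fixed and the output representation is varied?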

5. Is there any discussion?

6. Which papers should be read next?

See the notes below.

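Notes

The way the related work is summarized is nice!

Abstract
Examines the effect of transforming the inputs and outputs of a model. NN models have been successful when the input is categorical (e.g., natural language). The study compares different types of data scaling and data binning.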

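1 Introduction

NN models have shown their strength in, e.g., the M4 competition. They extract patterns shared across multiple time series and form a global model (is this different from a multivariate model?). The paper focuses on univariate time series forecasting and examines transformation techniques for the input and output representations.

＊Recent multivariate time series forecasting: David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, and Jan Gasthaus. 2019. High-dimensional multivariate forecasting with low-rank Gaussian Copula Processes. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 6824–6834.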

Attention-based time series forecasting:
Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. 2019. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 5244–5254.
Bryan Lim, Sercan Arik, Nicolas Loeff, and Tomas Pfister. 2020. Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. In arXiv.

★Models that estimate a probability distribution

Examples of NNs with probabilistic outputs:
・Parametric distributions / mixtures
Srayanta Mukherjee, Devashish Shankar, Atin Ghosh, Nilam Tathawadekar, Pramod Kompalli, Sunita Sarawagi, and Krishnendu Chaudhury. 2018. AR-MDN: Associative and Recurrent Mixture Density Networks for eRetail Demand Forecasting. CoRR abs/1803.03800 (2018). arXiv:1803.03800 http://arxiv.org/abs/1803.03800
David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. 2019. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. International Journal of Forecasting (2019).
・Quantile grid
Ruofeng Wen, Kari Torkkola, Balakrishnan Narayanaswamy, and Dhruv Madeka. 2017. A Multi-Horizon Quantile Recurrent Forecaster. arXiv e-prints (Nov 2017). arXiv:stat.ML/1711.11053
・Parametric quantile function models
Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin Flunkert, and Tim Januschowski. 2019. Probabilistic Forecasting with Spline Quantile Function RNNs. In The 22nd International Conference on Artificial Intelligence and Statistics.
・Copula-based
David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, and Jan Gasthaus. 2019. High-dimensional multivariate forecasting with low-rank Gaussian Copula Processes. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 6824–6834.
Ruofeng Wen and Kari Torkkola. 2019. Deep Generative Quantile-Copula Models for Probabilistic Forecasting. arXiv e-prints (Jul 2019). arXiv:stat.ML/1907.10697
・Discretization: WaveNet

On recent trends:
Christos Faloutsos, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, and Yuyang Wang. 2019. Forecasting Big Time Series: Theory and Practice. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019.

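2 Preliminaries

Z: N univariate time series
X: covariates
The parameters are determined by minimizing the negative log-likelihood. A single set of model parameters is learned for the whole dataset, whereas the parameters of the input and output transformations (\theta, \phi, etc.) vary from one time series to another.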

3 Methods

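3.1 Transformations

3.1.1 Scaling

Scaling affects the convergence of NNs and similar models. The scalings considered are affine transformations such as scaling by the mean, min-max scaling, and standardization.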
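As a minimal sketch of the three affine scalings named above, assuming each series is a plain list of floats. The function names are illustrative and are not taken from the paper or from GluonTS, and mean scaling is assumed here to divide by the mean absolute value:

```python
from statistics import fmean, pstdev


def mean_scale(series):
    """Divide by the mean absolute value (assumed convention; 1.0 for an all-zero series)."""
    scale = fmean(abs(v) for v in series) or 1.0
    return [v / scale for v in series]


def min_max_scale(series):
    """Map the series affinely onto [0, 1]."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant series
    return [(v - lo) / span for v in series]


def standardize(series):
    """Subtract the mean and divide by the population standard deviation."""
    mu = fmean(series)
    sigma = pstdev(series) or 1.0  # avoid division by zero for a constant series
    return [(v - mu) / sigma for v in series]
```

Because each function derives its constants from the series it is given, the scaling parameters differ per time series, which matches the note in the Preliminaries that transformation parameters vary by series.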

3.1.2 Continuous Transformations

Gaussianizing: a transformation proposed in, e.g., David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, and Jan Gasthaus. 2019. High-dimensional multivariate forecasting with low-rank Gaussian Copula Processes. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 6824–6834.
Probability integral transform (PIT): a function that maps the data to an approximately uniform distribution / quantile binning transformation
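The two continuous transformations can be sketched as follows, assuming the CDF is estimated empirically from training values. The `(rank + 0.5) / (n + 1)` smoothing is an assumption chosen only to keep outputs strictly inside (0, 1); it is not taken from the paper:

```python
from bisect import bisect_right
from statistics import NormalDist


def empirical_pit(train, values):
    """Map values into (0, 1) through the empirical CDF of `train`."""
    ordered = sorted(train)
    n = len(ordered)
    # bisect_right counts training points <= v; the offsets keep results off 0 and 1
    return [(bisect_right(ordered, v) + 0.5) / (n + 1) for v in values]


def gaussianize(train, values):
    """Apply the PIT, then the inverse CDF of the standard normal."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(u) for u in empirical_pit(train, values)]
```

Both maps are monotone, so the ordering of the observations is preserved while the marginal distribution becomes approximately uniform or approximately Gaussian.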

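3.1.3 Discretizing Transformations

Binning: convert real-valued input to its bucket index. Does this mean that every value falling within a given interval is mapped to a single index?
Reconstruction function: estimates the real value from the bin index with the Lloyd-Max algorithm.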

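→ Two strategies for choosing suitable bin edges
・equally spaced binning: split the range evenly into B-2 intervals
・quantile binning: use the cumulative distribution so that an equal number of points falls into each bin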
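The two edge-selection strategies, together with the index mapping and a simple reconstruction, can be sketched as below. This is a reading of the notes rather than the paper's implementation: the B-2 equal intervals are assumed to be flanked by two open-ended outer bins, and the reconstruction uses only the centroid (bin mean) step of Lloyd-Max instead of the full iterative algorithm. All function names are hypothetical:

```python
from bisect import bisect_right
from statistics import fmean, quantiles


def equal_edges(lo, hi, num_bins):
    """num_bins - 1 evenly spaced edges on [lo, hi] (needs num_bins >= 3).

    This yields num_bins - 2 inner intervals plus two outer bins.
    """
    step = (hi - lo) / (num_bins - 2)
    return [lo + k * step for k in range(num_bins - 1)]


def quantile_edges(train, num_bins):
    """num_bins - 1 edges at empirical quantiles, so bins hold roughly equal counts."""
    return quantiles(train, n=num_bins)


def to_bin(edges, value):
    """Index of the bin containing `value`, in 0 .. len(edges)."""
    return bisect_right(edges, value)


def bin_centroids(edges, train):
    """Mean of the training values in each bin (None for an empty bin)."""
    groups = [[] for _ in range(len(edges) + 1)]
    for v in train:
        groups[to_bin(edges, v)].append(v)
    return [fmean(g) if g else None for g in groups]
```

A forecasted bin index `b` would then be mapped back to a real value with `bin_centroids(edges, train)[b]`.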

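Binning strategies considered (two basic strategies plus a hybrid)
・Local Absolute Binning (lab): bin each time series individually
・Global Relative Binning (grb): scale each time series first, then apply a single global binning to all of them
・Hybrid Binning (hyb): concatenate several binnings and use the result as input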
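The difference between the three strategies can be illustrated with a small sketch. It assumes quantile edges and mean-absolute-value scaling, and it represents the hybrid simply as a pair of bin indices per time step, whereas a model would more likely concatenate the corresponding embeddings. Pairing lab with grb is only one illustrative combination, and the function names are hypothetical:

```python
from bisect import bisect_right
from statistics import fmean, quantiles


def local_absolute_bins(series, num_bins):
    """lab: edges come from this series alone, computed on its raw values."""
    edges = quantiles(series, n=num_bins)
    return [bisect_right(edges, v) for v in series]


def global_relative_bins(all_series, num_bins):
    """grb: mean-scale every series, then share one set of edges across all of them."""
    scaled = []
    for s in all_series:
        scale = fmean(abs(v) for v in s) or 1.0
        scaled.append([v / scale for v in s])
    pooled = [v for s in scaled for v in s]
    edges = quantiles(pooled, n=num_bins)
    return [[bisect_right(edges, v) for v in s] for s in scaled]


def hybrid_bins(all_series, num_bins):
    """hyb: pair the lab and grb indices of every time step."""
    grb = global_relative_bins(all_series, num_bins)
    return [
        list(zip(local_absolute_bins(s, num_bins), g))
        for s, g in zip(all_series, grb)
    ]
```

With grb, two series that differ only in scale receive identical bin indices, which is what lets a single global model share patterns across them.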

3.2 Output Representations

Student-t (st): the output is parameterized as a Student-t distribution, the approach adopted in DeepAR
plqs: spline quantile function output (Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin Flunkert, and Tim Januschowski. 2019. Probabilistic Forecasting with Spline Quantile Function RNNs. In The 22nd International Conference on Artificial Intelligence and Statistics.)
categorical distribution: an output distribution over the discrete bins
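For the categorical output, the training loss is the negative log-likelihood of the observed bin. A minimal sketch, assuming the network emits one unnormalized score (logit) per bin and that a softmax turns these into bin probabilities; the names are illustrative:

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of per-bin logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def categorical_nll(logits, target_bin):
    """Negative log-likelihood of the observed bin under the softmax distribution."""
    return -log_softmax(logits)[target_bin]
```

With uniform logits over B bins the loss equals log(B), which gives a convenient baseline when reading training curves.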

3.3 Model

WaveNet, DeepAR, Feed-forward

4 Experiments

Probabilistic time series library: GluonTS
Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, et al. 2019. GluonTS: Probabilistic Time Series Models in Python. arXiv preprint arXiv:1906.05264 (2019).

Datasets
・M4 forecast
・electricity and traffic
・wiki10k

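The experiments investigate how performance changes when the input representation is held fixed and the output representation is varied, in order to verify the effect of binning. The number of bins is set to 1024 and the embedding dimension to its fourth root (TensorFlow Team. 2017. Introducing TensorFlow Feature Columns. https://developers.googleblog.com/2017/11/introducing-tensorflow-feature-columns.html). Quantile binning turns out to be the more reliable choice.

Evaluation of the predictive distribution
・weighted quantile loss
・normalized deviation (ND)

Results with the input fixed and the output varied: Table 1
Results with the output fixed and the input varied: Table 2
Results with input and output scaled without binning: Table 3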
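The two evaluation metrics can be sketched as follows. The notes do not give the exact formulas, so the definitions below are an assumption that follows the commonly used convention: pinball loss scaled by 2 and normalized by the sum of absolute actual values.

```python
def normalized_deviation(actual, point_forecast):
    """ND: total absolute error divided by the total absolute actual value."""
    err = sum(abs(y - f) for y, f in zip(actual, point_forecast))
    return err / sum(abs(y) for y in actual)


def weighted_quantile_loss(actual, quantile_forecast, q):
    """Pinball loss at level q, scaled by 2 and normalized by sum of |actual|."""
    loss = sum(
        abs((f - y) * ((1.0 if y <= f else 0.0) - q))
        for y, f in zip(actual, quantile_forecast)
    )
    return 2.0 * loss / sum(abs(y) for y in actual)
```

At q = 0.5 the weighted quantile loss of a median forecast coincides with ND of the same forecast, which is a quick sanity check when comparing tables.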

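5 Discussion

・Output Scaling (Table 1)
Global relative quantile binning combined with WaveNet gives good results. For DeepAR, however, unbinned outputs (mean scaling, etc.) give better results. The choice of output binning is about as important as the choice of model.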

・Input Scaling
The effect of the input transformation is less pronounced than that of the output transformation, but local absolute binning leads to lower accuracy.

・Binning resolution effects
With the output fixed at 1024 bins, varying the number of input bins does not change accuracy (Fig. 3a). Increasing the number of output bins improves performance (Fig. 3b).

・Embedding size effects
See Fig. 3c.

・Global versus local binning
Global binning has the larger effect.

・Hybrid versus Single binning
Hybrids that combine global binnings tend to improve accuracy.

・Models

6 Related works

Approaches that try to capture both global and local patterns:
Yuyang Wang, Alex Smola, Danielle Maddix, Jan Gasthaus, Dean Foster, and Tim Januschowski. 2019. Deep factors for forecasting. In International Conference on Machine Learning. 6607–6617.
Rajat Sen, Hsiang-Fu Yu, and Inderjit S Dhillon. 2019. Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting. In Advances in Neural Information Processing Systems. 4838–4847.
