thuml / Nonstationary_Transformers

Code release for "Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting" (NeurIPS 2022), https://arxiv.org/abs/2205.14415
MIT License

Series Stationarization vs. Normalization #9

Closed StatMixedML closed 1 year ago

StatMixedML commented 1 year ago

Dear authors,

Thanks for making the code for your interesting paper available.

When I was going through the paper, I found the usage of the term "Stationarization" a little confusing. In Section 3.1 you mention

To attenuate the non-stationarity of each input series, we conduct normalization on the temporal dimension (...)

Yet, I am not sure if normalization actually removes any of the non-stationarity. Let me illustrate this using the famous Air Passenger dataset, which is shown below.

image

As a very loose definition: "A time series is considered weakly stationary if it has no trend or seasonality, constant variance over time, and a consistent autocorrelation over time." Judging from the plot above, none of this holds for the data. This is confirmed by the Augmented Dickey-Fuller (ADF) test:

Augmented Dickey-Fuller Test

data:  Monthly Airline Passenger Numbers
Dickey-Fuller = -1.5094, Lag order = 12, p-value = 0.7807
alternative hypothesis: stationary

as well as an Auto-ARIMA

ARIMA(2,1,1)(0,1,0)[12]

If the series is normalized via (y-mu)/sigma, most of the factors that contribute to the non-stationarity (e.g., trend and seasonality) are still present in the data, as shown in the following plot.

image

The fact that normalization does not remove or attenuate non-stationarity is also reflected by the ADF test:

Augmented Dickey-Fuller Test

data:  Monthly Airline Passenger Numbers Normalized
Dickey-Fuller = -1.5094, Lag order = 12, p-value = 0.7807
alternative hypothesis: stationary

as well as an Auto-ARIMA

ARIMA(2,1,1)(0,1,0)[12]

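In fact, the ADF values are exactly the same as for the non-transformed data. This is expected: subtracting a constant and dividing by a constant rescales the regression behind the test without changing its test statistic.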
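The invariance can be checked with a small sketch. It does not reproduce the R output above: it assumes a synthetic series (not the Air Passengers data) and a simplified Dickey-Fuller regression with an intercept but no lag terms and no trend term, so the numbers will differ from `adf.test`. The helper names `df_statistic` and `normalize` are made up for illustration.

```python
import math
import random

def df_statistic(y):
    """t-statistic of gamma in the regression diff(y)[t] = alpha + gamma * y[t-1] + error."""
    x = y[:-1]                                   # lagged levels y[t-1]
    d = [b - a for a, b in zip(y[:-1], y[1:])]   # first differences
    n = len(x)
    mx, md = sum(x) / n, sum(d) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxd = sum((xi - mx) * (di - md) for xi, di in zip(x, d))
    gamma = sxd / sxx
    alpha = md - gamma * mx
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, d))
    se = math.sqrt(rss / (n - 2) / sxx)          # standard error of gamma
    return gamma / se

def normalize(y):
    """Global normalization (y - mu) / sigma over the whole series."""
    mu = sum(y) / len(y)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in y) / len(y))
    return [(v - mu) / sigma for v in y]

# Synthetic series (an assumption): linear trend + period-12 seasonality + noise.
rng = random.Random(0)
series = [100 + 2.0 * t + 20 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 5)
          for t in range(144)]

stat_raw = df_statistic(series)
stat_norm = df_statistic(normalize(series))
print(stat_raw, stat_norm)  # the two values agree up to floating-point error
```

Global normalization divides both the differences and the lagged levels by the same sigma, and the shift by mu is absorbed by the intercept, so the estimate of gamma and its standard error are unchanged.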

So instead of saying "Stationarization", one should rather use "Normalization". In fact, if the above series is to be made stationary, one would need to use differencing + seasonal differencing, as suggested by the Auto-ARIMA.
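As a rough illustration of what first differencing plus seasonal differencing (d = 1, D = 1, period 12) does, here is a minimal sketch on a synthetic, noise-free series with an additive linear trend and exact period-12 seasonality. The series and the `difference` helper are assumptions for illustration; on real data the result would be noisy rather than exactly zero.

```python
import math

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Synthetic series (an assumption): linear trend + period-12 seasonality, no noise.
trend_seasonal = [100 + 2.0 * t + 20 * math.sin(2 * math.pi * t / 12)
                  for t in range(144)]

# Seasonal differencing (D = 1, period 12) removes the seasonal pattern and
# turns the linear trend into a constant level of 12 * slope = 24.
seasonally_differenced = difference(trend_seasonal, lag=12)

# First differencing (d = 1) then removes that constant level.
stationary = difference(seasonally_differenced, lag=1)

print(len(stationary), max(abs(v) for v in stationary))
```

Each differencing step shortens the series by its lag, so 144 observations become 131.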

Have you tried using (d,D) differencing in the evaluation of your models and comparing the results to the proposed instance normalization? Also, using the residuals from an STL decomposition might get you closer to a stationary series.

Looking forward to the discussion.

WenWeiTHU commented 1 year ago

Hello, thanks for your interest.

You are right that normalization over the whole series has no effect on the ADF test or on stationarity. However, the instance normalization proposed in our paper is applied to each model input, which is obtained by sliding a window over the whole time series, as shown in the following plot:

image

Non-stationarity is characterized by a continuous change of temporal statistics and distribution. Instance normalization gives the series in adjacent windows the same mean and standard deviation, which can strongly attenuate the non-stationarity (a decreased ADF statistic) of the whole series:

image (here "Stationaried" denotes the series processed by our proposed instance normalization)
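The per-window normalization described above can be sketched as follows. This is not the repository's implementation: the window length of 24, the stride of 12, the synthetic series, and the helper names `sliding_windows` and `instance_normalize` are all assumptions chosen for illustration.

```python
import math

def sliding_windows(series, length, stride=1):
    """Cut the series into windows of a fixed length, moving by `stride`."""
    return [series[s:s + length]
            for s in range(0, len(series) - length + 1, stride)]

def instance_normalize(window):
    """Normalize one window with its own mean and standard deviation.

    A constant window would give sigma = 0; this sketch does not handle that case.
    """
    mu = sum(window) / len(window)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))
    return [(v - mu) / sigma for v in window]

def mean(values):
    return sum(values) / len(values)

def std(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Synthetic series (an assumption): linear trend + period-12 seasonality.
series = [100 + 2.0 * t + 20 * math.sin(2 * math.pi * t / 12) for t in range(144)]

windows = sliding_windows(series, length=24, stride=12)
raw_means = [mean(w) for w in windows]            # drift upward with the trend
normalized = [instance_normalize(w) for w in windows]
norm_means = [mean(w) for w in normalized]        # all close to 0
norm_stds = [std(w) for w in normalized]          # all close to 1

print(raw_means[0], raw_means[-1])
print(max(abs(m) for m in norm_means), max(abs(s - 1) for s in norm_stds))
```

Before normalization the window means follow the trend; afterwards every window has mean 0 and standard deviation 1, which is the sense in which adjacent windows share the same statistics.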

StatMixedML commented 1 year ago

Thanks for the detailed clarification. I wasn't aware that you are using a sliding-window normalization.