robjhyndman / forecast

Forecasting Functions for Time Series and Linear Models
http://pkg.robjhyndman.com/forecast

bug in auto.arima? #831

Closed: Apratimguha closed this issue 5 years ago.

Apratimguha commented 5 years ago

Using the strikes dataset from the fma package, kpss.test() gives a p-value of 0.0585, so it does not reject the null hypothesis of stationarity at the 5% level. Yet the best model selected by auto.arima() is ARIMA(0,1,0), which takes a first difference. What is the reason for this inconsistency? The output is below:

library(tseries)
library(forecast)
library(fma)
kpss.test(strikes)
#> 
#>  KPSS Test for Level Stationarity
#> 
#> data:  strikes
#> KPSS Level = 0.44329, Truncation lag parameter = 2, p-value = 0.0585
auto.arima(strikes)
#> Series: strikes 
#> ARIMA(0,1,0) 
#> 
#> sigma^2 estimated as 359750:  log likelihood=-226.65
#> AIC=455.3   AICc=455.45   BIC=456.67

mitchelloharawild commented 5 years ago

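auto.arima() chooses the number of differences with the forecast package's ndiffs() function, which runs the KPSS test through urca with the legacy tseries default for the truncation lag. For strikes that default gives 1 lag instead of 2, and the resulting statistic (0.6186) exceeds the 5% critical value (0.463), so stationarity is rejected and the series is differenced: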

library(urca)
# Legacy tseries short lag, trunc(3*sqrt(n)/13), as used by forecast's ndiffs()
kpss_result <- ur.kpss(fma::strikes, use.lag = trunc(3*sqrt(length(fma::strikes))/13))
summary(kpss_result)
#> 
#> ####################### 
#> # KPSS Unit Root Test # 
#> ####################### 
#> 
#> Test is of type: mu with 1 lags. 
#> 
#> Value of test-statistic is: 0.6186 
#> 
#> Critical value for a significance level of: 
#>                 10pct  5pct 2.5pct  1pct
#> critical values 0.347 0.463  0.574 0.739

Created on 2019-11-09 by the reprex package (v0.2.1)
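To see why the lag changes the statistic, here is a minimal base-R sketch of the textbook KPSS level-stationarity statistic: the scaled sum of squared partial sums of the demeaned series, divided by a Bartlett-weighted long-run variance estimate. Only the denominator depends on the lag. This is an illustration assumed to match what urca and tseries compute, not their source code; the name kpss_level_stat() and the simulated series are hypothetical.

```r
# Minimal sketch of the KPSS level-stationarity statistic (illustrative only,
# not the urca or tseries source). kpss_level_stat() is a hypothetical name.
kpss_level_stat <- function(y, l) {
  n <- length(y)
  e <- y - mean(y)          # residuals from a level-only (constant mean) model
  S <- cumsum(e)            # partial sums of the residuals
  eta <- sum(S^2) / n^2     # numerator of the statistic
  # Long-run variance: lag-0 variance plus Bartlett-weighted autocovariances
  s2 <- sum(e^2) / n
  for (s in seq_len(l)) {   # seq_len(0) is empty, so l = 0 skips the loop
    w <- 1 - s / (l + 1)    # Bartlett weight for lag s
    s2 <- s2 + 2 * w * sum(e[(s + 1):n] * e[1:(n - s)]) / n
  }
  eta / s2
}

# Usage on a simulated random walk (hypothetical data, not fma::strikes)
set.seed(1)
y <- cumsum(rnorm(30))
sapply(0:2, function(l) kpss_level_stat(y, l))  # statistic for 0, 1 and 2 lags
```

For a positively autocorrelated series the extra lag usually inflates the long-run variance estimate, which shrinks the statistic. That is consistent with 0.6186 at 1 lag versus 0.4433 at 2 lags for strikes.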

Apratimguha commented 5 years ago

Hi, the two packages seem to give very different p-values for the same test statistic. Is there any further reading on this?

mitchelloharawild commented 5 years ago

With their default settings, the two packages give the same test statistic:

tseries::kpss.test(fma::strikes)
#>  KPSS Test for Level Stationarity
#> 
#> data:  fma::strikes
#> KPSS Level = 0.44329, Truncation lag parameter = 2, p-value =
#> 0.0585
urca::ur.kpss(fma::strikes)
#> 
#> ####################################### 
#> # KPSS Unit Root / Cointegration Test # 
#> ####################################### 
#> 
#> The value of the test statistic is: 0.4433

Created on 2019-11-09 by the reprex package (v0.2.1)

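The difference is the truncation lag. forecast uses the legacy short-lag value from tseries::kpss.test(), which was trunc(3*sqrt(length(fma::strikes))/13) and gives 1 lag here, whereas the current defaults of tseries::kpss.test() and urca::ur.kpss() both use 2 lags for this series.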
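A quick check of that arithmetic, as a sketch. It assumes length(fma::strikes) is 30 (annual data for 1951 to 1980) and that trunc(4*(n/100)^0.25) is the current short-lag rule in tseries and urca. Under those assumptions the legacy rule gives 1 lag, matching "mu with 1 lags" in the urca output, and the current rule gives 2, matching "Truncation lag parameter = 2".

```r
# Truncation-lag rules for the KPSS long-run variance (sketch).
# legacy_short_lag(): the legacy tseries rule quoted above.
# current_short_lag(): assumed current "short" default in tseries and urca.
legacy_short_lag  <- function(n) trunc(3 * sqrt(n) / 13)
current_short_lag <- function(n) trunc(4 * (n / 100)^0.25)

n <- 30  # assumed length(fma::strikes)
c(legacy = legacy_short_lag(n), current = current_short_lag(n))  # legacy = 1, current = 2
```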

Apratimguha commented 5 years ago

My concern was more with the p-values, but now I see the source of the difference: the two packages use different default lags.

Is there a way to change the options of the unit-root test that auto.arima() calls?

mitchelloharawild commented 5 years ago

You can fix the number of differences in auto.arima() by setting its d (non-seasonal) and D (seasonal) arguments. This lets you run the unit-root test yourself, with whatever options you like, and pass the resulting number of differences to auto.arima().
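One way to do that, sketched in base R: difference the series until the level KPSS statistic drops below the 5% critical value of 0.463 printed by urca above, with the truncation-lag rule passed in as an argument. This is a simplified, hypothetical stand-in for the KPSS step of forecast::ndiffs(), not its actual code; the names choose_d_kpss() and kpss_level_stat() are made up for illustration, and ndiffs() has details this ignores.

```r
# Simplified KPSS-based choice of d (sketch, not forecast::ndiffs()).
# Both function names are hypothetical.
kpss_level_stat <- function(y, l) {
  n <- length(y)
  e <- y - mean(y)                           # demeaned series
  S <- cumsum(e)                             # partial sums
  s2 <- sum(e^2) / n                         # lag-0 variance
  for (s in seq_len(l)) {                    # Bartlett-weighted autocovariances
    s2 <- s2 + 2 * (1 - s / (l + 1)) * sum(e[(s + 1):n] * e[1:(n - s)]) / n
  }
  (sum(S^2) / n^2) / s2
}

# Difference until the statistic is at or below crit, or max_d is reached.
# lag_fun picks the truncation lag from the current series length.
choose_d_kpss <- function(y, lag_fun = function(n) trunc(4 * (n / 100)^0.25),
                          crit = 0.463, max_d = 2) {
  d <- 0
  while (d < max_d) {
    if (length(unique(y)) == 1) break        # constant series: nothing to test
    if (kpss_level_stat(y, lag_fun(length(y))) <= crit) break
    y <- diff(y)
    d <- d + 1
  }
  d
}
```

For example, d <- choose_d_kpss(fma::strikes) followed by forecast::auto.arima(fma::strikes, d = d). If the sketch matches urca's computation, the default lag rule should give d = 0 for strikes (0.4433 is below 0.463), while lag_fun = function(n) trunc(3 * sqrt(n) / 13) starts from the 1-lag statistic of 0.6186 and so differences at least once.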

The fable package lets you modify the test used to select the number of differences more directly, via the unitroot_spec = unitroot_options() argument.