robjhyndman / forecast

Forecasting Functions for Time Series and Linear Models
http://pkg.robjhyndman.com/forecast

Different results of nsdiffs function with different versions of the forecast package #721

Closed FedericoBotta closed 6 years ago

FedericoBotta commented 6 years ago

Hello,

I recently updated the forecast package to version 8.4 on one of my computers, while my personal laptop still runs an older version (8.1). Since then, auto.arima has been giving me different answers for the same code and data set. At first I thought this was due to a different default for the seasonal.test argument. However, setting the test to 'ocsb' explicitly in both versions did not resolve the discrepancy, so I looked into the auto.arima code in detail, and I think the difference comes from the nsdiffs function.

Below is a minimal example, in which I have extracted the relevant code from auto.arima to reproduce the result. The first call to nsdiffs gives the same answer in both versions of the forecast package, whereas the second call, which is applied to the residuals after regressing on the external regressor data, gives different answers. In particular:

  1. in forecast version 8.1, the call nsdiffs(XTimeseriesData, test = 'ocsb') returns 0;

  2. in forecast version 8.4, the same call nsdiffs(XTimeseriesData, test = 'ocsb') returns 1.

Many thanks!

Federico

Code:

library(forecast)   # nsdiffs()
library(lubridate)  # year(), month()

# Monthly series, June 2013 to May 2018
Data <- data.frame(Values = c(237000, 314000, 371000, 194000, 318000, 255000, 233000, 260000, 324000, 262000, 330000, 285000, 238000, 328000, 401000, 189000, 296000, 233000, 211000, 241000, 321000, 276000, 294000, 276000, 227000, 341000, 366000, 212000, 328000, 235000, 238000, 260000, 328000, 314000, 283136, 238658, 230310, 322737, 344243, 182676, 287881, 213130, 240508, 252444, 344242, 279465, 303308, 242826, 201019, 342287, 380938, 182938, 276299, 218576, 227292, 248932, 286314, 267457, 329783, 217743))
StartDate <- as.Date("2013-06-01", "%Y-%m-%d")
EndDate <- as.Date("2018-05-01", "%Y-%m-%d")
TimeseriesData <- ts(Data, start = c(year(StartDate), month(StartDate)), end = c(year(EndDate), month(EndDate)), frequency = 12)

# First call: same answer in both versions
nsdiffs(TimeseriesData, test = 'ocsb')

XregData <- data.frame(Values = c(28, 33, 40, 33, 42, 42, 38, 41, 49, 33, 41, 36, 26, 36, 44, 26, 39, 37, 34, 35, 43, 34, 32, 31, 26, 34, 44, 32, 44, 36, 41, 35, 45, 37, 29, 26, 21, 32, 35, 22, 34, 25, 33, 31, 49, 33, 32, 23, 22, 33, 40, 24, 37, 26, 27, 26, 37, 30, 32, 21))

XTimeseriesData <- TimeseriesData
j <- !is.na(TimeseriesData) & !is.na(rowSums(as.matrix(XregData)))
XTimeseriesData[j] <- residuals(lm(TimeseriesData ~ as.matrix(XregData)))
nsdiffs(XTimeseriesData, test = 'ocsb')
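For readers following the residual step, here is a minimal base-R sketch of the same pattern on a toy series. The numbers and the names y, x, fit and res are made up for illustration and are not part of the data above; the sketch only shows how the logical mask j keeps missing positions in place while the observed positions are replaced by regression residuals.

```r
# Toy quarterly series with one missing value (made-up numbers)
y <- ts(c(10, 12, NA, 15, 11, 14, 13, 16), frequency = 4)
x <- c(1, 2, 3, 4, 2, 3, 3, 5)

# Positions where both the response and the regressor are observed
j <- !is.na(y) & !is.na(x)

# lm() drops incomplete rows by default, so it returns
# exactly one residual per TRUE entry in j
fit <- lm(y ~ x)

# Copy the series and overwrite only the observed positions
res <- y
res[j] <- residuals(fit)
```

The result res is still a ts object with the original frequency, and the missing observation stays NA, so it can be passed to a seasonal differencing test in the same way as the original series.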

robjhyndman commented 6 years ago

We rewrote the ocsb test including some bug fixes in v8.3.
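Given that the rewrite landed in v8.3, one way to tell which side of the change a machine is on is to compare its installed version against 8.3. The sketch below uses only base R; the helper name uses_rewritten_ocsb is hypothetical and is not part of the forecast package.

```r
# Hypothetical helper (not part of forecast): TRUE when the given
# version is 8.3 or later, i.e. it includes the rewritten OCSB test
uses_rewritten_ocsb <- function(version) {
  utils::compareVersion(as.character(version), "8.3") >= 0
}

old_result <- uses_rewritten_ocsb("8.1")  # version on the laptop
new_result <- uses_rewritten_ocsb("8.4")  # version on the updated machine

# On a machine with forecast installed, the same check would be:
# uses_rewritten_ocsb(packageVersion("forecast"))
```

If results must match across machines, the simplest option is probably to install the same forecast version on each of them.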