robjhyndman / forecast

Forecasting Functions for Time Series and Linear Models
http://pkg.robjhyndman.com/forecast

auto.arima can't find a suitable model #102

Closed RobinGroenevelt2 closed 9 years ago

RobinGroenevelt2 commented 9 years ago

I have 2 years of monthly data and auto.arima can't fit a model to it. The call runs inside a loop where I'm building thousands of models, and for some strange reason this particular dataset causes a problem.

It's the D=1 argument that seems to cause the failure, but I really need to keep it.

Here's the data

a
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
2013 5723.375 5663.500 5944.000 6951.385 6950.875 6744.611 7684.353 7759.700 6786.650 6472.947 5382.917 5546.909
2014 5937.556 5261.500 5757.700 5629.562 7849.375 6645.263 7488.045 7500.409 6476.167 6253.571 6414.000 5631.545

b
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
2013 2070.094 2076.213 2082.489 2088.902 2095.434 2102.066 2108.782 2115.575 2122.439 2129.369 2136.361 2143.409
2014 2150.508 2157.654 2164.842 2172.066 2179.321 2186.602 2193.906 2201.227 2208.564 2215.912 2223.268 2230.630

fit1 <- auto.arima(a, xreg=b, d=1, D=1)
Error in auto.arima(a, xreg = b, d = 1, D = 1) :
  No suitable ARIMA model found

robjhyndman commented 9 years ago

Reproducible code showing the problem:

library(forecast)
a <- ts(c(5723.375,5663.500,5944.000,6951.385,6950.875,6744.611,7684.353,7759.700,
  6786.650, 6472.947,5382.917,5546.909,5937.556, 5261.500, 5757.700, 5629.562, 7849.375, 
  6645.263, 7488.045, 7500.409, 6476.167, 6253.571, 6414.000, 5631.545),start=2013,f=12)
b <- ts(c(2070.094,2076.213,2082.489,2088.902,2095.434,2102.066,2108.782,2115.575,
  2122.439, 2129.369,2136.361,2143.409,2150.508,2157.654,2164.842,2172.066,2179.321,
  2186.602, 2193.906,2201.227,2208.564,2215.912,2223.268,2230.630),start=2013,f=12)
fit1 <- auto.arima(a, xreg=b, d=1, D=1)
# But manual differencing works:
x=diff(diff(b,12))
y=diff(diff(a,12))
auto.arima(y,xreg=x,d=0,D=0)
RobinGroenevelt2 commented 9 years ago

Hi Rob,

Thanks a lot for the very quick reply! Interesting. So I guess it's a minor bug, but with an easy workaround. Works for me!


RobinGroenevelt2 commented 9 years ago

If I forecast from this model, the forecasts will of course be on the scale of the twice-differenced data, so they need to be corrected. Any quick tip on how to back-transform the forecasts to the original scale?
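One way to undo the manual differencing is to run the recursion implied by (1 - B)(1 - B^12). If y_t = a_t - a_{t-1} - a_{t-12} + a_{t-13}, then a_t = y_t + a_{t-1} + a_{t-12} - a_{t-13}. Below is a minimal base-R sketch. It assumes you already have point forecasts of the twice-differenced series (for example the `mean` component of a `forecast()` result) and at least the last 13 observations of the original series. The function name `undiff_seasonal` is hypothetical and not part of the forecast package. The sketch only handles point forecasts, not prediction intervals.

```r
# Hypothetical helper (not part of the forecast package): invert
# y_t = (1 - B)(1 - B^m) a_t for point forecasts using the recursion
# a_t = y_t + a_{t-1} + a_{t-m} - a_{t-m-1}.
undiff_seasonal <- function(history, fc_diff, m = 12) {
  history <- as.numeric(history)
  fc_diff <- as.numeric(fc_diff)
  n <- length(history)
  if (n < m + 1) stop("need at least m + 1 historical observations")
  out <- c(history, numeric(length(fc_diff)))
  for (i in seq_along(fc_diff)) {
    t <- n + i
    out[t] <- fc_diff[i] + out[t - 1] + out[t - m] - out[t - m - 1]
  }
  out[n + seq_along(fc_diff)]
}

# Sanity check: rebuild the last 11 months of `a` from its first
# 13 values and the twice-differenced series.
a <- ts(c(5723.375, 5663.500, 5944.000, 6951.385, 6950.875, 6744.611,
          7684.353, 7759.700, 6786.650, 6472.947, 5382.917, 5546.909,
          5937.556, 5261.500, 5757.700, 5629.562, 7849.375, 6645.263,
          7488.045, 7500.409, 6476.167, 6253.571, 6414.000, 5631.545),
        start = 2013, frequency = 12)
y <- diff(diff(a, 12))
rebuilt <- undiff_seasonal(a[1:13], y)
all.equal(rebuilt, as.numeric(a[14:24]))  # TRUE
```

For real forecasts you would pass the last 13 observations of the original series as `history` and the forecasts of the differenced series as `fc_diff`.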


RobinGroenevelt2 commented 9 years ago

Hello Rob,

I’m trying to implement things the manual way, as you suggest, but unfortunately they aren’t as simple as they seem.

I’ve written a function that back-transforms the forecasts to the original values, so that part is OK.

The parameters and models found by auto.arima and by the manual differencing method are not always the same. For the model I always need to add the d=1 and D=1 arguments to align them. However, I also have situations where auto.arima gives me (0,0,0)(0,0,0) as the best model, which suggests that auto.arima performs some additional checks that I would have to replicate manually.

All the other parameters of the fitted models also differ, so the model data I’ve gathered and calculated over the last few months would have to be thrown away (I’ve been using auto.arima to calculate the traffic between millions of cities).

Another issue is that all the future xreg values I feed into the forecasts will have to be preceded by the last 13 historical xreg values, so that the differenced future values start correctly. When calculating the back-transformation I’ll also have to take this appended data into account.
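The prepending step described above can be written as a small base-R function. This is a sketch under two assumptions: the regressor is a plain monthly numeric vector, and it uses the same (1 - B)(1 - B^12) differencing that was applied to `x` earlier in the thread. `diff_future_xreg` is a hypothetical name.

```r
# Hypothetical helper (not part of the forecast package): apply
# (1 - B)(1 - B^m) differencing to future regressor values. The last m + 1
# historical values are placed in front of the future values so that the
# first differenced future value is defined.
diff_future_xreg <- function(xreg_hist, xreg_future, m = 12) {
  xreg_hist <- as.numeric(xreg_hist)
  if (length(xreg_hist) < m + 1) stop("need at least m + 1 historical values")
  combined <- c(tail(xreg_hist, m + 1), as.numeric(xreg_future))
  # Lag-m difference leaves length(xreg_future) + 1 values; the lag-1
  # difference then leaves exactly length(xreg_future) values.
  diff(diff(combined, lag = m))
}

# Check: treat the last 11 values of b as "future" and compare with
# differencing the whole series at once.
b <- c(2070.094, 2076.213, 2082.489, 2088.902, 2095.434, 2102.066,
       2108.782, 2115.575, 2122.439, 2129.369, 2136.361, 2143.409,
       2150.508, 2157.654, 2164.842, 2172.066, 2179.321, 2186.602,
       2193.906, 2201.227, 2208.564, 2215.912, 2223.268, 2230.630)
all.equal(diff_future_xreg(b[1:13], b[14:24]), diff(diff(b, 12)))  # TRUE
```

Only the last 13 historical values are needed because the combined (1 - B)(1 - B^12) operator looks back at most 13 months.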

I’m using auto.arima a LOT in different places, so I’m going to spend days correcting code for something that worked very well before.

Any idea whether an easy fix to auto.arima is possible so that it forecasts correctly? There’s nothing strange about the data I provided, so it appears to be quite a major bug that needs fixing anyhow. Any idea of the time frame for solving it? Depending on your answer, I’ll know whether I can wait for it or not.

Many thanks in advance!!!

Robin


RobinGroenevelt2 commented 9 years ago

The manual differencing method gives me a different ARIMA model (see the code below). Is this normal?

a <- ts(c( 20361, 23402, 27260, 26759, 28577, 31771, 39563, 36614, 33776, 42961, 36810, 35735, 37510, 36382, 40935, 38454, 40119, 41388, 43001, 43384, 40705, 41663, 30697, 29725, 31691, 32932, 35654, 31948, 36172, 36295, 36178, 32651, 33121, 35368, 29610, 31267, 29254, 33309, 34285, 32323, 35276, 34715, 36023, 34624, 32717, 32520, 29119, 26359, 25457, 23702, 29458, 31271, 35035, 33379, 34068, 30936, 31389, 32368, 27075, 26113, 27523, 23618, 32458, 31980, 35593, 34411, 33816, 31414, 31242, 27968, 26005, 21807, 22641, 23580, 25391, 27405, 29603, 29976, 31908, 28117, 27868, 29425, 24699, 35360, 22343, 30345, 25735, 22898, 26504, 26570, 29190, 28676, 27551, 28929, 25562, 14742, 32688, 30756, 23886, 22874, 28218, 26179, 28238, 29475, 25600, 25514, 17745, 30821, 25016, 27339, 19990, 22679, 26465, 25691, 22347, 21906, 25337, 25358, 28778, 29549, 24465, 22329, 28082, 33589, 32239, 30911, 30807, 28374, 28075, 29728, 25774, 24952, 26590, 30352, 34419, 36408, 38525, 37891, 38739, 38103, 36792, 35641, 26435, 27290, 25631, 25065, 27558, 31452, 34153, 33784, 35481, 35028, 32733, 32044, 28070, 28356),start=2002,f=12)

b <- ts(c( 36878, 34004,37810, 40436, 41840, 57814, 60166, 60045, 58070, 59386, 53950, 55906, 55842, 50448,55640, 48224, 51963, 48594, 49828, 50530, 52472, 52926, 44258, 44814, 45802, 41804,45780, 42226, 44123, 42206, 43611, 43715, 49192, 51412, 46290, 49868, 48950, 45368,50366, 45791, 48038, 47028, 47782, 48823, 45816, 46999, 39796, 39688, 40274, 36480,42030, 46670, 48800, 46864, 48190, 48186, 46602, 48310, 40984, 39653, 41060, 37430,42844, 45686, 47224, 60522, 61918, 61962, 60901, 46554, 39018, 39622, 39498, 35868,40370, 45478, 47526, 45512, 46141, 46026, 45240, 45946, 39029, 39270, 41323, 42588,46619, 40852, 42449, 40852, 45377, 46297, 41072, 43005, 41750, 41900, 39235, 36716,40908, 40018, 41354, 40018, 43020, 44582, 40714, 41978, 37764, 40014, 39182, 36128,40720, 40723, 42116, 40732, 43392, 44014, 40802, 44989, 38382, 39278, 39101, 36044,41532, 42606, 46146, 42678, 43050, 42735, 42513, 44121, 41153, 44025, 44563, 40634,44655, 46351, 47841, 46385, 47314, 47150, 46269, 46009, 34119, 35734, 34676, 33374,35978, 38874, 40220, 39603, 41919, 42428, 40011, 40749, 36144, 37051),start=2002,f=12)

fit1 <- auto.arima(a, xreg=b, d=1, D=1)

fit1$arma

[1] 2 2 2 1 12 1 1

fit1$coef

ar1 ar2 ma1 ma2 sar1 sar2 sma1 b

-0.3913285 0.2105308 -0.2317791 -0.5168453 -0.1220176 -0.0939800 -0.8929759 0.2347787

x=diff(diff(b,12))
y=diff(diff(a,12))
fit2 <- auto.arima(y,xreg=x,d=0,D=0)

fit2$arma

[1] 2 1 1 1 12 0 0

fit2$coef

ar1 ar2 ma1 sar1 sma1 x

0.308494827 0.007214615 -0.869739902 -0.212493246 -0.762069455 0.308460469
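The `arma` component of a fitted `Arima` object is documented in `?stats::arima` as a compact vector: the numbers of AR, MA, seasonal AR and seasonal MA coefficients, followed by the period and the numbers of non-seasonal and seasonal differences. A small helper that turns it into the usual notation makes the two fits above easier to compare. `format_arma` is a hypothetical name; the forecast package's `arimaorder()` serves a similar purpose.

```r
# Hypothetical helper: convert an Arima object's `arma` vector,
# laid out as c(p, q, P, Q, m, d, D), into "ARIMA(p,d,q)(P,D,Q)[m]".
format_arma <- function(arma) {
  stopifnot(length(arma) == 7)
  p <- arma[1]; q <- arma[2]; P <- arma[3]; Q <- arma[4]
  m <- arma[5]; d <- arma[6]; D <- arma[7]
  sprintf("ARIMA(%d,%d,%d)(%d,%d,%d)[%d]", p, d, q, P, D, Q, m)
}

format_arma(c(2, 2, 2, 1, 12, 1, 1))  # fit1$arma: "ARIMA(2,1,2)(2,1,1)[12]"
format_arma(c(2, 1, 1, 1, 12, 0, 0))  # fit2$arma: "ARIMA(2,0,1)(1,0,1)[12]"
```

Because fit2 was fitted to already-differenced data, its d = 0 and D = 0 correspond to d = 1 and D = 1 on the original scale. The AR/MA orders still differ: (p, q) is (2, 2) versus (2, 1), and (P, Q) is (2, 1) versus (1, 1).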

RobinGroenevelt2 commented 9 years ago

Here’s a situation where manual differencing doesn’t work either. There are 18 time points of data, which is very little for monthly forecasting, yet fitting a model should still be possible (even if it’s not precise). In this case neither the usual call auto.arima(… , d=1, D=1) nor the manually differenced version works. Is there any workaround? (Manually constructing models isn’t an option for me, considering the large number of models that need to be built automatically.)

x
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
2013                                                       70618455 56614109 57685501 59574147 50066233 49913675
2014 49833848 43343409 52947918 52284884 58682628 67889116 74458852 58812840 56598136 63365526 52315527 45732365

y
             var1  var2
Jul 2013 14536.11 82023
Aug 2013 14587.66 80944
Sep 2013 14639.74 72153
Oct 2013 14692.39 69983
Nov 2013 14745.68 60999
Dec 2013 14799.64 66582
Jan 2014 14854.34 56089
Feb 2014 14909.82 49493
Mar 2014 14966.12 61007
Apr 2014 15023.31 65795
May 2014 15081.43 74615
Jun 2014 15140.53 76528
Jul 2014 15200.65 82615
Aug 2014 15261.74 82078
Sep 2014 15323.74 67510
Oct 2014 15386.59 65197
Nov 2014 15450.22 59211
Dec 2014 15514.59 64446

a = diff(diff(y,12))
b = diff(diff(x,12))
fit <- auto.arima(b,xreg=a,d=0,D=0)


robjhyndman commented 9 years ago

It turns out that this is a problem with stats::arima(), not with auto.arima(). stats::arima() will not allow a model with seasonal differencing on so few observations, even though it is possible to fit one.
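As a rough pre-check before calling `auto.arima()` in a loop, you can count how many observations remain after d regular and D seasonal differences of period m, namely n - d - D*m. This is a heuristic sketch only. It does not reproduce the exact condition that `stats::arima()` checks, and `obs_after_differencing` is a hypothetical name.

```r
# Heuristic pre-check (hypothetical; not the exact rule used by stats::arima):
# number of observations left after d regular and D seasonal differences
# of period m.
obs_after_differencing <- function(n, d = 1, D = 1, m = 12) {
  n - d - D * m
}

obs_after_differencing(24)  # 11: the 2-year series at the top of this issue
obs_after_differencing(18)  # 5: the 18-point series above
```

Series with very few remaining observations can then be routed to a simpler model before any fitting is attempted.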

The differences between the models fitted to the manually differenced data and those obtained by letting auto.arima() do the differencing are due to differences in how the initialization is handled.

Summary: this is not a problem with the forecast package, although it would be possible for the forecast package to return a null model in such circumstances.

Solution: write your code to work around such problems with short series.
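One way to follow that advice in a batch pipeline is to catch the error and record a fallback, so that one short series does not stop the whole loop. This is a minimal base-R sketch using `tryCatch()`. `fit_or_fallback` is a hypothetical name. What to use as the fallback (a simpler model, a naive forecast, or just `NULL`) is left to the caller.

```r
# Hypothetical helper: call a model-fitting function and return `fallback`
# instead of stopping when the call raises an error, such as
# "No suitable ARIMA model found".
fit_or_fallback <- function(fit_fun, ..., fallback = NULL) {
  tryCatch(
    fit_fun(...),
    error = function(e) {
      message("Model fitting failed: ", conditionMessage(e))
      fallback
    }
  )
}

# Usage inside a loop (requires the forecast package):
# fit <- fit_or_fallback(forecast::auto.arima, a, xreg = b, d = 1, D = 1)
# if (is.null(fit)) {
#   # handle the short series separately
# }
```

`fallback` comes after `...`, so it must always be passed by name. All other arguments are forwarded unchanged to `fit_fun`.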

cristianvaldez commented 5 years ago

@robjhyndman how can we avoid "Error: Evaluation error: No suitable ARIMA model found"? I need it to run without causing problems, because it is making my forecast pipeline fail :(

robjhyndman commented 5 years ago

Please create a separate issue with a reproducible example for when it occurs.