tidyverts / fable

Tidy time series forecasting
https://fable.tidyverts.org
GNU General Public License v3.0

Inconsistent naming of ARIMA models #139

Closed: robjhyndman closed this issue 5 years ago

robjhyndman commented 5 years ago

library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
#> Registered S3 method overwritten by 'rvest':
#>   method            from
#>   read_xml.response xml2
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following object is masked from 'package:dplyr':
#> 
#>     id
library(fable)
#> Loading required package: fablelite

fit <- as_tsibble(WWWusage) %>%
  model(
    mod1 = ARIMA(value),
    mod2 = ARIMA(value ~ pdq(d=1))
  )

fit %>% select(mod1) %>% report()
#> Series: value 
#> Model: LM w/ ARIMA(0,0,0) errors 
#> 
#> Coefficients:
#>       constant
#>       137.0800
#> s.e.    3.9799
#> 
#> sigma^2 estimated as 1600:  log likelihood=-510.28
#> AIC=1024.56   AICc=1024.81   BIC=1032.37
fit %>% select(mod2) %>% report()
#> Series: value 
#> Model: ARIMA(1,1,1) 
#> 
#> Coefficients:
#>          ar1     ma1
#>       0.6504  0.5256
#> s.e.  0.0842  0.0896
#> 
#> sigma^2 estimated as 9.995:  log likelihood=-254.15
#> AIC=514.3   AICc=514.94   BIC=527.28

Created on 2019-04-29 by the reprex package (v0.2.1)

robjhyndman commented 5 years ago

Also, the first model above is incorrect. It should be an ARIMA(1,1,1).

forecast::auto.arima(WWWusage)
#> Series: WWWusage 
#> ARIMA(1,1,1) 
#> 
#> Coefficients:
#>          ar1     ma1
#>       0.6504  0.5256
#> s.e.  0.0842  0.0896
#> 
#> sigma^2 estimated as 9.995:  log likelihood=-254.15
#> AIC=514.3   AICc=514.55   BIC=522.08

Created on 2019-04-29 by the reprex package (v0.2.1)

mitchelloharawild commented 5 years ago

Incorrect model selection is an issue with feasts, so that part has been moved to https://github.com/tidyverts/feasts/issues/40

How exactly are the models named inconsistently? The first model includes a constant, which is currently treated and displayed as an LM term (is this not how we include constants in the presence of differences?).

Is the inclusion of the constant substantially different (in implementation, not in the resulting forecasts) for different choices of d and D? That is to say, is including an xreg intercept column different from calling arima(..., include.mean = TRUE)?
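
For the undifferenced case only (d + D == 0), the question can be checked against base R's stats::arima(). The sketch below is an illustration rather than a description of fable's internals. It fits the ARIMA(0,0,0) model reported above twice, once with include.mean = TRUE and once with a hand-built column of ones passed as xreg. The names y, ones, fit_mean and fit_xreg are made up for this example.

```r
# Sketch: compare the two ways of adding a constant in stats::arima()
# when no differencing is applied (d = 0, D = 0).
y <- as.numeric(WWWusage)

# Constant requested through include.mean
fit_mean <- arima(y, order = c(0, 0, 0), include.mean = TRUE)

# Constant supplied as an explicit intercept regressor (hypothetical column name)
ones <- cbind(intercept = rep(1, length(y)))
fit_xreg <- arima(y, order = c(0, 0, 0), include.mean = FALSE, xreg = ones)

# Both fits estimate a single coefficient; compare them side by side
print(coef(fit_mean))
print(coef(fit_xreg))
print(all.equal(unname(coef(fit_mean)), unname(coef(fit_xreg))))
```

This says nothing about d + D > 0, where stats::arima() ignores include.mean and differences xreg along with the series.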

The model summary (LM w/ ARIMA(0,0,0) errors) can/should be improved... Thoughts on this?

robjhyndman commented 5 years ago

Your bulleted scheme looks ok to me. mod1 is a constant with d+D==0, so according to this scheme it should be "ARIMA(p,d,q)(P,D,Q)[m] w/ mean".

mitchelloharawild commented 5 years ago

library(tidyverse)
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following object is masked from 'package:dplyr':
#> 
#>     id
library(fable)
#> Loading required package: fablelite

fit <- as_tsibble(WWWusage) %>%
  model(
    mod1 = ARIMA(value),
    mod2 = ARIMA(value ~ pdq(d=1))
  )

fit %>% select(mod1) %>% report()
#> Series: value 
#> Model: ARIMA(0,0,0) w/ mean 
#> 
#> Coefficients:
#>       constant
#>       137.0800
#> s.e.    3.9799
#> 
#> sigma^2 estimated as 1600:  log likelihood=-510.28
#> AIC=1024.56   AICc=1024.81   BIC=1032.37
fit %>% select(mod2) %>% report()
#> Series: value 
#> Model: ARIMA(1,1,1) 
#> 
#> Coefficients:
#>          ar1     ma1
#>       0.6504  0.5256
#> s.e.  0.0842  0.0896
#> 
#> sigma^2 estimated as 9.995:  log likelihood=-254.15
#> AIC=514.3   AICc=514.94   BIC=527.28

Created on 2019-05-01 by the reprex package (v0.2.1)