tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Trouble using forecast::auto.arima() with fabletools::model() - "no ARIMA models to choose from" #392

Closed kenahoo closed 5 months ago

kenahoo commented 5 months ago

Hi,

I'm having trouble using forecast::auto.arima() to determine the ARIMA orders and then using fabletools::model() to fit a model with those orders. For certain data sets, auto.arima() comes up with orders that model() doesn't like, and the fit dies with the error "There are no ARIMA models to choose from after imposing the `order_constraint`, please consider allowing more models."

I've attached a working example and a non-working example.

training-noerror.rds.zip training-error.rds.zip

(I had to zip the .rds files because GitHub doesn't seem to accept .rds uploads; each zip file contains just the single .rds file indicated.)

Here's my test code as a reprex:

suppressPackageStartupMessages({
  library(dplyr)
  library(tsibble)
  library(fable)
})
# We also use 'forecast' and 'fabletools' below

train_model <- function(X, y) {
  #use auto.arima to get model specification rather than using fable. auto.arima is generally much faster
  arima_fit <- forecast::auto.arima(
    y=ts(y %>% as.ts(), frequency = 365.25),
    xreg=X %>% as.data.frame() %>% select(-date) %>% as.matrix()
  )
  cat("##### Auto ARIMA fit:", "\n")
  print(arima_fit)

  pdq_form <- sprintf('pdq(%s,%s,%s)', arima_fit$arma[1], arima_fit$arma[6], arima_fit$arma[2])
  arima_form <- formula(paste('target ~ x1 + ', pdq_form))

  cat("\narima_formula: ", format(arima_form), "\n")

  training <- inner_join(X, y, by="date") %>%
    as_tsibble(index=date)

  fabletools::model(training, ARIMA(arima_form))
}

# Works:
training <- readRDS('~/Downloads/training-noerror.rds')
fit <- train_model(X = training %>% select(x1), y = training %>% select(target))
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> ##### Auto ARIMA fit: 
#> Series: ts(y %>% as.ts(), frequency = 365.25) 
#> Regression with ARIMA(2,1,2) errors 
#> 
#> Coefficients:
#>          ar1      ar2      ma1     ma2      x1
#>       0.7686  -0.1705  -1.2373  0.3404  0.0031
#> s.e.  0.6741   0.2510   0.6745  0.5690  0.0004
#> 
#> sigma^2 = 0.2235:  log likelihood = -488.13
#> AIC=988.26   AICc=988.37   BIC=1015.83
#> 
#> arima_formula:  target ~ x1 + pdq(2, 1, 2)

# Fails:
training2 <- readRDS('~/Downloads/training-error.rds')
fit <- train_model(X = training2 %>% select(x1), y = training2 %>% select(target))
#> ##### Auto ARIMA fit: 
#> Series: ts(y %>% as.ts(), frequency = 365.25) 
#> Regression with ARIMA(5,1,3) errors 
#> 
#> Coefficients:
#>           ar1      ar2     ar3     ar4     ar5      ma1     ma2      ma3     x1
#>       -0.3714  -0.4704  0.3603  0.0888  0.0572  -0.0890  0.0609  -0.7744  2e-03
#> s.e.   0.0913   0.0891  0.0676  0.0487  0.0464   0.0835  0.0906   0.0733  5e-04
#> 
#> sigma^2 = 0.3704:  log likelihood = -670.28
#> AIC=1360.55   AICc=1360.86   BIC=1406.49
#> 
#> arima_formula:  target ~ x1 + pdq(5, 1, 3)
#> Warning: 1 error encountered for ARIMA(arima_form)
#> [1] There are no ARIMA models to choose from after imposing the `order_constraint`, please consider allowing more models.

Created on 2024-01-29 with reprex v2.1.0
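
The indices 1, 6 and 2 pulled out of `arima_fit$arma` in the code above follow the layout that `stats::arima()` documents for its `$arma` field: `c(p, q, P, Q, period, d, D)`. A minimal sketch of that layout, using base R and the built-in `USAccDeaths` series rather than the attached data (the `spec`, `nonseasonal` and `seasonal` names are only illustrative):

```r
# Fit a seasonal model with known orders so the layout of `$arma` is easy to read.
fit <- arima(USAccDeaths, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1)))

# `$arma` is stored as c(p, q, P, Q, period, d, D), not in (p, d, q) order.
spec <- setNames(fit$arma, c("p", "q", "P", "Q", "period", "d", "D"))
print(spec)

# Reorder into the (p, d, q) and (P, D, Q) triples that pdq() and PDQ() expect.
nonseasonal <- spec[c("p", "d", "q")]
seasonal    <- spec[c("P", "D", "Q")]
print(nonseasonal)
print(seasonal)
```

Naming the elements once makes it harder to mix up `q` and `d`, which sit at positions 2 and 6 respectively.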

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       macOS Sonoma 14.3
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       UTC
#>  date     2024-01-29
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version  date (UTC) lib source
#>  anytime          0.3.9    2020-08-27 [1] CRAN (R 4.3.0)
#>  cli              3.6.1    2023-03-23 [1] CRAN (R 4.3.0)
#>  colorspace       2.1-0    2023-01-23 [1] CRAN (R 4.3.0)
#>  curl             5.1.0    2023-10-02 [1] CRAN (R 4.3.1)
#>  digest           0.6.33   2023-07-07 [1] CRAN (R 4.3.0)
#>  distributional   0.3.2    2023-03-22 [1] CRAN (R 4.3.0)
#>  dplyr          * 1.1.3    2023-09-03 [1] CRAN (R 4.3.0)
#>  ellipsis         0.3.2    2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate         0.22     2023-09-29 [1] CRAN (R 4.3.1)
#>  fable          * 0.3.2    2022-09-01 [1] CRAN (R 4.3.2)
#>  fabletools     * 0.3.3    2023-04-04 [1] CRAN (R 4.3.0)
#>  fansi            1.0.4    2023-01-22 [1] CRAN (R 4.3.0)
#>  farver           2.1.1    2022-07-06 [1] CRAN (R 4.3.0)
#>  fastmap          1.1.1    2023-02-24 [1] CRAN (R 4.3.0)
#>  feasts           0.3.1    2023-03-22 [1] CRAN (R 4.3.0)
#>  forecast         8.21.1   2023-08-31 [1] CRAN (R 4.3.0)
#>  fracdiff         1.5-2    2022-10-31 [1] CRAN (R 4.3.0)
#>  fs               1.6.3    2023-07-20 [1] CRAN (R 4.3.0)
#>  generics         0.1.3    2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2          3.4.3    2023-08-14 [1] CRAN (R 4.3.0)
#>  glue             1.6.2    2022-02-24 [1] CRAN (R 4.3.0)
#>  gtable           0.3.4    2023-08-21 [1] CRAN (R 4.3.0)
#>  htmltools        0.5.6    2023-08-10 [1] CRAN (R 4.3.0)
#>  knitr            1.44     2023-09-11 [1] CRAN (R 4.3.0)
#>  lattice          0.21-9   2023-10-01 [2] CRAN (R 4.3.2)
#>  lifecycle        1.0.3    2022-10-07 [1] CRAN (R 4.3.0)
#>  lmtest           0.9-40   2022-03-21 [1] CRAN (R 4.3.0)
#>  lubridate        1.9.3    2023-09-27 [1] CRAN (R 4.3.1)
#>  magrittr         2.0.3    2022-03-30 [1] CRAN (R 4.3.0)
#>  munsell          0.5.0    2018-06-12 [1] CRAN (R 4.3.0)
#>  nlme             3.1-163  2023-08-09 [2] CRAN (R 4.3.2)
#>  nnet             7.3-19   2023-05-03 [2] CRAN (R 4.3.2)
#>  pillar           1.9.0    2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig        2.0.3    2019-09-22 [1] CRAN (R 4.3.0)
#>  progressr        0.14.0   2023-08-10 [1] CRAN (R 4.3.0)
#>  purrr            1.0.2    2023-08-10 [1] CRAN (R 4.3.0)
#>  quadprog         1.5-8    2019-11-20 [1] CRAN (R 4.3.0)
#>  quantmod         0.4.25   2023-08-22 [1] CRAN (R 4.3.0)
#>  R6               2.5.1    2021-08-19 [1] CRAN (R 4.3.0)
#>  Rcpp             1.0.11   2023-07-06 [1] CRAN (R 4.3.0)
#>  reprex           2.1.0    2024-01-11 [1] CRAN (R 4.3.1)
#>  rlang            1.1.1    2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown        2.25     2023-09-18 [1] CRAN (R 4.3.1)
#>  rstudioapi       0.15.0   2023-07-07 [1] CRAN (R 4.3.0)
#>  scales           1.2.1    2022-08-20 [1] CRAN (R 4.3.0)
#>  sessioninfo      1.2.2    2021-12-06 [1] CRAN (R 4.3.0)
#>  tibble           3.2.1    2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyr            1.3.0    2023-01-24 [1] CRAN (R 4.3.0)
#>  tidyselect       1.2.0    2022-10-10 [1] CRAN (R 4.3.0)
#>  timechange       0.2.0    2023-01-11 [1] CRAN (R 4.3.0)
#>  timeDate         4022.108 2023-01-07 [1] CRAN (R 4.3.0)
#>  tseries          0.10-54  2023-05-02 [1] CRAN (R 4.3.0)
#>  tsibble        * 1.1.3    2022-10-09 [1] CRAN (R 4.3.0)
#>  TTR              0.24.3   2021-12-12 [1] CRAN (R 4.3.0)
#>  urca             1.3-3    2022-08-29 [1] CRAN (R 4.3.0)
#>  utf8             1.2.3    2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs            0.6.3    2023-06-14 [1] CRAN (R 4.3.0)
#>  withr            2.5.1    2023-09-26 [1] CRAN (R 4.3.1)
#>  xfun             0.40     2023-08-09 [1] CRAN (R 4.3.0)
#>  xts              0.13.1   2023-04-16 [1] CRAN (R 4.3.0)
#>  yaml             2.3.7    2023-01-23 [1] CRAN (R 4.3.0)
#>  zoo              1.8-12   2023-04-13 [1] CRAN (R 4.3.0)
#>
#>  [1] /Users/kwilliams/R/library/4.3
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

Any help or insight would be greatly appreciated!

mitchelloharawild commented 5 months ago

The error occurs because the intercept is left unspecified, which triggers automatic model selection (and hence the order_constraint). When the model isn't fully specified (intercept included), ARIMA() will attempt to select the best model automatically, subject to the constraints (the chosen pdq, plus other options such as order_constraint).

To fully match the model from forecast::auto.arima(), specify the presence of the intercept/constant with y ~ 1, or remove the intercept/constant with y ~ 0.
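
As a rough illustration of the two forms, here is a minimal sketch on simulated data. It assumes the fable and tsibble packages are installed; the `toy` data set, its coefficients, and the model names `with_constant` and `no_constant` are made up for this example and have nothing to do with the attached files.

```r
library(fable)    # attaches fabletools, which provides model()
library(tsibble)

set.seed(42)
n  <- 300
x1 <- rnorm(n)

# Simulated daily series: an integrated AR(1) process plus a regression effect.
toy <- tsibble(
  date   = as.Date("2023-01-01") + seq_len(n) - 1,
  x1     = x1,
  target = cumsum(as.numeric(arima.sim(list(ar = 0.5), n = n))) + 2 * x1,
  index  = date
)

# Constant, pdq() and PDQ() are all stated explicitly in both formulas,
# so ARIMA() has a single candidate model in each case.
fits <- model(
  toy,
  with_constant = ARIMA(target ~ 1 + x1 + pdq(1, 1, 0) + PDQ(0, 0, 0)),
  no_constant   = ARIMA(target ~ 0 + x1 + pdq(1, 1, 0) + PDQ(0, 0, 0))
)
print(fits)
```

Which of the two matches a given forecast::auto.arima() result depends on whether that result includes a constant term.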

kenahoo commented 5 months ago

Thanks for your quick reply, Mitchell - I'll give this a shot tomorrow!

kenahoo commented 5 months ago

I got a chance to try it out - looks like I also needed to fill in the PDQ parameters, so my final code looks like the following. Thanks for pointing me in the right direction!

suppressPackageStartupMessages({
  library(dplyr)
  library(tsibble)
  library(fable)
})
# We also use 'forecast' and 'fabletools' below

train_model <- function(X, y) {
  #use auto.arima to get model specification rather than using fable. auto.arima is generally much faster
  arima_fit <- forecast::auto.arima(
    y=ts(y %>% as.ts(), frequency = 365.25),
    xreg=X %>% as.data.frame() %>% select(-date) %>% as.matrix()
  )
  cat("##### Auto ARIMA fit:", "\n")
  print(arima_fit)

  arima_order <- arima_fit$arma[c(1, 6, 2, 3, 7, 4, 5)]

  pdq_form <- sprintf('pdq(%s,%s,%s)', arima_order[1], arima_order[2], arima_order[3])
  PDQ_form <- sprintf('PDQ(%s,%s,%s,period=%s)', arima_order[4], arima_order[5], arima_order[6], arima_order[7])

  arima_form <- formula(paste('target ~ x1 +', pdq_form, '+', PDQ_form, '+ 1'))

  cat("\narima_formula: ", format(arima_form), "\n")

  training <- inner_join(X, y, by="date") %>%
    as_tsibble(index=date)

  fabletools::model(training, ARIMA(arima_form))
}

# Works:
training <- readRDS('~/Downloads/training-noerror.rds')
fit <- train_model(X = training %>% select(x1), y = training %>% select(target))
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> ##### Auto ARIMA fit: 
#> Series: ts(y %>% as.ts(), frequency = 365.25) 
#> Regression with ARIMA(2,1,2) errors 
#> 
#> Coefficients:
#>          ar1      ar2      ma1     ma2      x1
#>       0.7686  -0.1705  -1.2373  0.3404  0.0031
#> s.e.  0.6741   0.2510   0.6745  0.5690  0.0004
#> 
#> sigma^2 = 0.2235:  log likelihood = -488.13
#> AIC=988.26   AICc=988.37   BIC=1015.83
#> 
#> arima_formula:  target ~ x1 + pdq(2, 1, 2) + PDQ(0, 0, 0, period = 365) + 1

# Now works too, with PDQ parameters included:
training2 <- readRDS('~/Downloads/training-error.rds')
fit <- train_model(X = training2 %>% select(x1), y = training2 %>% select(target))
#> ##### Auto ARIMA fit: 
#> Series: ts(y %>% as.ts(), frequency = 365.25) 
#> Regression with ARIMA(5,1,3) errors 
#> 
#> Coefficients:
#>           ar1      ar2     ar3     ar4     ar5      ma1     ma2      ma3     x1
#>       -0.3714  -0.4704  0.3603  0.0888  0.0572  -0.0890  0.0609  -0.7744  2e-03
#> s.e.   0.0913   0.0891  0.0676  0.0487  0.0464   0.0835  0.0906   0.0733  5e-04
#> 
#> sigma^2 = 0.3704:  log likelihood = -670.28
#> AIC=1360.55   AICc=1360.86   BIC=1406.49
#> 
#> arima_formula:  target ~ x1 + pdq(5, 1, 3) + PDQ(0, 0, 0, period = 365) + 1

Created on 2024-01-30 with reprex v2.1.0
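
One detail that may be worth checking: the auto.arima() printouts above list no `drift` or `intercept` coefficient, so `0` rather than `1` might be the closer match to those fits. The sketch below is a hypothetical helper, `arima_rhs()`, that is not part of forecast or fable. It picks the constant term from the coefficient names, on the assumption that forecast labels a constant as `intercept` or `drift`. The `arma` values are read off the second printout, with the period taken from the printed formula.

```r
# Hypothetical helper: build the right-hand side of a fable ARIMA formula
# from an `$arma` vector and the names of the fitted coefficients.
arima_rhs <- function(arma, coef_names, xregs = character()) {
  # `$arma` layout: c(p, q, P, Q, period, d, D)
  has_constant <- any(c("intercept", "drift") %in% coef_names)
  terms <- c(
    if (has_constant) "1" else "0",
    xregs,
    sprintf("pdq(%d, %d, %d)", arma[1], arma[6], arma[2]),
    sprintf("PDQ(%d, %d, %d, period = %d)", arma[3], arma[7], arma[4], arma[5])
  )
  paste(terms, collapse = " + ")
}

# Regression with ARIMA(5,1,3) errors, no seasonal terms, period 365.
arma       <- c(5L, 3L, 0L, 0L, 365L, 1L, 0L)
coef_names <- c(paste0("ar", 1:5), paste0("ma", 1:3), "x1")

rhs  <- arima_rhs(arma, coef_names, xregs = "x1")
form <- as.formula(paste("target ~", rhs))
print(form)
```

With a real fit, `arma` and `coef_names` would come from `arima_fit$arma` and `names(coef(arima_fit))`.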

mitchelloharawild commented 5 months ago

Great, glad it worked out for you!