tidyverts / fable

Tidy time series forecasting
https://fable.tidyverts.org
GNU General Public License v3.0
558 stars 64 forks source link

Re-consider IC output for refitted models #269

Open TimothyHyndman opened 4 years ago

TimothyHyndman commented 4 years ago

Using the latest github version of fable here (but issue is also present on CRAN version).

library(tsibble)
library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

lung_deaths <- as_tsibble(mdeaths)

# Fit ARIMA on first part of timeseries.
ts1 <- lung_deaths %>% filter(index <= as.Date("1977-01-01"))
#> Warning in mask$eval_all_filter(dots, env_filter): Incompatible methods
#> ("<=.vctrs_vctr", "<=.Date") for "<="
fit <- ts1 %>%
  model(ARIMA(value ~ 1 + pdq(1,0,0) + PDQ(0,0,0)))

# Refit ARIMA with one more observation
ts2 <- lung_deaths %>% filter(index > as.Date("1977-01-01"))
#> Warning in mask$eval_all_filter(dots, env_filter): Incompatible methods
#> (">.vctrs_vctr", ">.Date") for ">"
fit %>%
  refit(ts2 %>% head(1)) %>%
  report()
#> Series: value 
#> Model: ARIMA(1,0,0) w/ mean 
#> 
#> Coefficients:
#>          ar1  constant
#>       0.7851  364.8801
#> s.e.  0.1030   43.9553
#> 
#> sigma^2 estimated as 1579:  log likelihood=-5.58
#> AIC=13.16   AICc=9.16   BIC=11.16

# Refit ARIMA with two more observations
fit %>%
  refit(ts2 %>% head(2)) %>%
  report()
#> Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
#> You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
#> Error: Problem with `mutate()` input `ARIMA(value ~ 1 + pdq(1, 0, 0) + PDQ(0, 0, 0))`.
#> x Could not find an appropriate ARIMA model.
#> This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
#> For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
#> ℹ Input `ARIMA(value ~ 1 + pdq(1, 0, 0) + PDQ(0, 0, 0))` is `(function (object, ...) ...`.

Created on 2020-05-18 by the reprex package (v0.3.0)

mitchelloharawild commented 4 years ago

@robjhyndman This is coming from here (which comes from forecast::auto.arima()) https://github.com/tidyverts/fable/blob/64b36c6df3285e33c912991a1cc885604bbb3158/R/arima.R#L153

Any reason why npar is the number of estimated coefficients +1?

mitchelloharawild commented 4 years ago

Also @TimothyHyndman, when you say "Refit ARIMA with two more observations", you're refitting the model with only two observations (not two more observations). You'll need to refit the model with the historical data and the next two observations (or use stream.ARIMA once #251 is done).

robjhyndman commented 4 years ago

npar = # coefficients + 1 to account for the residual variance. This is how the AIC/AICc/BIC defines # parameters in a model.

mitchelloharawild commented 4 years ago

Got it, thanks. https://github.com/tidyverts/fable/blob/64b36c6df3285e33c912991a1cc885604bbb3158/R/arima.R#L159

Then in this case (n=2, d=0, D=0), nstar = 2 giving division by (nstar - npar - 1) = 0. Is this correct?

robjhyndman commented 4 years ago

It is problematic computing AICc on a refit because the parameters were not estimated on that data set. If the data was used for estimation, then the equation is correct. But you would expect nstar to always be bigger than npar+1 or you would be over-fitting.

I'm not sure what we should return as AICc value on a refit -- possibly the original AICc on the original data, or perhaps NA or NULL.

TimothyHyndman commented 4 years ago

Also @TimothyHyndman, when you say "Refit ARIMA with two more observations", you're refitting the model with only two observations (not two more observations). You'll need to refit the model with the historical data and the next two observations (or use stream.ARIMA once #251 is done).

Ah, I was thinking that refit used the data contained in fit in addition to the data passed in with new_data =. Thanks for the correction.