tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

forecast error on exogenous categorical variables #356

Closed by robjhyndman 4 months ago

robjhyndman commented 2 years ago

Minimal reproducible example (MRE):

library(fpp3)
#> ── Attaching packages ─────────────────────────────────────── fpp3 0.4.0.9000 ──
#> ✔ tibble      3.1.7     ✔ tsibble     1.1.1
#> ✔ dplyr       1.0.9     ✔ tsibbledata 0.4.0
#> ✔ tidyr       1.2.0     ✔ feasts      0.2.2
#> ✔ lubridate   1.8.0     ✔ fable       0.3.1
#> ✔ ggplot2     3.3.6     ✔ fabletools  0.3.2
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date()    masks base::date()
#> ✖ dplyr::filter()      masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval()  masks lubridate::interval()
#> ✖ dplyr::lag()         masks stats::lag()
#> ✖ tsibble::setdiff()   masks base::setdiff()
#> ✖ tsibble::union()     masks base::union()
elec <- vic_elec %>%
  mutate(
    Day_Type = case_when(
      Holiday ~ "Holiday",
      wday(Date) %in% 2:6 ~ "Weekday",
      TRUE ~ "Weekend"
    )
  )
fit <- elec %>%
  model(shf = TSLM(log(Demand) ~ Day_Type))
fit %>% report()
#> Series: Demand 
#> Model: TSLM 
#> Transformation: log(Demand) 
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.47374 -0.11822  0.01978  0.10979  0.66244 
#> 
#> Coefficients:
#>                 Estimate Std. Error  t value Pr(>|t|)    
#> (Intercept)     8.293077   0.004406 1882.134  < 2e-16 ***
#> Day_TypeWeekday 0.187077   0.004496   41.610  < 2e-16 ***
#> Day_TypeWeekend 0.032076   0.004620    6.943 3.88e-12 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.17 on 52605 degrees of freedom
#> Multiple R-squared: 0.1572,  Adjusted R-squared: 0.1571
#> F-statistic:  4905 on 2 and 52605 DF, p-value: < 2.22e-16
newdata <- tail(elec, 48)
fit %>%
  forecast(new_data = newdata)
#> Error in `mutate()`:
#> ! Problem while computing `shf = (function (object, ...) ...`.
#> Caused by error:
#> ! contrasts can be applied only to factors with 2 or more levels
#>   Unable to compute required variables from provided `new_data`.
#>   Does your model require extra variables to produce forecasts?

Created on 2022-05-23 by the reprex package (v2.0.1)

mitchelloharawild commented 4 months ago

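Best practice is to store Day_Type as a factor here, so that all of its possible values are known at both the modelling and forecasting stages, even if some of them are not observed in a given time window. In the MRE, Day_Type is a character column and the 48 half-hourly rows in newdata all fall on a single day (2014-12-31), so only one value is present in new_data. That is what triggers the "contrasts can be applied only to factors with 2 or more levels" error.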

That said, this behaviour is likely to cause issues (but is hard to fix), so I've opened another issue for it here: https://github.com/tidyverts/fabletools/issues/398
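The mechanism can be illustrated outside of fable with a minimal base R sketch. It is not fabletools' internal code: `day_type` and `day_levels` are stand-ins made up for this illustration, and `model.matrix()` is used only to show how the contrast coding reacts to a single-valued character column versus a factor with all levels declared.

```r
# Stand-in for the Day_Type column of `newdata`: every row is a weekday.
day_type <- rep("Weekday", 48)

# Character input: only one distinct value is present, so the column is
# treated as a single-level factor and the contrast coding fails.
chr_result <- tryCatch(
  model.matrix(~ Day_Type, data = data.frame(Day_Type = day_type)),
  error = function(e) conditionMessage(e)
)
chr_result

# Factor input with all levels declared: unobserved levels are retained,
# so the design matrix keeps one column per non-reference level.
day_levels <- c("Holiday", "Weekday", "Weekend")
fct_result <- model.matrix(
  ~ Day_Type,
  data = data.frame(Day_Type = factor(day_type, levels = day_levels))
)
colnames(fct_result)
```

In the first call the error message is captured as a string rather than stopping the script; in the second, the column names line up with the coefficients reported by the fitted model.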

library(fpp3)
#> -- Attaching packages ---------------------------------------------- fpp3 0.5 --
#> v tibble      3.2.1          v tsibble     1.1.4     
#> v dplyr       1.1.3          v tsibbledata 0.4.1     
#> v tidyr       1.3.0          v feasts      0.3.1.9000
#> v lubridate   1.9.3          v fable       0.3.3.9000
#> v ggplot2     3.5.0          v fabletools  0.4.0
#> -- Conflicts ------------------------------------------------- fpp3_conflicts --
#> x lubridate::date()    masks base::date()
#> x dplyr::filter()      masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval()  masks lubridate::interval()
#> x dplyr::lag()         masks stats::lag()
#> x tsibble::setdiff()   masks base::setdiff()
#> x tsibble::union()     masks base::union()
elec <- tsibbledata::vic_elec %>%
  mutate(
    Day_Type = factor(case_when(
      Holiday ~ "Holiday",
      wday(Date) %in% 2:6 ~ "Weekday",
      TRUE ~ "Weekend"
    ))
  )
fit <- elec %>%
  model(shf = fable::TSLM(log(Demand) ~ Day_Type))
fit %>% report()
#> Series: Demand 
#> Model: TSLM 
#> Transformation: log(Demand) 
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.47374 -0.11822  0.01978  0.10979  0.66244 
#> 
#> Coefficients:
#>                 Estimate Std. Error  t value Pr(>|t|)    
#> (Intercept)     8.293077   0.004406 1882.134  < 2e-16 ***
#> Day_TypeWeekday 0.187077   0.004496   41.610  < 2e-16 ***
#> Day_TypeWeekend 0.032076   0.004620    6.943 3.88e-12 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.17 on 52605 degrees of freedom
#> Multiple R-squared: 0.1572,  Adjusted R-squared: 0.1571
#> F-statistic:  4905 on 2 and 52605 DF, p-value: < 2.22e-16

newdata <- tail(elec, 48)
fit %>%
  forecast(new_data = newdata)
#> # A fable: 48 x 8 [30m] <Australia/Melbourne>
#> # Key:     .model [1]
#>    .model Time                          Demand .mean Temperature Date      
#>    <chr>  <dttm>                        <dist> <dbl>       <dbl> <date>    
#>  1 shf    2014-12-31 00:00:00 t(N(8.5, 0.029)) 4888.        16.2 2014-12-31
#>  2 shf    2014-12-31 00:30:00 t(N(8.5, 0.029)) 4888.        16   2014-12-31
#>  3 shf    2014-12-31 01:00:00 t(N(8.5, 0.029)) 4888.        15.5 2014-12-31
#>  4 shf    2014-12-31 01:30:00 t(N(8.5, 0.029)) 4888.        15   2014-12-31
#>  5 shf    2014-12-31 02:00:00 t(N(8.5, 0.029)) 4888.        14.4 2014-12-31
#>  6 shf    2014-12-31 02:30:00 t(N(8.5, 0.029)) 4888.        14.3 2014-12-31
#>  7 shf    2014-12-31 03:00:00 t(N(8.5, 0.029)) 4888.        14   2014-12-31
#>  8 shf    2014-12-31 03:30:00 t(N(8.5, 0.029)) 4888.        13.8 2014-12-31
#>  9 shf    2014-12-31 04:00:00 t(N(8.5, 0.029)) 4888.        13.6 2014-12-31
#> 10 shf    2014-12-31 04:30:00 t(N(8.5, 0.029)) 4888.        13.3 2014-12-31
#> # i 38 more rows
#> # i 2 more variables: Holiday <lgl>, Day_Type <fct>

Created on 2024-03-02 with reprex v2.0.2
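When the forecasting data is built separately from the training data (rather than sliced from it, as above), its factor levels need to be set to match. A rough base R sketch of one way to do that follows; `match_levels()` is a hypothetical helper written for this illustration and is not part of fabletools or fable.

```r
# Hypothetical helper (not part of fabletools): re-code a column of new data
# using the levels of the corresponding factor from the training data.
match_levels <- function(new_values, training_factor) {
  vals <- as.character(new_values)
  # Fail early on values the model never saw, instead of turning them into NA.
  unseen <- setdiff(vals[!is.na(vals)], levels(training_factor))
  if (length(unseen) > 0) {
    stop("Values not seen in training: ", paste(unseen, collapse = ", "))
  }
  factor(vals, levels = levels(training_factor))
}

# Toy training factor with the three day types from the example.
training <- factor(c("Holiday", "Weekday", "Weekend", "Weekday"))

# A future window containing only weekdays still carries all three levels.
future <- match_levels(rep("Weekday", 48), training)
levels(future)
```

Stopping on unseen values is a design choice for this sketch; silently converting them to NA would hide the mismatch until the forecasts came back missing.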