tidyverts / fable

Tidy time series forecasting
https://fable.tidyverts.org
GNU General Public License v3.0

adding exogenous regressors only to one level in hierarchy #350

Open slava-keshkov opened 3 years ago

slava-keshkov commented 3 years ago

Hello,

I was adding exogenous regressors to a forecasting hierarchy using an ARIMA model and formula notation. This works well when the exogenous variables are added to all levels of the hierarchy.

However, I would like to apply them to only one level of the hierarchy.

I tried replacing the exogenous values with NA on the levels where I do NOT need them. This causes all models to become "NULL model" after fitting with model(). During fitting, I also get warning messages saying that exogenous variables have been dropped because the matrix was rank deficient.

I cannot drop the exogenous values completely, since they are columns in the data frame that holds the forecasting groups.

Would love to hear your feedback on this. It seems like essential functionality to me.

mitchelloharawild commented 3 years ago

This should be possible, but it's not something I've tested. Without your code, I can't see what you've tried and where this error might be coming from. Please provide a minimal reproducible example: https://www.tidyverse.org/help/

My best guess is that you've tried using a model specification with the same regressors for the top levels and other levels of the hierarchy. To 'remove' the regressors where you don't want them, you've provided them in the data as NA. Instead of this, you should specify a model (with the formula) that does not need exogenous regressors. Currently the best way to do this is to split up your tsibble, produce several mables, and then combine the mables into a complete hierarchy.

Here's a complete example for what I think you want to do. Note that you'll need to install the dev version of fabletools with remotes::install_github("tidyverts/fabletools") as I found a bug with bind_rows(<mable>, <mable>) in the process:

library(fable)
#> Loading required package: fabletools
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union

# Prepare the data
lung_deaths <- as_tsibble(cbind(mdeaths, fdeaths)) %>% 
  aggregate_key(key, value = sum(value))

# Split up the data by aggregation
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
agg_ld <- lung_deaths %>% filter(is_aggregated(key))
btm_ld <- lung_deaths %>% filter(!is_aggregated(key))

# Specify and train models
fit_agg_ld <- agg_ld %>% 
  # For some regressor x
  mutate(x = seq_along(value)) %>% 
  # Estimate a dynamic regression model
  model(mdl = ARIMA(value ~ x))
fit_btm_ld <- btm_ld %>% 
  # All other models are ETS
  model(mdl = ETS(value))

# Combine models into single mable of complete hierarchy
fit <- bind_rows(fit_agg_ld, fit_btm_ld)
fit
#> # A mable: 3 x 2
#> # Key:     key [3]
#>   key                                             mdl
#>   <chr*>                                      <model>
#> 1 <aggregated> <LM w/ ARIMA(0,0,1)(1,0,0)[12] errors>
#> 2 fdeaths                                <ETS(M,N,M)>
#> 3 mdeaths                                <ETS(M,A,A)>

fit <- fit %>% 
  # Add MinT reconciliation
  reconcile(mdl = min_trace(mdl))

# Produce forecasts
## Need to specify future values of the regressor.
## This can be NA for models that don't use the regressor.
lung_deaths_future <- new_data(lung_deaths, 24) %>% 
  mutate(x = rep(73:96, 3))

## Forecast (with reconciliation) the lung deaths using the trained models
forecast(fit, new_data = lung_deaths_future)
#> # A fable: 72 x 6 [1M]
#> # Key:     key, .model [3]
#>    key          .model    index          value     x .mean
#>    <chr*>       <chr>     <mth>         <dist> <int> <dbl>
#>  1 <aggregated> mdl    1980 Jan N(2664, 31042)    73 2664.
#>  2 <aggregated> mdl    1980 Feb N(2666, 32837)    74 2666.
#>  3 <aggregated> mdl    1980 Mar N(2497, 29099)    75 2497.
#>  4 <aggregated> mdl    1980 Apr N(2030, 20884)    76 2030.
#>  5 <aggregated> mdl    1980 May N(1618, 15179)    77 1618.
#>  6 <aggregated> mdl    1980 Jun N(1461, 13536)    78 1461.
#>  7 <aggregated> mdl    1980 Jul N(1384, 12647)    79 1384.
#>  8 <aggregated> mdl    1980 Aug N(1252, 11325)    80 1252.
#>  9 <aggregated> mdl    1980 Sep N(1246, 11262)    81 1246.
#> 10 <aggregated> mdl    1980 Oct N(1512, 14221)    82 1512.
#> # … with 62 more rows

Created on 2021-10-09 by the reprex package (v2.0.0)

slava-keshkov commented 2 years ago

Hi @mitchelloharawild, thanks for your solution, it's working.

With regard to this topic, I am also wondering: is it possible to add an exogenous variable for one of the many hierarchical series after the full mable has already been trained and reconciled?

For example: we fit 100 ARIMA models on multiple different levels. We train them and do the reconciliation. We save the trained mable to an .RDS file.

Then we want to re-train only one of the series with an added exogenous variable. Can we then "pull out" one model, re-train it, and apply reconciliation without re-training the rest of the models? Or is it virtually impossible?

Let me know if the question is clear enough or whether I should provide a better example.

Looking forward to your replies! 🙌