Reconciled Arima with exogenous regressors produces unexplainable results

Running into a strange occurrence where a reconciled arima model with exogenous regressors produces an unexplained set of results, where all of the reconciled estimates are far above the baseline models (in the absence of the aggregate_key / reconciliation) by a factor of 20% or more.

In the below example, I would expect that the no hierarchy/aggregation models would line up with the arima_base as well as the top_down lining up with the non-aggregated (aggregate). However, they seem to produce much different results. Below I am using a dummy regressor that's intended to be the same value across all series.

I apologize in advance for the verbosity, but currently I am unable to pinpoint exactly where the differences come from.

The only related issue that I could find is https://github.com/tidyverts/fable/issues/373 though it does seem to be slightly different in nature.

library(fpp3)
library(purrr)
library(highcharter)

# base data set

base_data_set <- fpp3::us_employment %>%
  filter(Title %in% c("Private Service-Providing", "Government", "Goods-Producing")) %>%
  index_by(Month) %>%
  mutate(total_employed_jitter = sum(Employed, na.rm = TRUE) %>% jitter(amount = 5000)) %>%
  ungroup()

# no hierarchy / aggregation ---------------------------------------------------------------

# fitted without hierarchy / aggregate_key
fitted_no_hierarchy <- base_data_set %>%
  model(
    arima_no_agg = ARIMA(Employed ~ total_employed_jitter)
  ) %>%
  fitted()

# fitted total without hierarchy / aggregate key
fitted_sum_no_hierarchy <- base_data_set %>%
  index_by(Month) %>%
  summarize(
    total_employed = sum(Employed, na.rm = TRUE),
    total_employed_jitter = mean(total_employed_jitter, na.rm = TRUE)
  ) %>%
  model(
    arima_no_agg = ARIMA(total_employed ~ total_employed_jitter)
  ) %>%
  fitted() %>%
  mutate(Title = "<aggregated>")

joined_no_hierarchy <- fitted_no_hierarchy %>%
  left_join(
    distinct(us_employment,Series_ID,Title)
  ) %>%
  full_join(
    fitted_sum_no_hierarchy
  ) %>%
  as_tibble() %>%
  select(-Series_ID) %>%
  rename(.mean = .fitted)

# with hierarchy / aggregated key -----------------------------------------

# aggregated key
agg_data_set <- base_data_set %>%
  aggregate_key(Title,
                Employed = sum(Employed, na.rm = TRUE),
                total_employed_jitter = mean(total_employed_jitter, na.rm = TRUE)
  )

# fitted reconciled
fitted_recon_arima <- agg_data_set %>%
  model(
    arima_base = ARIMA(Employed ~ total_employed_jitter)
  ) %>%
  reconcile(
    top_down = top_down(arima_base)
  ) %>%
  forecast(agg_data_set)

# joined & plotted  -----------------------------------------------------------------

fitted_recon_arima %>%
  rename(dist_emp = Employed) %>%
  left_join(
    agg_data_set
  ) %>%
  full_join(
    joined_no_hierarchy
  ) %>%
  as_tibble() %>%
  mutate(
    month = as_date(Month),
    Title = as.character(Title)
  ) %>%
  select(Title, .model, month, .mean, actual = Employed) %>%
  mutate(Title = stringr::str_replace(Title,"<aggregated>","aggregated")) %>%
  split(.$Title) %>%
  imap(~ {
    hchart(.x, "line", hcaes(month, .mean, group = .model)) %>%
      hc_add_series(.x %>% distinct(month, actual) %>% mutate(.model = "actual"), "scatter", hcaes(month, actual, group = .model),
                    color = I("#D22A2F1A")) %>%
      hc_title(text = .y)
  }) %>%
  hw_grid()

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] highcharter_0.9.4.9000 purrr_0.3.4            fable_0.3.1           
#>  [4] feasts_0.2.2           fabletools_0.3.2       tsibbledata_0.4.0     
#>  [7] tsibble_1.1.1          ggplot2_3.3.6          lubridate_1.8.0       
#> [10] tidyr_1.2.0            dplyr_1.0.9            tibble_3.1.7          
#> [13] fpp3_0.5              
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3         lattice_0.20-41      zoo_1.8-10          
#>  [4] assertthat_0.2.1     digest_0.6.29        utf8_1.2.2          
#>  [7] R6_2.5.1             backports_1.4.1      reprex_2.0.1        
#> [10] evaluate_0.15        highr_0.9            pillar_1.7.0        
#> [13] rlang_1.0.2          curl_4.3.2           data.table_1.14.2   
#> [16] rstudioapi_0.13      TTR_0.24.3           R.utils_2.11.0      
#> [19] R.oo_1.25.0          rmarkdown_2.14       styler_1.7.0        
#> [22] stringr_1.4.0        htmlwidgets_1.5.4    igraph_1.3.0        
#> [25] munsell_0.5.0        broom_0.8.0          anytime_0.3.9       
#> [28] compiler_4.0.3       xfun_0.31            pkgconfig_2.0.3     
#> [31] htmltools_0.5.2      tidyselect_1.1.2     fansi_1.0.3         
#> [34] crayon_1.5.1         withr_2.5.0          R.methodsS3_1.8.2   
#> [37] rappdirs_0.3.3       grid_4.0.3           distributional_0.3.0
#> [40] jsonlite_1.8.0       gtable_0.3.0         lifecycle_1.0.1     
#> [43] DBI_1.1.3            magrittr_2.0.3       scales_1.2.0        
#> [46] rlist_0.4.6.2        quantmod_0.4.20      cli_3.3.0           
#> [49] stringi_1.7.6        farver_2.1.0         fs_1.5.2            
#> [52] xts_0.12.1           ellipsis_0.3.2       generics_0.1.2      
#> [55] vctrs_0.4.1          tools_4.0.3          R.cache_0.15.0      
#> [58] glue_1.6.2           fastmap_1.1.0        yaml_2.3.5          
#> [61] colorspace_2.0-3     knitr_1.39

tidyverts / fable

Reconciled Arima with exogenous regressors produces unexplainable results #395