tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Error in hierarchical forecast for unbalanced panel: "Data length is not a sub-multiple or multiple of the number of rows" #199

Closed: davidnield closed this issue 4 years ago

davidnield commented 4 years ago

Following up from this Twitter thread: https://twitter.com/robjhyndman/status/1261171346446270464?s=20

I'm working on a batch hierarchical forecast with ~1000 individuals in 3 categories (at least 100 individuals per category). The amount of historical data varies widely between individuals (daily observations, ranging from 7 days to a few years), and I'm running into this warning: "Warning message: In matrix(invoke(c, res), ncol = length(object)): data length is not a sub-multiple or multiple of the number of rows"

The code below reproduces the warning. Here the forecast object is still created, whereas in my actual case it is not; the warning message is the same, however. Apologies that the example isn't as clean and minimal as it could be: this is my first time filing an issue or writing a minimal reproducible example. I hope it is readable!

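# Loading packages
library(dplyr)
library(tidyr)
library(lubridate)
library(tsibble)  # as_tsibble()
library(fable)    # also attaches fabletools (aggregate_key(), reconcile(), min_trace())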

set.seed(1)

# Three categories
categories <- c("A", "B", "C")

# Three different timespans
three_years <- as_date(ymd("2017-01-01"):ymd("2020-01-01"))
one_year <- as_date(ymd("2019-01-01"):ymd("2020-01-01"))
one_week <- as_date(ymd("2019-12-24"):ymd("2020-01-01"))

# Crossing the categories and timespans for 9 individuals
df <- tibble(
  id = as.character(1:3),
  category = categories
) %>% 
  crossing(date = three_years) %>% 
  bind_rows(
    tibble(
      id = as.character(4:6),
      category = categories
    ) %>% 
      crossing(date = one_year)
  ) %>% 
  bind_rows(
    tibble(
      id = as.character(7:9),
      category = categories
    ) %>% 
      crossing(date = one_week)
  )

# Randomly sampling values from a Poisson, creating the tsibble and aggregate-key structure then fitting and reconciling an ETS model
fit <- df %>% 
  mutate(value = rpois(n = n(), lambda = 1)) %>%  # n() = 4413 rows here
  as_tsibble(key = c(id, category), index = date) %>% 
  aggregate_key((category / id), value = sum(value)) %>% 
  model(ets = ETS(value)) %>% 
  reconcile(ets_adjust = min_trace(ets))

# Forecasting 14 steps forward
fc <- fit %>% 
  forecast(h = 14)
# Warning message:
# In matrix(invoke(c, res), ncol = length(object)) :
#  data length [8797] is not a sub-multiple or multiple of the number of rows [677]

Running R 4.0.0 with: dplyr 0.8.99.9003, tidyr 1.0.3, lubridate 1.7.8, fable 0.2.0.9000, fabletools 0.1.3.9000, tsibble 0.8.9.9000, vctrs 0.3.0.9000, tibble 3.0.1.9000

mitchelloharawild commented 4 years ago

My guess is that this is a breaking change introduced by the development version of dplyr: it seems the list class used for reconciliation is dropped when calling mutate_at().

Further investigation suggests that the issue is caused by the inconsistent lengths of the series. I think this will be a computationally expensive thing to fix, as the residuals (and, I suppose, the forecasts) will need to be joined together based on the index rather than simply stacked.
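To illustrate why unequal series lengths trigger this warning, here is a base-R sketch on simulated data. The series names `long`, `medium` and `short`, the dates and the values are all hypothetical, and this is not the fabletools patch itself. Concatenating residuals and reshaping them by column count misaligns them (and warns when the total length is not a multiple of the row count), whereas a full join on the date index puts each time point on its own row, with `NA` where a series has no observation.

```r
# Sketch only: simulated residuals, hypothetical series names.
set.seed(1)

# Residuals for three series covering different date ranges,
# loosely mirroring the long / medium / short histories in the reprex.
res <- list(
  long   = data.frame(date = as.Date("2019-12-20") + 0:11, e = rnorm(12)),
  medium = data.frame(date = as.Date("2019-12-26") + 0:5,  e = rnorm(6)),
  short  = data.frame(date = as.Date("2019-12-28") + 0:3,  e = rnorm(4))
)

# Naive approach: concatenate everything and reshape by column count.
# 22 values in 3 columns gives 8 rows, so matrix() recycles and warns
# "data length [22] is not a sub-multiple or multiple of the number of rows [8]".
naive_warning <- tryCatch(
  matrix(unlist(lapply(res, `[[`, "e")), ncol = length(res)),
  warning = function(w) w
)
print(conditionMessage(naive_warning))

# Index-aware approach: name each residual column after its series,
# then full-join on date so that each row is one time point.
named <- Map(function(d, nm) setNames(d, c("date", nm)), res, names(res))
aligned <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE), named)

# Rows are dates, columns are series; NA marks dates a series does not cover.
resid_mat <- as.matrix(aligned[, -1])
rownames(resid_mat) <- format(aligned$date)
print(dim(resid_mat))

# One possible covariance estimate from the ragged matrix (an assumption
# for illustration, not necessarily what min_trace() does internally).
print(cov(resid_mat, use = "pairwise.complete.obs"))
```

Even when the total length happens to be a multiple of the number of columns, the naive reshape raises no warning but still mixes values from different dates into the same row, which is why joining on the index is the safer (if slower) route.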

mitchelloharawild commented 4 years ago

I've made a rough patch for this in 5e30079bd0d7778cb36d579fd40e19ca7f3bfe34. The method for obtaining residuals() is a bit slow, so I'd like to refactor it at a later date. This likely requires simplifying the <mdl_ts> (model) methods by moving the data coercion up to the <mdl_df> (mable) method.