Open alexhallam opened 2 years ago
Almost. You need to remove the total column and set up the data as a tsibble with state as the key. Then generate the total using aggregate_key()
. There is an example at https://otexts.com/fpp3/prison.html.
The distributions are reconciled as described here: https://otexts.com/fpp3/rec-prob.html
Thanks for the quick response!
I think I see where I am getting hung up. I am starting with the output from probabilistic forecasts (downstream from where the example starts). It looks like the bottom_up()
function expects a mable
model. Do you recommend that users make their own mable
object in this case? Maybe I can make a simple model based on lm(y_sim ~ 1)
.
fit <- prison_gts %>%
filter(year(Quarter) <= 2014) %>%
model(base = ETS(Count)) %>%
reconcile(
bottom_up = bottom_up(base), # <--- needs a mable object
MinT = min_trace(base, method = "mint_shrink")
)
This is as far as I can get
df <- readr::read_csv("https://storage.googleapis.com/data_xvzf/m5_hierarchical_forecast.csv") %>%
select(-y_sim_total) %>%
pivot_longer(cols = y_sim_ca:y_sim_wi, names_to = "state", values_to = "y_sim") %>%
as_tsibble(index = ds, key = c(rep, state)) %>%
aggregate_key(total = mean(y_sim)) %>%
reconcile(
bottom_up = bottom_up(), # <-- missing a mable object
MinT = min_trace(method = "mint_shrink")
)
Currently reconcile()
only works in conjunction with the forecast()
function, but this is something I'm looking at changing.
The complexity here is that you'll need to provide your own weighting matrix, as you no longer have access to the model's training residuals. Some weighting matrices (such as OLS and structural scaling) do not use the model's residuals, and will be easier to use.
At minimum, you'll need to convert your samples into a distribution using dist_sample()
. This should be done by summarising over the replicates (grouping by ds
and state
).
From there you can achieve bottom up reconciliation by summing the distribution column in aggregate_key()
. Reconciling optimally currently would require writing your own matrix algebra, but again I'd like to improve this.
I see what you are saying. Thanks for the info!
Thanks again for the help.
I just wanted to codify what you said above. This is where things are at the moment.
df <- readr::read_csv("https://storage.googleapis.com/data_xvzf/m5_hierarchical_forecast.csv") %>%
dplyr::select(-y_sim_total) %>%
tidyr::pivot_longer(cols = y_sim_ca:y_sim_wi, names_to = "state", values_to = "y_sim") %>%
dplyr::group_by(ds, state) %>%
dplyr::summarise(dist = distributional::dist_sample(list(y_sim))) %>%
as_tsibble(index=ds, key = state) %>%
aggregate_key(total = sum(dist)) # <- error (I am not sure what to put here when my column is a dist object)
I would like to know if
reconcile()
can be used with distributional forecasts.As a reproducible example I took the m5 data and generated naive forecasts for
total
,ca
,wi
, andtx
. The total is the sum of the three states. I generated 8000 samples over a 10 day forecast horizon. The data set can be grabbed with the following.This is the head
Can I use the
reconcile()
function to reconcile these forecasts to the total?