tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Suggestion: utility function to wide-pivot a fable #172

Closed hongooi73 closed 4 years ago

hongooi73 commented 4 years ago

A fable is in long format: 1 row per key/time/model combination. It would be nice to have a function that pivots this so that you have 1 row per key/time, like in the data, and one column per model.

There could be one function to return a column of point forecasts per model, and another to return a column of distributions per model; or maybe just one function to do both.

If this sounds good, I could make a PR for this and #169.

mitchelloharawild commented 4 years ago

Currently there is a strict requirement that a fable contains exactly one distribution column. I can see how spreading across models could be useful, but for now I think it is best not to allow fable objects to do this. The long format is consistent with the output of functions like tidy(), glance(), augment() and accuracy(), and it lets the column names reflect their meaning (the response variable name). Allowing multiple distributions in a fable would also complicate the methods for graphics and accuracy. All up, this is a design decision I'd rather make at a later date. For now, you can of course convert the fable to a tsibble before spreading, as a tsibble places no restriction on the number of distribution columns.

If you'd like to write a PR for #169, that would be wonderful.

hongooi73 commented 4 years ago

Yeah, the result of this operation wouldn't be a fable anymore, but it would still be a tsibble.

mitchelloharawild commented 4 years ago

Once vctrs is supported, this shouldn't be too hard to do for the user. Something like:

as_tsibble(fbl) %>%
  pivot_wider(names_from = ".model", values_from = ".distribution")

hongooi73 commented 4 years ago

This is the code I have right now:

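# Returns a tibble of the observed response and one column of point
# forecasts per model. newdata must be supplied, because the observed
# response is read from it.
library(dplyr)
library(tidyr)
library(tsibble)
library(fabletools)

get_forecasts <- function(mable, newdata=NULL, h=NULL, ...)
{
    fcast <- forecast(mable, new_data=newdata, h=h, ...)
    # the fable's keys are the data keys plus .model; drop .model
    keyvars <- setdiff(key_vars(fcast), ".model")
    indexvar <- index_var(fcast)
    fcastvar <- as.character(attr(fcast, "response")[[1]])  # cf. #169
    # one row per key/time, one column of point forecasts per model
    fcast <- fcast %>%
        as_tibble() %>%
        pivot_wider(
            id_cols=all_of(c(keyvars, indexvar)),
            names_from=.model,
            values_from=all_of(fcastvar))
    # attach the observed response from newdata as .response
    select(newdata, all_of(c(keyvars, indexvar, fcastvar))) %>%
        rename(.response=all_of(fcastvar)) %>%
        inner_join(fcast, by=c(keyvars, indexvar))
}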

Not seeing how vctrs would make this simpler?

mitchelloharawild commented 4 years ago

I think a function like this would be doing too much. It is useful for its specific purpose, but it doesn't generalise to common needs. Breaking the process into smaller parts makes it easier for users to reuse what they know and work with the data the way they want to.

vctrs will help with this by allowing S3 vectors like the distributions to be used with tidyverse functions like pivot_wider().
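
As a rough illustration of that point, here is a minimal sketch of pivot_wider() applied to a column of vctrs-based distribution objects. It assumes the distributional package (not mentioned in this thread) as the source of the distribution vector; the table and its column names (series, time, dist) are made up for the example and only mimic a fable's long layout.

```r
library(tibble)
library(tidyr)
library(distributional)  # assumption: supplies a vctrs-based distribution vector

# Hypothetical long-format forecasts: one row per key/time/model combination.
long <- tibble(
  series = rep(c("a", "b"), each = 4),
  time   = rep(rep(1:2, each = 2), times = 2),
  .model = rep(c("ets", "arima"), times = 4),
  dist   = dist_normal(mu = 1:8, sigma = 1)
)

# One row per key/time, one distribution column per model.
wide <- pivot_wider(long, names_from = .model, values_from = dist)
```

The result is an ordinary tibble rather than a fable, so the single-distribution-column requirement no longer applies to it.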

hongooi73 commented 4 years ago

Well, the biggest complication really is just pulling the response variable from the prediction dataset. Other than that, it's just a select + pivot.
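
To make the "select + pivot" part concrete, here is a small sketch on plain tibbles, with no fable involved. All data and names (fc_long, newdata, series, time, y, x) are hypothetical stand-ins for a fable converted with as_tibble() and for the prediction dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format point forecasts: one row per key/time/model.
fc_long <- tibble(
  series = rep(c("a", "b"), each = 4),
  time   = rep(rep(1:2, each = 2), times = 2),
  .model = rep(c("ets", "arima"), times = 4),
  y      = c(10, 11, 12, 13, 20, 21, 22, 23)
)

# Hypothetical prediction dataset holding the observed response.
newdata <- tibble(
  series = rep(c("a", "b"), each = 2),
  time   = rep(1:2, times = 2),
  y      = c(10.5, 12.5, 20.5, 22.5),
  x      = c(1, 2, 3, 4)  # an unrelated regressor, dropped by select()
)

# Pivot: one row per key/time, one column of point forecasts per model.
fc_wide <- pivot_wider(fc_long, names_from = .model, values_from = y)

# Select the keys, index and response, then join on key and index.
result <- newdata %>%
  select(series, time, .response = y) %>%
  inner_join(fc_wide, by = c("series", "time"))
```

Here the response name is hard-coded as y; in the function above it has to be looked up from the fable, which is the complication being described.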

mitchelloharawild commented 4 years ago

Closing, as a fable must contain only a single distribution column. If you want to drop down to a tsibble, you can use pivot_wider() on the distribution column without any issue.