tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/
89 stars 31 forks source link

Automatically select() key variables in a mable #170

Closed hongooi73 closed 4 years ago

hongooi73 commented 4 years ago

This is somewhat related to #167. Rather than deleting a model, is there a good way to select only a specified model in a mable? Doing select(mable, mod) results in

Error: The result is not a valid mable. The key variables must uniquely identify each row. Run rlang::last_error() to see where the error occurred.

mitchelloharawild commented 4 years ago

select(mable, key, mod) is required here.

I'm not sure if key variables should be automatically included in the select() call or not, but currently the dplyr support is rather simple (most functions just call NextMethod() and then repair the structure with checks).

I think it is best for select.mdl_df() to match select.tbl_ts() and select.fbl_ts() in terms of handling of keys.

hongooi73 commented 4 years ago

select for tsibbles is actually inconsistent in how it treats key and indices. It automatically selects the latter, but not the former. I would suggest that it should always select the keys, if present.

mitchelloharawild commented 4 years ago

select() for tsibble is consistent as keys and indices have different nature. All tsibbles must have an index, but they may not have keys.

If a key is no longer necessary to uniquely identify a time series, it can be removed with select().

library(tsibble)
library(dplyr)
tourism %>% 
  filter(Purpose == "Holiday") %>% 
  select(-Purpose)
#> # A tsibble: 6,080 x 4 [1Q]
#> # Key:       Region, State [76]
#>    Quarter Region   State           Trips
#>      <qtr> <chr>    <chr>           <dbl>
#>  1 1998 Q1 Adelaide South Australia  224.
#>  2 1998 Q2 Adelaide South Australia  130.
#>  3 1998 Q3 Adelaide South Australia  156.
#>  4 1998 Q4 Adelaide South Australia  182.
#>  5 1999 Q1 Adelaide South Australia  185.
#>  6 1999 Q2 Adelaide South Australia  135.
#>  7 1999 Q3 Adelaide South Australia  136.
#>  8 1999 Q4 Adelaide South Australia  169.
#>  9 2000 Q1 Adelaide South Australia  184.
#> 10 2000 Q2 Adelaide South Australia  134.
#> # … with 6,070 more rows

Created on 2020-03-07 by the reprex package (v0.3.0)

Fundamentally, the same could be said for mables, however I expect redundant keys would be less common.

hongooi73 commented 4 years ago

Yeah, what I mean is, if a tsibble has a key, then selecting should retain it by default. This just seems more intuitive and easier to use.

I can understand selecting both key and index, and not selecting both key and index. But selecting index and not selecting key seems weird.

mitchelloharawild commented 4 years ago

@earowang can comment on this.

earowang commented 4 years ago

select() a tsibble now by default includes key and index in the dev version. should be there for next CRAN release

mitchelloharawild commented 4 years ago

Consolidated into #192