tidyverts / tsibble

Tidy Temporal Data Frames and Tools
https://tsibble.tidyverts.org
GNU General Public License v3.0

`key_rows` does not return keys in the order they appear in the dataset #215

Closed njtierney closed 4 years ago

njtierney commented 4 years ago

I was expecting key_rows() to return the row indices for each key in the order the keys appear in the data, but instead they appear to be sorted by key value, in the same order as key_data().

Here is a minimal reprex that highlights the issue. I would expect the order of key_rows() to match the example data: 2, then 1. Instead, it is 1, then 2, and key_data() shows that it is sorting by id.

library(tsibble)

example <- tsibble(
  id = c(2, 2, 2, 2, 2, 1, 1),
  time = c(1, 2, 3, 4, 5, 1, 2),
  value = 100:106,
  key = id,
  index = time
)

key_rows(example)
#> <list_of<integer>[2]>
#> [[1]]
#> [1] 1 2
#> 
#> [[2]]
#> [1] 3 4 5 6 7
key_data(example)
#> # A tibble: 2 x 2
#>      id       .rows
#> * <dbl> <list<int>>
#> 1     1         [2]
#> 2     2         [5]

Created on 2020-08-20 by the reprex package (v0.3.0)

This created some interesting issues with brolgar when I was attempting to stratify the data into equal groups. I'll write a workaround for this, but I wasn't sure whether this was expected behaviour, and I couldn't find more details in the help file for ?key_data. It's also possible that this is not how you were expecting key_data() to be used.

earowang commented 4 years ago

This is consistent with what group_data() returns for a grouped data frame. The rows are sorted by group/key instead of index. Having the metadata organised in a certain order presents some potential issues, regardless of changes to the rows of the original data.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
example <- tibble(
  id = c(2, 2, 2, 2, 2, 1, 1),
  time = c(1, 2, 3, 4, 5, 1, 2),
  value = c(100:106)
)
example %>% 
  group_by(id) %>% 
  group_data()
#> # A tibble: 2 x 2
#>      id       .rows
#> * <dbl> <list<int>>
#> 1     1         [2]
#> 2     2         [5]

Created on 2020-08-26 by the reprex package (v0.3.0)
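
For readers who need the groups in order of first appearance, here is a minimal workaround sketch for the grouped data frame case shown above. It is not part of tsibble or dplyr; the names `first_row` and `groups_by_appearance` are made up for illustration. It reorders the metadata after the fact rather than changing what group_data() returns. Note that in the tsibble reprex at the top of the thread, key_rows() reports rows 1 and 2 for the first key, which suggests the tsibble's own rows had already been arranged by key, so this sketch may not carry over to a tsibble unchanged.

```r
library(dplyr)

example <- tibble(
  id = c(2, 2, 2, 2, 2, 1, 1),
  time = c(1, 2, 3, 4, 5, 1, 2),
  value = 100:106
)

# Group metadata, sorted by the value of `id` (1, then 2)
groups <- example %>%
  group_by(id) %>%
  group_data()

# Hypothetical helper step: position of each group's first row
# in the original data
first_row <- vapply(groups$.rows, min, integer(1))

# Reorder the metadata so groups follow their order of first appearance
groups_by_appearance <- groups[order(first_row), ]

groups_by_appearance$id
#> [1] 2 1
```

The sorted metadata stays untouched, so code that relies on the default ordering keeps working, and the reordered copy is used only where appearance order matters.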

njtierney commented 4 years ago

OK, good to know. Do you think you could link to dplyr::group_data() or provide a similar explanation of what it returns in the documentation for key_data()? Or would you accept a PR from me with documentation for key_data()?