Open hadley opened 4 years ago
Need to consider the sticky column case, like panelr.
Ideally we'd be like dplyr, and just forcibly make the assumption that `[` with 1 argument `i` is going to return a data frame with length `length(i)`.
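To make that assumption concrete, here is a minimal base R sketch of how a sticky `[` method violates it. The `sticky_df` class and `new_sticky_df()` constructor are hypothetical names invented for illustration; this is not how panelr is implemented.

```r
# Hypothetical class for illustration only; not panelr's implementation.
new_sticky_df <- function(x, sticky) {
  stopifnot(is.data.frame(x), all(sticky %in% names(x)))
  structure(x, sticky = sticky, class = c("sticky_df", "data.frame"))
}

# A sticky `[` method: one-argument character subsetting always keeps
# the sticky columns, whether or not they were requested.
`[.sticky_df` <- function(x, i, ...) {
  sticky <- attr(x, "sticky")
  if (nargs() == 2L && is.character(i)) {
    plain <- x
    class(plain) <- "data.frame"
    out <- plain[union(sticky, i)]
    return(new_sticky_df(out, sticky))
  }
  # All other forms of subsetting fall through to the data frame method.
  NextMethod()
}

df <- new_sticky_df(
  data.frame(id = c(1, 1, 2), t = c(1, 2, 1), value = c(3, 4, 30)),
  sticky = c("id", "t")
)

i <- "value"
out <- df[i]

names(out)   # "id" "t" "value"
length(out)  # 3 columns, even though length(i) is 1
```

Generic code that assumes `length(df[i]) == length(i)` silently gets extra columns from a class like this one.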
I have a feeling that we are going to have to say: if you have sticky columns and a sticky `[` method, you'll need to implement an S3 method for this generic specific to your package. Otherwise it should just work.
That would break packages like this (with sticky cols) until they add a method for these operations. But it isn't like it worked right to begin with.
library(tidyr)
library(panelr)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
wages
#> # Panel data: 4,165 × 14
#> # entities: id [595]
#> # wave variable: t [1, 2, 3, ... (7 waves)]
#> id t exp wks occ ind south smsa ms fem union ed blk
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 3 32 0 0 1 0 1 0 0 9 0
#> 2 1 2 4 43 0 0 1 0 1 0 0 9 0
#> 3 1 3 5 40 0 0 1 0 1 0 0 9 0
#> 4 1 4 6 39 0 0 1 0 1 0 0 9 0
#> 5 1 5 7 42 0 1 1 0 1 0 0 9 0
#> 6 1 6 8 35 0 1 1 0 1 0 0 9 0
#> 7 1 7 9 32 0 1 1 0 1 0 0 9 0
#> 8 2 1 30 34 1 0 0 0 1 0 0 11 0
#> 9 2 2 31 27 1 0 0 0 1 0 0 11 0
#> 10 2 3 32 33 1 1 0 0 1 0 1 11 0
#> # … with 4,155 more rows, and 1 more variable: lwage <dbl>
# Sticky cols
wages <- wages["exp"]
wages
#> # Panel data: 4,165 × 3
#> # entities: id [595]
#> # wave variable: t [1, 2, 3, ... (7 waves)]
#> id t exp
#> <fct> <dbl> <dbl>
#> 1 1 1 3
#> 2 1 2 4
#> 3 1 3 5
#> 4 1 4 6
#> 5 1 5 7
#> 6 1 6 8
#> 7 1 7 9
#> 8 2 1 30
#> 9 2 2 31
#> 10 2 3 32
#> # … with 4,155 more rows
# Meaning they come along for the ride here
chop(wages, exp)
#> New names:
#> * id -> id...1
#> * t -> t...2
#> * id -> id...3
#> * t -> t...4
#> # A tibble: 4,165 × 5
#> id...1 t...2 id...3 t...4 exp
#> <fct> <dbl> <list<fct>> <list<dbl>> <list<dbl>>
#> 1 1 1 [1] [1] [1]
#> 2 1 2 [1] [1] [1]
#> 3 1 3 [1] [1] [1]
#> 4 1 4 [1] [1] [1]
#> 5 1 5 [1] [1] [1]
#> 6 1 6 [1] [1] [1]
#> 7 1 7 [1] [1] [1]
#> 8 2 1 [1] [1] [1]
#> 9 2 2 [1] [1] [1]
#> 10 2 3 [1] [1] [1]
#> # … with 4,155 more rows
# Genericity doesn't really work right
# In theory this should be a panel data frame, but reconstruct_tibble()
# took over since it inherits from grouped_df
tidyr::pack(wages, data = exp)
#> # A tibble: 4,165 × 3
#> # Groups: id [595]
#> id t data$id $t $exp
#> <fct> <dbl> <fct> <dbl> <dbl>
#> 1 1 1 1 1 3
#> 2 1 2 1 2 4
#> 3 1 3 1 3 5
#> 4 1 4 1 4 6
#> 5 1 5 1 5 7
#> 6 1 6 1 6 8
#> 7 1 7 1 7 9
#> 8 2 1 2 1 30
#> 9 2 2 2 2 31
#> 10 2 3 2 3 32
#> # … with 4,155 more rows
Created on 2021-11-12 by the reprex package (v2.0.1)
Let's kick this down the road again.
See https://github.com/tidyverse/tidyr/issues/1556 for an example. `reconstruct_tibble()` drops the class through `as_tibble()`, which is currently the expected behavior.
iris |>
dplyr::as_tibble() |>
structure(class = c("pop_data", "tbl_df", "tbl", "data.frame")) |>
tidyr::drop_na() |>
class()
#> [1] "tbl_df" "tbl" "data.frame"
See existing work in #812, and see below for a list of functions that we needed to consider, and some thoughts on what form of genericity is needed. The goal is to make sure that data frame extensions return reasonable results in the absence of specific methods, and that all needed functions are generic so that they can be extended when needed.
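As one possible shape for that genericity, the sketch below uses a hypothetical S3 hook, `restore_class()`, with a default method that returns the plain result and a class-specific method that reattaches the extension class. Both `restore_class()` and `drop_incomplete()` are invented for illustration and are not part of tidyr. The `pop_data` class name is borrowed from the example above.

```r
# Hypothetical generic; tidyr does not export a function with this name.
# Dispatch is on the original input (`template`), not on the result.
restore_class <- function(out, template) {
  UseMethod("restore_class", template)
}

# Default: return the plain result, so classes without a method still get
# a reasonable (if unclassed) data frame back.
restore_class.default <- function(out, template) {
  out
}

# A package that owns "pop_data" could opt in by providing its own method.
restore_class.pop_data <- function(out, template) {
  class(out) <- class(template)
  out
}

# Stand-in for a verb such as drop_na(): do the work on a bare data frame,
# then hand the result to the hook.
drop_incomplete <- function(data) {
  plain <- data
  class(plain) <- "data.frame"
  out <- plain[stats::complete.cases(plain), , drop = FALSE]
  restore_class(out, data)
}

x <- structure(
  data.frame(a = c(1, NA, 3), b = c("x", "y", "z")),
  class = c("pop_data", "data.frame")
)

res <- drop_incomplete(x)
class(res)  # "pop_data" "data.frame"
nrow(res)   # 2
```

The design choice here is that the verb never needs to know about the extension class: without a method the result falls back to a plain data frame, and a package opts in by writing a single method.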