tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/
Other
1.38k stars 420 forks source link

pack/unpack function argument for names_sep #1161

Closed gergness closed 3 years ago

gergness commented 3 years ago

I'm in a situation where neither the names_sep nor names_repair arguments to pack/unpack are quite what I want. I wish I could pass a function that would receive both the inner and outer names and return the naming strategy.

Currently, names_sep does not take a function (but does handle both the inner & outer names), and while names_repair takes a function, it does not get access to both names.

library(tidyr)

df <- tibble(
  x = 1:3, 
  y = data.frame(coef = 4:6, se = 7:9), 
  z = data.frame(coef = 10:12, se = 13:15)
)

# I wish something like this worked (also with purrr-style anonymous functions)
df %>%
  unpack(
    c(y, z), 
    names_sep = function(.x, .y) paste0(.x, ifelse(.y == "coef", "", paste0("_", .y))) 
  )
#> Error in paste0(col, names_sep, names(x)): cannot coerce type 'closure' to vector of type 'character'

# names_repair is called after they've been combined, so I don't have access to both inner/outer names
df %>%
  unpack(
    c(y, z),
    names_repair = ~ifelse(duplicated(.), paste0(., seq_along(.)), .)
  )
#> # A tibble: 3 × 5
#>       x  coef    se coef4   se5
#>   <int> <int> <int> <int> <int>
#> 1     1     4     7    10    13
#> 2     2     5     8    11    14
#> 3     3     6     9    12    15

# Gets desired result, but I wish I didn't have to  rely on string parsing after combining it 
# with `names_sep`
df %>%
  unpack(
    c(y, z), 
    names_sep = "_", 
    names_repair = ~stringr::str_replace(., "_coef$", "")
  )
#> # A tibble: 3 × 5
#>       x     y  y_se     z  z_se
#>   <int> <int> <int> <int> <int>
#> 1     1     4     7    10    13
#> 2     2     5     8    11    14
#> 3     3     6     9    12    15
DavisVaughan commented 3 years ago

I think rename_with() is a decent way to follow up the unpack() to get what you want.

library(tidyverse)

df <- tibble(
  x = 1:3, 
  y = data.frame(coef = 4:6, se = 7:9), 
  z = data.frame(coef = 10:12, se = 13:15)
)

df %>%
  unpack(c(y, z), names_sep = "_") %>%
  rename_with(~sub("_coef", "", .x), ends_with("_coef"))
#> # A tibble: 3 x 5
#>       x     y  y_se     z  z_se
#>   <int> <int> <int> <int> <int>
#> 1     1     4     7    10    13
#> 2     2     5     8    11    14
#> 3     3     6     9    12    15

But maybe we can make names_sep more like vctrs::vec_c(.name_spec =), which does allow a function of 2 args to do the outer/inner name handling. That seems somewhat reasonable at first thought.

DavisVaughan commented 3 years ago

@hadley, do you think this names_spec proposal is worth it?

Proposal

To solve the above issue,pack() and unpack() could get a names_spec argument that is similar to the one in vec_c(). It allows:

names_spec would be mutually exclusive with names_sep.

In the unpack() case, names_sep would become a shortcut for names_spec = "{outer}<names_sep>{inner}". The pack() case is more complicated, because names_sep is interpreted as something to strip out of the inner names.

This argument would bubble up into other functions that utilize pack/unpack() as follows:

pack()

unpack()

nest()

unnest()

unnest_longer()

unnest_wider()

hadley commented 3 years ago

Could we just use names_glue? The full spec you propose feels too complicated for this fairly rare need.

DavisVaughan commented 3 years ago

I don't think names_glue would solve the original issue, which is to go from:

# from
outer <- "y"
inner <- c("coef", "se")

# to
c("y", "y_se")

# i.e. coef is dropped entirely, and the inner names are only partially used

But maybe this is just too rare of an issue, and we shouldn't support "partial usage" of the inner names like this.

Also, names_glue probably doesn't make that much sense for pack() and nest()? Because names_sep is more about removing a common prefix in the context of those functions.

I do think names_sep currently solves most of the issues, so maybe we can close this.

gergness commented 3 years ago

It's ugly, but names_glue could solve the original issue.

glue::glue(
  "{outer}{ifelse(inner == 'coef', '', paste0('_', inner))}", 
  outer = "y", 
  inner = c("coef", "se")
)
#> y
#> y_se

Don't disagree that this is niche though. FWIW, here's where it came up https://github.com/gergness/srvyr/blob/main/R/summarise.r#L44

DavisVaughan commented 3 years ago

Depends on how restrictive we make names_glue. I personally think its only intended purpose is for inlining {outer} and {inner} and everything else should be disabled (because it can get ugly like this, or can change the size of the return value). So I'd probably code it like:

glue::glue_data(
  .x = list(outer = "y", inner = c("coef", "se")),
  "{outer}{inner}", 
  .envir = NULL
)
#> ycoef
#> yse

glue::glue_data(
  .x = list(outer = "y", inner = c("coef", "se")),
  "{outer}{ifelse(inner == 'coef', '', paste0('_', inner))}",
  .envir = NULL
)
#> Error in ifelse(inner == "coef", "", paste0("_", inner)): could not find function "ifelse"

Created on 2021-11-10 by the reprex package (v2.0.1)

gergness commented 3 years ago

Oh, gotcha. Well, then yeah, doesn't help my situation any more than names_sep.

hadley commented 3 years ago

@gergness I think it's probably going to be too much work to fix this edge case. So I think you'll just need to use @DavisVaughan's work around with rename_with().