markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/

Pass a pre-made list of functions to `across()` #747

Closed tungttnguyen closed 1 year ago

tungttnguyen commented 1 year ago

Hello Mark,

I couldn't get the following example to work with tidytable. It worked with dplyr so I am not sure what the problem was.

library(dplyr)

packageVersion("dplyr")
#> [1] '1.1.0'
library(purrr)
library(tidytable)

packageVersion("tidytable")
#> [1] '0.10.0'

data("mtcars")
mtcars <- mtcars %>% 
  select(cyl, gear, mpg)

Define a list of functions

p <- seq(0.1, 0.9, by = 0.1)
p_names <- map_chr(p, ~ paste0(.x * 100))
p_names 

p_funs <- map(p, ~ partial(quantile, type = 6, probs = .x, na.rm = TRUE)) %>%
  set_names(nm = p_names)
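For reference, each element of p_funs is an ordinary one-argument function, so it can be checked on its own before it goes anywhere near across(). A minimal sketch (the vector x is made up for illustration; the rest repeats the definitions above):

```r
library(purrr)

# Same construction as above: one partially applied quantile() per probability
p <- seq(0.1, 0.9, by = 0.1)
p_funs <- map(p, ~ partial(quantile, type = 6, probs = .x, na.rm = TRUE)) %>%
  set_names(nm = paste0(p * 100))

# Hypothetical test vector, not part of the original example
x <- c(1, 2, 3, 4, NA)

# Calling the "50" element should match a direct quantile() call
from_list <- p_funs[["50"]](x)
direct <- quantile(x, probs = 0.5, type = 6, na.rm = TRUE)
all.equal(from_list, direct)
```

So the list itself is fine; the question is only how across() receives it.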

dplyr

quantile_dplyr <- mtcars %>% 
  dplyr::group_by(cyl, gear) %>%
  dplyr::summarise(across(where(is.numeric), p_funs)) %>% 
  ungroup()
#> `summarise()` has grouped output by 'cyl'. You can override using the `.groups`
#> argument.
quantile_dplyr
#> # A tidytable: 8 × 11
#>     cyl  gear mpg_10 mpg_20 mpg_30 mpg_40 mpg_50 mpg_60 mpg_70 mpg_80 mpg_90
#>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1     4     3   21.5   21.5   21.5   21.5   21.5   21.5   21.5   21.5   21.5
#> 2     4     4   21.4   22.5   22.8   23.8   25.8   28.5   31     32.7   33.9
#> 3     4     5   26     26     26     26.9   28.2   29.5   30.4   30.4   30.4
#> 4     6     3   18.1   18.1   18.1   18.8   19.8   20.7   21.4   21.4   21.4
#> 5     6     4   17.8   17.8   18.5   19.2   20.1   21     21     21     21  
#> 6     6     5   19.7   19.7   19.7   19.7   19.7   19.7   19.7   19.7   19.7
#> 7     8     3   10.4   12.1   14.2   14.8   15.2   15.4   16.5   17.9   19.0
#> 8     8     5   15     15     15     15.2   15.4   15.6   15.8   15.8   15.8

tidytable

quantile_tidy1 <- mtcars %>% 
  summarize(across(where(is.numeric), list(p_funs)),
            .by = c(cyl, gear))
#> Error in p_funs(mpg): could not find function "p_funs"
quantile_tidy1
#> Error in eval(expr, envir, enclos): object 'quantile_tidy1' not found

quantile_tidy2 <- mtcars %>% 
  group_by(cyl, gear) %>% 
  summarize(across(where(is.numeric), p_funs))
#> Error in p_funs(mpg): could not find function "p_funs"
quantile_tidy2
#> Error in eval(expr, envir, enclos): object 'quantile_tidy2' not found

Created on 2023-03-10 with reprex v2.0.2

markfairbanks commented 1 year ago

As tidytable is currently constructed, you have to write the list() call inside of across() for it to work.

pacman::p_load(tidytable)

df <- tidytable(x = 1:3, y = 4:6)

# Works
df %>%
  summarize(across(everything(), list(mean = mean, max = max)))
#> # A tidytable: 1 × 4
#>   x_mean x_max y_mean y_max
#>    <dbl> <int>  <dbl> <int>
#> 1      2     3      5     6

# Fails
fn_list <- list(mean = mean, max = max)
df %>%
  summarize(across(everything(), fn_list))
#> Error in fn_list(x): could not find function "fn_list"

I'll take a look and see if I can get it to work more like dplyr.
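Applied to the quantile example from the original post, the inline form might look like the sketch below. It assumes that named lambdas inside an inline list() are handled the same way as the named functions in the "Works" example; the names q25/q50/q75 and the choice of three probabilities are only for illustration.

```r
library(tidytable)

# Sketch: spell the function list out inside across() instead of building it beforehand.
# q25/q50/q75 are illustrative names, not taken from the original post.
quantile_tidy <- mtcars %>%
  select(cyl, gear, mpg) %>%
  summarize(
    across(
      mpg,
      list(
        q25 = ~ quantile(.x, probs = 0.25, type = 6, na.rm = TRUE),
        q50 = ~ quantile(.x, probs = 0.50, type = 6, na.rm = TRUE),
        q75 = ~ quantile(.x, probs = 0.75, type = 6, na.rm = TRUE)
      )
    ),
    .by = c(cyl, gear)
  )

quantile_tidy
```

This is more typing than a pre-made list, but it keeps everything inside the one form that across() accepts here.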

tungttnguyen commented 1 year ago

Thanks Mark!

markfairbanks commented 1 year ago

I took a look at this - basically the change required to make this work would break some really common use cases.

Here are a couple of examples (that currently work correctly in tidytable):

# df is the tidytable from the earlier example: tidytable(x = 1:3, y = 4:6)

# Using another column within the function
df %>%
  mutate(
    across(c(x, y), ~ .x + y)
  )

# Using context functions like n()/row_number()
df %>%
  mutate(
    across(c(x, y), ~ .x + n())
  )

There are a couple of other use cases that would also stop working.

Passing a pre-made list of functions to across() will have to remain a limitation of tidytable.
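The error message (could not find function "fn_list") suggests that across() is expanded at the expression level, with the name it receives pasted into a call, rather than the list being evaluated first. The toy model below illustrates that idea in base R. It is not tidytable's actual code, and expand_fns is a hypothetical helper written only for this illustration.

```r
# Toy model only: NOT tidytable's implementation.
# expand_fns() is a hypothetical helper that turns a function spec into calls on a column.
expand_fns <- function(fns, col) {
  fns_expr <- substitute(fns)  # capture the argument without evaluating it
  col_sym <- as.name(col)
  if (is.call(fns_expr) && identical(fns_expr[[1]], as.name("list"))) {
    # Inline list(): the elements are visible, so build one call per element
    lapply(as.list(fns_expr)[-1], function(f) as.call(list(f, col_sym)))
  } else {
    # Bare symbol: only a name is visible, so it is used as if it were a function name
    list(as.call(list(fns_expr, col_sym)))
  }
}

fn_list <- list(mean = mean, max = max)

inline_calls <- expand_fns(list(mean = mean, max = max), "x")
premade_calls <- expand_fns(fn_list, "x")

print(inline_calls)   # calls built from the inline list
print(premade_calls)  # a single call that uses fn_list as a function name

# Evaluating the calls against some data
data_env <- list(x = 1:3)
inline_values <- sapply(inline_calls, eval, envir = data_env)
premade_error <- tryCatch(
  eval(premade_calls[[1]], data_env),
  error = conditionMessage
)
print(inline_values)
print(premade_error)
```

Under this model, the same expression-level handling is what allows a lambda such as ~ .x + y or ~ .x + n() to be evaluated alongside the data, which would be consistent with the trade-off described above.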