tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

Unexpected behaviour of missing `{{var}}` in `across()`, `arrange()`, `group_by()`, etc. #6093

Closed by zenggyu 2 years ago

zenggyu commented 2 years ago

When used without tidyeval, across(), arrange(), and group_by() run without a problem even if no arguments are passed to them. One would expect the same behavior when the argument is forwarded with {{var}} and left missing, but this is not the case. Reprex below (also note that for arrange() the two wrapper calls produce different error messages, which seems inconsistent):

suppressPackageStartupMessages(library(dplyr))

dat <- tibble(x = 1, y = 1)

f1 <- function(.data, order_by) {
  .data %>%
    arrange({{order_by}})
}

f2 <- function(.data, order_by) {
  .data %>%
    arrange(across({{order_by}}))
}

g1 <- function(.data, group_by) {
  .data %>%
    group_by({{group_by}})
}

g2 <- function(.data, group_by) {
  .data %>%
    group_by(across({{group_by}}))
}

dat %>% # works fine
  arrange()
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1

dat %>% # works fine
  arrange(across())
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1

dat %>% # works fine
  group_by()
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1

dat %>% # works fine
  group_by(across())
#> # A tibble: 1 × 2
#> # Groups:   x, y [1]
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1

dat %>% # does not work
  f1()
#> Error in mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE): zero-length inputs cannot be mixed with those of non-zero length

dat %>% # does not work
  f2()
#> Error in order(vec_order(proxy, direction = direction, na_value = if (desc) "smallest" else "largest")): argument 1 is not a vector

dat %>% # does not work
  g1()
#> Error in group_by(., {: argument "group_by" is missing, with no default

dat %>% # does not work
  g2()
#> Error in group_by(., across({: argument "group_by" is missing, with no default

Created on 2021-11-18 by the reprex package (v2.0.1)

DavisVaughan commented 2 years ago

I think the main problem is that calling arrange() directly with no dots is different from calling arrange() while passing through a {{var}} that evaluates to a missing argument, i.e. these are different:

fake_arrange <- function(...) {
  rlang::enquos(...)
}

fake_arrange_wrapper <- function(x) {
  fake_arrange({{x}})
}

fake_arrange()
#> <list_of<quosure>>
#> 
#> named list()

fake_arrange_wrapper()
#> <list_of<quosure>>
#> 
#> [[1]]
#> <quosure>
#> expr: ^
#> env:  empty
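
Since the wrapper case yields a quosure whose expression is empty, one option is to detect it before calling the dplyr verb. The following is only a sketch: arrange_maybe() is a hypothetical helper name, and it assumes rlang::enquo() plus rlang::quo_is_missing() are acceptable in your wrapper.

```r
library(dplyr)
library(rlang)

# Hypothetical wrapper: skip sorting when `order_by` is not supplied.
arrange_maybe <- function(.data, order_by) {
  order_by <- enquo(order_by)
  # quo_is_missing() is TRUE when the captured argument was missing
  if (quo_is_missing(order_by)) {
    return(.data)
  }
  arrange(.data, !!order_by)
}

dat <- tibble(x = c(2, 1), y = c("a", "b"))
arrange_maybe(dat)     # rows left in their original order
arrange_maybe(dat, x)  # rows sorted by x
```

This keeps a single named argument, at the cost of an explicit branch in every wrapper.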

If you are passing to arrange() directly, I'd just pass through the dots. So f1() and g1() become:

suppressPackageStartupMessages(library(dplyr))

dat <- tibble(x = 1, y = 1)

f1 <- function(.data, ...) {
  .data %>%
    arrange(...)
}

g1 <- function(.data, ...) {
  .data %>%
    group_by(...)
}

f1(dat)
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1

g1(dat)
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1
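
Forwarding the dots works because empty dots stay empty all the way down, whereas a braced argument is always forwarded as one argument. A small sketch reusing fake_arrange() from above; dots_wrapper() and braced_wrapper() are hypothetical names used only for this comparison.

```r
library(rlang)

fake_arrange <- function(...) enquos(...)

# Hypothetical wrappers that differ only in how they forward their input
dots_wrapper <- function(...) fake_arrange(...)
braced_wrapper <- function(x) fake_arrange({{ x }})

length(dots_wrapper())      # 0: nothing was passed, so nothing is captured
length(dots_wrapper(a, b))  # 2: one quosure per forwarded expression
braced_wrapper()            # compare with the fake_arrange_wrapper() output above
```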

If you want to use tidyselect through across() and you really want to allow the selection argument to be missing, then you should give it a default of c(), which selects nothing. Like this for g2() (note that I had to change the argument name to .group_by so it does not conflict with group_by()):

suppressPackageStartupMessages(library(dplyr))

dat <- tibble(x = 1, y = 1)

g2 <- function(.data, .group_by = c()) {
  .data %>%
    group_by(across({{.group_by}}))
}

g2(dat)
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1

g2(dat, x)
#> # A tibble: 1 × 2
#> # Groups:   x [1]
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1

This doesn't quite work for f2():

suppressPackageStartupMessages(library(dplyr))

dat <- tibble(x = 1, y = 1)

f2 <- function(.data, .order_by = c()) {
  .data %>%
    arrange(across({{.order_by}}))
}

f2(dat)
#> Error in order(vec_order(proxy, direction = direction, na_value = if (desc) "smallest" else "largest")): argument 1 is not a vector

f2(dat, x)
#> # A tibble: 1 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1

This fails because of an edge case bug in the current version of vctrs::vec_order(), which returns NULL for a data frame with zero columns:

library(vctrs)

dat <- new_data_frame(n = 5L)
dat
#> data frame with 0 columns and 5 rows

# Should be c(1, 2, 3, 4, 5)
vctrs::vec_order(dat)
#> NULL
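
Until that edge case is fixed, one possible way to sidestep it in f2() is to resolve the selection first and only call arrange() when at least one column was selected. This is a sketch, not a fix from the dplyr maintainers: f2_safe() is a hypothetical name, and it assumes tidyselect::eval_select() is available to resolve the selection.

```r
library(dplyr)

# Hypothetical variant of f2(): never hands arrange() an empty selection.
f2_safe <- function(.data, .order_by = c()) {
  # Resolve the tidyselect expression to a named vector of column positions
  cols <- tidyselect::eval_select(rlang::enquo(.order_by), .data)
  if (length(cols) == 0L) {
    return(.data)  # nothing selected, so there is nothing to sort by
  }
  arrange(.data, across(all_of(names(cols))))
}

dat <- tibble(x = c(2, 1), y = c(1, 2))
f2_safe(dat)     # returned unchanged
f2_safe(dat, x)  # sorted by x
```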