tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

Surprising difference in tidy evaluation between select and count #4264

Closed slyrus closed 5 years ago

slyrus commented 5 years ago

When I use !! to unquote (evaluate?) a variable, things work as expected with dplyr::select. With dplyr::count, however, it counts not the elements of the column in my tibble but the string itself, as shown in the example below:

library(magrittr)
library(tibble)
library(dplyr)

foo <- tibble::tibble(Group = c("A", "B", "A", "B"),
                      Level = seq(1,4))
var <- c("Group")

foo %>% dplyr::select(Group)
foo %>% dplyr::select(!!var)

foo %>% dplyr::count(Group)
foo %>% dplyr::count(!!var)

This yields:

> foo %>% dplyr::select(Group)
# A tibble: 4 x 1
  Group
  <chr>
1 A    
2 B    
3 A    
4 B    
> foo %>% dplyr::select(!!var)
# A tibble: 4 x 1
  Group
  <chr>
1 A    
2 B    
3 A    
4 B    
> foo %>% dplyr::count(Group)
# A tibble: 2 x 2
  Group     n
  <chr> <int>
1 A         2
2 B         2
> foo %>% dplyr::count(!!var)
# A tibble: 1 x 2
  `"Group"`     n
  <chr>     <int>
1 Group         4

I don't want to count the string "Group" but rather the elements of the Group column.

cderv commented 5 years ago

This is expected behavior. It comes down to how quoting and unquoting work in tidyeval: you need var <- quo(Group) or var <- sym("Group") here. With var <- c("Group"), !!var evaluates to the string "Group", and count() with a character string as its argument gives the result you see.

library(magrittr)
library(tibble)
library(dplyr)
foo <- tibble::tibble(Group = c("A", "B", "A", "B"),
                      Level = seq(1,4))

# use quo to quote variable, not `""` 
var <- quo(Group)
# this works as expected
foo %>% dplyr::count(!!var)
#> # A tibble: 2 x 2
#>   Group     n
#>   <chr> <int>
#> 1 A         2
#> 2 B         2
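The comment above also mentions sym("Group") but only shows quo(). As a minimal sketch of the sym() route, assuming the column name arrives as a string: count_by_name is a hypothetical helper written for this illustration, not a dplyr function.

```r
library(dplyr)

foo <- tibble::tibble(Group = c("A", "B", "A", "B"),
                      Level = seq(1, 4))

# count_by_name() is a hypothetical helper, not part of dplyr
count_by_name <- function(data, name) {
  # sym() turns the string "Group" into the symbol Group
  col <- rlang::sym(name)
  # !! injects that symbol, so count() sees count(data, Group)
  dplyr::count(data, !!col)
}

res_sym <- count_by_name(foo, "Group")
res_sym
```

The difference from var <- c("Group") is that !! now injects a column reference rather than a string constant.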

select() works the way you want because it is a selecting verb, not a doing verb, which is why it behaves a bit differently. count() is like group_by(): it is a doing verb, not a selecting one. See this talk at RStudio Conf for more details.

foo %>% dplyr::select("Group")
#> # A tibble: 4 x 1
#>   Group
#>   <chr>
#> 1 A    
#> 2 B    
#> 3 A    
#> 4 B
foo %>% dplyr::count("Group")
#> # A tibble: 1 x 2
#>   `"Group"`     n
#>   <chr>     <int>
#> 1 Group         4
# this is because count() is group_by() + tally(),
# and group_by() with a character string gives the result you see
foo %>% 
  dplyr::group_by("Group") %>%
  dplyr::tally()
#> # A tibble: 1 x 2
#>   `"Group"`     n
#>   <chr>     <int>
#> 1 Group         4
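To see why a string produces a single row, it may help to look at the grouped data before tally(). This is a small sketch under the assumption that group_by() treats the string as an expression to compute, the same way mutate() would, so the constant is recycled to every row.

```r
library(dplyr)

foo <- tibble::tibble(Group = c("A", "B", "A", "B"),
                      Level = seq(1, 4))

# group_by("Group") computes a new column from the constant "Group"
# instead of looking up the existing Group column
grouped <- foo %>% dplyr::group_by("Group")

ncol(grouped)             # the original two columns plus the constant one
dplyr::n_groups(grouped)  # every row holds the same value, so one group
```

With only one group, tally() has nothing to split on and reports all four rows together.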

Here is an illustration of tidyeval behaviour:

# this is tidyeval behaviour
var <- c("Group")
rlang::qq_show(!!var)
#> "Group"
var <- quo(Group)
rlang::qq_show(!!var)
#> ^Group
var <- sym("Group")
rlang::qq_show(!!var)
#> Group
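The same idea extends to several column names at once. A minimal sketch, assuming the names are held in a character vector (vars and cols are names introduced here for illustration): rlang::syms() builds a list of symbols and !!! splices them into the call.

```r
library(dplyr)

foo <- tibble::tibble(Group = c("A", "B", "A", "B"),
                      Level = seq(1, 4))

vars <- c("Group", "Level")
# syms() converts each string into a symbol: Group, Level
cols <- rlang::syms(vars)

# !!! splices the list, so count() sees count(Group, Level)
res_multi <- foo %>% dplyr::count(!!!cols)
res_multi
```

Each Group/Level combination occurs once in foo, so every row of the result has n equal to 1.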

You can also use a string with the *_at variants, in which case you don't need tidyeval:

library(magrittr)

foo <- tibble::tibble(Group = c("A", "B", "A", "B"),
                      Level = seq(1,4))

# a plain character vector is enough for the *_at variants
var <- c("Group")
foo %>% 
  dplyr::group_by_at(var) %>%
  dplyr::tally()
#> # A tibble: 2 x 2
#>   Group     n
#>   <chr> <int>
#> 1 A         2
#> 2 B         2
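For readers on newer versions: the scoped *_at verbs were superseded in dplyr 1.0.0 in favour of across(). The following is a sketch of the equivalent call, assuming dplyr >= 1.0.0 is installed; it was not available when this thread was written.

```r
library(dplyr)

foo <- tibble::tibble(Group = c("A", "B", "A", "B"),
                      Level = seq(1, 4))

var <- c("Group")

# all_of() selects the columns named in the character vector,
# and across() hands them to group_by() as grouping columns
res_across <- foo %>%
  dplyr::group_by(dplyr::across(dplyr::all_of(var))) %>%
  dplyr::tally()
res_across
```

As with group_by_at(), the string is used for selection here, so no unquoting is needed.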

Hope it helps.

slyrus commented 5 years ago

Thanks for the clarification!
