tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

filter and tidyevaluation behaves weirdly #6487

Closed yuliaUU closed 2 years ago

yuliaUU commented 2 years ago

I am a bit confused by the following outputs of the filter function with the {{}} syntax. To reproduce the issue:

library(gapminder)
library(tidyverse)
1. case 1: this code should not run, as a quoted column name does not work with {{}}:

Here we are passing the quoted column name, so I expect this to fail:

filter_gap <- function(col, val) {
  filter(gapminder, {{col}} == val)
}
filter_gap("country", "Canada")


2. case 2: defining a variable country <- "country"
Here too we are passing the quoted column name, but the code runs, where I expected it not to:

filter_gap <- function(col, val) {
  filter(gapminder, {{col}} == val)
}
country <- "country"
filter_gap(country, "Canada")

and it outputs: A tibble: 12 × 6

but if I replace `country <- "country"` with `xx <- "country"`

filter_gap <- function(col, val) {

3. case 3: print() inside function

this function works as expected:

filter_gap <- function(col, val) {

  filter(gapminder, {{col}} == val)
}
filter_gap(country, "Canada")

but if I add print(col) inside:

filter_gap <- function(col, val) {
  print(col)
  filter(gapminder, {{col}} == val)
}
filter_gap(country, "Canada")

this no longer works, and I have no idea why

hadley commented 2 years ago

I think you can understand all these by thinking what happens without tidyeval:

  1. filter(gapminder, "country" == "Canada") is legitimate, if not very useful code
  2. country <- "country"; filter(gapminder, country == "Canada") works because dplyr always prefers variables in the data to variables in the environment.
  3. Just typing print(country) at console doesn't work either.
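
As a minimal sketch of these three points, the snippet below writes out what each call reduces to once {{col}} has been replaced by whatever was passed in. It assumes a small made-up tibble called `toy` (a hypothetical stand-in, not part of the original issue) so that it runs with only dplyr installed; the `xx` and `no_such_object` names are likewise only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for gapminder, used only for illustration.
toy <- tibble(
  country = c("Canada", "Canada", "France"),
  year = c(2002L, 2007L, 2007L)
)

# Point 3: evaluating a name that is not defined anywhere is an error in
# plain R as well, which is why print(col) fails on a fresh session.
err <- tryCatch(print(no_such_object), error = function(e) e)

# Point 1: two string constants are compared, so the condition is a single
# FALSE recycled to every row. No error, but no rows either.
case1 <- filter(toy, "country" == "Canada")

# Point 2: a variable named `country` exists in the environment, but the
# data column of the same name takes precedence inside filter().
country <- "country"
case2 <- filter(toy, country == "Canada")

# With a name that is not a column, the environment value is used instead,
# which is again a string-to-string comparison.
xx <- "country"
case2_env <- filter(toy, xx == "Canada")

nrow(case1)      # 0
nrow(case2)      # 2
nrow(case2_env)  # 0
```

The row counts make the difference visible: only the call where `country` resolves to the data column keeps the Canada rows.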

yuliaUU commented 2 years ago

Regarding 3): my confusion is that print() does work. It prints the name of the column, where I would expect it to give me an error (as you said, just typing print(country) at the console doesn't work).

And because print() somehow prints the column name, the next line stops working: it no longer keeps any rows with "Canada", and you can see the output is an empty data frame. So I assume it somehow starts treating country as "country", which makes the result identical to case 1.

hadley commented 2 years ago

Here's a proper reprex:

library(dplyr, warn.conflicts = FALSE)

filter_gap <- function(col, val) {
  print(col)
  filter(gapminder::gapminder, {{col}} == val)
}
filter_gap(country, "Canada")
#> Error in print(col): object 'country' not found

Created on 2022-09-29 with reprex v2.0.2

If you have country lying around from earlier, it will "work":

library(dplyr, warn.conflicts = FALSE)

country <- "country"
filter_gap <- function(col, val) {
  print(col)
  filter(gapminder::gapminder, {{col}} == val)
}
filter_gap(country, "Canada")
#> [1] "country"
#> # A tibble: 0 × 6
#> # … with 6 variables: country <fct>, continent <fct>, year <int>,
#> #   lifeExp <dbl>, pop <int>, gdpPercap <dbl>

Created on 2022-09-29 with reprex v2.0.2

The problem here is that, because you've forced the evaluation of col, you've lost the information that tidy evaluation would use to do variable substitution, so it just inserts the value of col, "country".
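
If the goal is only to see which column was passed, one possible approach (a sketch, not something proposed in this thread) is to capture the argument with rlang::enquo() and format it with rlang::as_label(), neither of which evaluates col. The `toy` tibble and the `filter_toy` function below are hypothetical names used only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for gapminder, used only for illustration.
toy <- tibble(
  country = c("Canada", "Canada", "France"),
  year = c(2002L, 2007L, 2007L)
)

filter_toy <- function(col, val) {
  # enquo() captures the expression supplied as `col` without evaluating it;
  # as_label() turns that expression into a string for display.
  message("filtering on: ", rlang::as_label(rlang::enquo(col)))
  # `col` has not been forced, so {{ }} can still substitute the column.
  filter(toy, {{ col }} == val)
}

res <- filter_toy(country, "Canada")
nrow(res)  # 2
```

Because col is never evaluated before the {{ }} step, the filter still sees the column rather than a value from the environment, whether or not a variable called country happens to exist.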

yuliaUU commented 2 years ago

oh! now i see! thank you for the explanations!