tidyverse / tidyeval

A guide to tidy evaluation
https://tidyeval.tidyverse.org

(en)quo(s) vs (en)sym(s) #22

Open jennybc opened 5 years ago

jennybc commented 5 years ago

Capture the essence of this twitter discussion somewhere:

https://twitter.com/JennyBryan/status/1088859123658018816

Summary: enquo() is a better all-purpose default quotation mechanism to recommend than ensym(). That is, if you're only going to learn 1 of these, make it enquo().
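As a rough illustration of the difference between the two (the wrapper names `capture_sym()` and `capture_quo()` are made up for this sketch and are not part of rlang): ensym() returns a bare symbol, while enquo() returns a quosure, i.e. the expression together with the environment it was supplied from.

```r
library(rlang)

# Hypothetical wrappers, used only to show what each function captures
capture_sym <- function(x) ensym(x)
capture_quo <- function(x) enquo(x)

sym_result <- capture_sym(Sepal.Length)
quo_result <- capture_quo(Sepal.Length)

is_symbol(sym_result)    # a bare symbol, no environment attached
is_quosure(quo_result)   # an expression bundled with an environment
quo_get_env(quo_result)  # the environment the argument was supplied from

# enquo() accepts arbitrary expressions ...
capture_quo(Sepal.Length * 2)

# ... whereas ensym() rejects anything that is not a symbol or a string
sym_error <- tryCatch(
  capture_sym(Sepal.Length * 2),
  error = function(e) e
)
inherits(sym_error, "error")
```

The environment carried by the quosure is what matters when the captured input is later evaluated.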

MilesMcBain commented 5 years ago

So I think what we thought we were buying with ensym() was some guarantee that the object would actually resolve to a column name or error.

We don't get that with enquo(), so it might also be worth discussing the pattern for getting that back. Unless I've missed it, I don't see an rlang function to test whether a quosure contains something that is a symbol. It's not is_symbol or is_symbolic.

Thinking one step further than this, if such a function does exist, and you were going to recommend it be combined with enquo(), I'd have to ask why not combine the two into one function? So it would be something like enquo_sym() that captures an expression and an environment, and errors if the expression is not a symbol.

jennybc commented 5 years ago

> So I think what we thought we were buying with ensym() was some guarantee that the object would actually resolve to a column name or error.
>
> We don't get that with enquo(), so it might also be worth discussing the pattern for getting that back.

Re: getting this guarantee "back": it isn't actually borne out by either ensym() or enquo(). But enquo() comes the closest, in the sense that the quoted user input is evaluated with a more limited scope.

library(tidyverse)

summarise_ensym <- function(.data, summarise_col) {
  # Deliberately misspelled local variable, standing in for an accidental name collision
  Spal.Length <- rep_len(0, nrow(iris))
  .data %>%
    group_by(Species) %>%
    summarise(
      avg = mean(!!ensym(summarise_col))
    )
}

summarise_enquo <- function(.data, summarise_col) {
  # Same deliberately misspelled local variable as in summarise_ensym()
  Spal.Length <- rep_len(0, nrow(iris))
  .data %>%
    group_by(Species) %>%
    summarise(
      avg = mean(!!enquo(summarise_col))
    )
}

## Same result when all is well, e.g. no unfortunate typos / name collisions
summarise_ensym(iris, Sepal.Length)
#> # A tibble: 3 x 2
#>   Species      avg
#>   <fct>      <dbl>
#> 1 setosa      5.01
#> 2 versicolor  5.94
#> 3 virginica   6.59

summarise_enquo(iris, Sepal.Length)
#> # A tibble: 3 x 2
#>   Species      avg
#>   <fct>      <dbl>
#> 1 setosa      5.01
#> 2 versicolor  5.94
#> 3 virginica   6.59

## evaluation of the `ensym()`d input happens with execution env in scope
summarise_ensym(iris, Spal.Length)
#> # A tibble: 3 x 2
#>   Species      avg
#>   <fct>      <dbl>
#> 1 setosa         0
#> 2 versicolor     0
#> 3 virginica      0

## not so with `enquo()`d input
summarise_enquo(iris, Spal.Length)
#> Error in ~Spal.Length: object 'Spal.Length' not found

## however both can still find the "wrong" object in global env
## although execution env is still consulted first for `ensym()`d user input
Spal.Length <- rep_len(50, nrow(iris))

summarise_ensym(iris, Spal.Length)
#> # A tibble: 3 x 2
#>   Species      avg
#>   <fct>      <dbl>
#> 1 setosa         0
#> 2 versicolor     0
#> 3 virginica      0

summarise_enquo(iris, Spal.Length)
#> # A tibble: 3 x 2
#>   Species      avg
#>   <fct>      <dbl>
#> 1 setosa        50
#> 2 versicolor    50
#> 3 virginica     50

Ptal.Width <- rep_len(1000, nrow(iris))
summarise_ensym(iris, Ptal.Width)
#> # A tibble: 3 x 2
#>   Species      avg
#>   <fct>      <dbl>
#> 1 setosa      1000
#> 2 versicolor  1000
#> 3 virginica   1000

summarise_enquo(iris, Ptal.Width)
#> # A tibble: 3 x 2
#>   Species      avg
#>   <fct>      <dbl>
#> 1 setosa      1000
#> 2 versicolor  1000
#> 3 virginica   1000

Created on 2019-01-28 by the reprex package (v0.2.1)

A related situation holds if these two functions are defined and exported in a package: !!ensym(var) can then resolve to an object in the package's namespace environment, whereas !!enquo(var) does not.

lionel- commented 5 years ago

> I don't see an rlang function to test whether a quosure contains something that is a symbol

It is quo_is_symbol().
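A minimal sketch of combining enquo() with quo_is_symbol(), assuming the goal is only to reject input that is not a bare name. The function name `summarise_checked()` and the error message are invented for illustration. The check constrains the shape of the input; it does not by itself guarantee that the symbol resolves to a column rather than to an object in the caller's environment.

```r
library(dplyr)
library(rlang)

# Hypothetical variant of summarise_enquo() that insists on a bare symbol
summarise_checked <- function(.data, summarise_col) {
  col <- enquo(summarise_col)

  # Reject calls, constants, etc.; only a single name is accepted
  if (!quo_is_symbol(col)) {
    abort("`summarise_col` must be a bare column name.")
  }

  .data %>%
    group_by(Species) %>%
    summarise(avg = mean(!!col))
}

# A bare name passes the check
checked_ok <- summarise_checked(iris, Sepal.Length)

# An arbitrary expression is rejected before any evaluation happens
checked_err <- tryCatch(
  summarise_checked(iris, Sepal.Length * 2),
  error = function(e) e
)
```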

> So I think what we thought we were buying with ensym() was some guarantee that the object would actually resolve to a column name or error.

To get this guarantee you can use .data[[mycol]], or go through tidyselect.
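A sketch of the `.data[[mycol]]` approach, assuming the column name is passed as a string. The function name `summarise_by_name()` and the argument name `df` are chosen for this example. Because the `.data` pronoun only looks inside the data frame, the misspelled objects in the function body and in the global environment are never consulted.

```r
library(dplyr)

# Hypothetical string-based variant of the functions above
summarise_by_name <- function(df, summarise_col) {
  # Same decoy local variable as in the earlier examples
  Spal.Length <- rep_len(0, nrow(df))

  df %>%
    group_by(Species) %>%
    summarise(avg = mean(.data[[summarise_col]]))
}

# Works for a real column
by_name_ok <- summarise_by_name(iris, "Sepal.Length")

# A decoy in the calling environment does not rescue a misspelled name:
# the lookup is restricted to columns of `df`, so this is an error
Spal.Length <- rep_len(50, nrow(iris))
by_name_err <- tryCatch(
  summarise_by_name(iris, "Spal.Length"),
  error = function(e) e
)
```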