[x] Search for duplicates among the existing issues (both open and closed).
Proposal
Currently, the value (and left/right) argument in column comparison validation functions uses vars() to resolve scope ambiguity - values passed to value are evaluated ordinarily unless protected by vars(), in which case it's resolved to a column vector instead.
I think this makes it a prime candidate for integrating tidyeval, which resolves this same scope ambiguity with .data and .env.
Pros
We can start the deprecation of vars(col) syntax, encouraging users to instead use the .data$col pronoun instead.
Defusing a string variable to select a column is now more straightforward: vars(!!col_chr) becomes .data[[col_chr]]
Allows building up more complex expressions like value = .data$value * .env$multiplier
Allows tidyselect via the bridge pattern using pick(), like value = pick(any_of("a", "z"))
Perhaps the biggest PRO of all, this is the familiar and idiomatic dplyr way to do column comparisons dynamically. For example, the col_vals_gt()-equivalent logic expressed inside dplyr::filter() would look like the following:
library(dplyr)
b <- 0
df <- tibble(a = 1:2, b = 1)
# data scoping
df %>%
filter(a > .data$b)
#> # A tibble: 1 × 2
#> a b
#> <int> <dbl>
#> 1 2 1
# env scoping
df %>%
filter(a > .env$b)
#> # A tibble: 2 × 2
#> a b
#> <int> <dbl>
#> 1 1 1
#> 2 2 1
# tidyselect bridge pattern with `pick()` + simplify to vector with `pull()`
df %>%
filter(a > pull(pick(any_of(c("b", "z")))))
#> # A tibble: 1 × 2
#> a b
#> <int> <dbl>
#> 1 2 1
Cons
1) The biggest hurdle for a tidyeval implementation is that the current masking pattern in value is the other way around compared to in tidyeval: the environment masks the data. This makes it difficult to recycle the existing infrastructure for tidyeval.
Instead, this might require implementing our own "poorman's tidyeval". I've attached code for a toy implementation at the end. It's probably not super ideal to introduce a lot of metaprogramming shenanigans, so I'm also very open to simplifying this (ex: we don't _have_ to support bridge pattern with `pick()`).
2) The fact that vars("x") gets deprecated in favor of .data$x (or .data[["x"]]) in value may be confusing when combined with (#493), because in the context of tidyselect, vars("x") would get deprecated in favor of all_of("x") in columns. One solution is to not necessarily signal the deprecation of vars(x) as used in the values argument - we can simply document the possibility of passing more complex expressions to value by using the .data and .env pronouns and leave it for the advanced users to discover for themselves.
3) This may pose some limitations for the yaml round-trip. I don't know to what extent there exists a "pointblank guarantee" for this, but we can always be stricter about the input.
Mockup implementation of `values_tidyeval()` (with messages for debugging)
```r
# `vars(col)` and `vars("col")` translated to `.data$col`
vars_as_dotdata <- function(col, ...) {
rlang::check_dots_empty0()
col_sym <- rlang::ensym(col)
eval_expr <- rlang::data_sym(col_sym)
message("Translated: ", deparse(eval_expr))
eval.parent(eval_expr)
}
# tidyselect bridge pattern support
pick_pull <- function(col) {
data_cols <- eval.parent(quote(names(.data)))
col_chr <- rlang::inject(tidyselect::vars_pull(data_cols, {{ col }}))
eval_expr <- rlang::data_sym(col_chr)
message("Translated: ", deparse(eval_expr))
eval.parent(eval_expr)
}
values_tidyeval <- function(x, tbl) {
x_quo <- rlang::enquo(x)
message("Received: ", rlang::expr_deparse(rlang::quo_get_expr(x_quo)))
if (!rlang::quo_is_call(x_quo)) {
# Bare symbols and constants are evaluated in the env they come from
rlang::eval_tidy(x_quo)
} else {
# If value is complex expression, evaluate in a "tidyeval" context
## 1) Extract expression and environment from quosure
x_expr <- rlang::quo_get_expr(x_quo)
x_env <- rlang::quo_get_env(x_quo)
## 2) Create a masking environment and introduce pronouns
x_masked_env <- rlang::new_environment(
data = list(.data = tbl, .env = x_env),
parent = x_env
)
## 3) Replace `vars()` and `pick()` with custom implementation
## - inner substitute to mask vars/pick
## - outer substitute to unquote x_expr
eval_expr <- eval(substitute(substitute(
x_expr,
list(vars = quote(vars_as_dotdata), pick = quote(pick_pull))
)))
message("Substituted: ", deparse(eval_expr))
eval(eval_expr, x_masked_env)
}
}
```
Edit: recycling eval_tidy()
Just for completeness, it turns out that it is also possible to simply swap the data and env roles and still use eval_tidy(). But this is purely an implementational issue that can be decided on later.
Mockup implementation of `reverse_tidyeval()` that still uses `rlang::eval_tidy()`
```r
reverse_tidyeval <- function(x, tbl = head(mtcars)) {
# 1) Capture expression as quosure
x_quo <- rlang::enquo(x)
# 2) Extract quosure components
x_expr <- rlang::quo_get_expr(x_quo)
x_env <- rlang::quo_get_env(x_quo)
# 3) Reverse `.data` and `.env` in expression
x_expr_reversed <- eval(substitute(substitute(
x_expr, rlang::exprs(.data = .env, .env = .data)
)))
# 4) Reverse data and env roles
data_as_env <- rlang::new_environment(data = tbl)
## env-as-data only grabs the variables used in expression
env_as_data <- rlang::env_get_list(
x_env, nms = all.names(x_expr, unique = TRUE),
default = NULL, inherit = TRUE
)
# 5) Eval
rlang::eval_tidy(x_expr_reversed, env_as_data, data_as_env)
}
# New .data syntax for vars()
reverse_tidyeval(.data$mpg)
#> [1] 21.0 21.0 22.8 21.4 18.7 18.1
hp <- 0
# Default is environment scoping
reverse_tidyeval(.data$mpg * hp)
#> [1] 0 0 0 0 0 0
# .env pronoun makes this explicit
reverse_tidyeval(.data$mpg * .env$hp)
#> [1] 0 0 0 0 0 0
# .data to scope in data
reverse_tidyeval(.data$mpg * .data$hp)
#> [1] 2310.0 2310.0 2120.4 2354.0 3272.5 1900.5
# bang-bang to inline values defined in a different scope
local({
hp <- -1
reverse_tidyeval(.data$mpg * !!hp)
})
#> [1] -21.0 -21.0 -22.8 -21.4 -18.7 -18.1
```
Prework
Proposal
Currently, the
value
(andleft
/right
) argument in column comparison validation functions usesvars()
to resolve scope ambiguity - values passed tovalue
are evaluated ordinarily unless protected byvars()
, in which case it's resolved to a column vector instead.I think this makes it a prime candidate for integrating tidyeval, which resolves this same scope ambiguity with
.data
and.env
.Pros
vars(col)
syntax, encouraging users to instead use the.data$col
pronoun instead.vars(!!col_chr)
becomes.data[[col_chr]]
value = .data$value * .env$multiplier
pick()
, likevalue = pick(any_of("a", "z"))
Perhaps the biggest PRO of all, this is the familiar and idiomatic dplyr way to do column comparisons dynamically. For example, the
col_vals_gt()
-equivalent logic expressed insidedplyr::filter()
would look like the following:Cons
1) The biggest hurdle for a tidyeval implementation is that the current masking pattern in
value
is the other way around compared to in tidyeval: the environment masks the data. This makes it difficult to recycle the existing infrastructure for tidyeval.2) The fact that
vars("x")
gets deprecated in favor of.data$x
(or.data[["x"]]
) invalue
may be confusing when combined with (#493), because in the context of tidyselect,vars("x")
would get deprecated in favor ofall_of("x")
incolumns
. One solution is to not necessarily signal the deprecation ofvars(x)
as used in thevalues
argument - we can simply document the possibility of passing more complex expressions tovalue
by using the.data
and.env
pronouns and leave it for the advanced users to discover for themselves.3) This may pose some limitations for the yaml round-trip. I don't know to what extent there exists a "pointblank guarantee" for this, but we can always be stricter about the input.
Proposed behavior:
Mockup implementation of `values_tidyeval()` (with messages for debugging)
```r # `vars(col)` and `vars("col")` translated to `.data$col` vars_as_dotdata <- function(col, ...) { rlang::check_dots_empty0() col_sym <- rlang::ensym(col) eval_expr <- rlang::data_sym(col_sym) message("Translated: ", deparse(eval_expr)) eval.parent(eval_expr) } # tidyselect bridge pattern support pick_pull <- function(col) { data_cols <- eval.parent(quote(names(.data))) col_chr <- rlang::inject(tidyselect::vars_pull(data_cols, {{ col }})) eval_expr <- rlang::data_sym(col_chr) message("Translated: ", deparse(eval_expr)) eval.parent(eval_expr) } values_tidyeval <- function(x, tbl) { x_quo <- rlang::enquo(x) message("Received: ", rlang::expr_deparse(rlang::quo_get_expr(x_quo))) if (!rlang::quo_is_call(x_quo)) { # Bare symbols and constants are evaluated in the env they come from rlang::eval_tidy(x_quo) } else { # If value is complex expression, evaluate in a "tidyeval" context ## 1) Extract expression and environment from quosure x_expr <- rlang::quo_get_expr(x_quo) x_env <- rlang::quo_get_env(x_quo) ## 2) Create a masking environment and introduce pronouns x_masked_env <- rlang::new_environment( data = list(.data = tbl, .env = x_env), parent = x_env ) ## 3) Replace `vars()` and `pick()` with custom implementation ## - inner substitute to mask vars/pick ## - outer substitute to unquote x_expr eval_expr <- eval(substitute(substitute( x_expr, list(vars = quote(vars_as_dotdata), pick = quote(pick_pull)) ))) message("Substituted: ", deparse(eval_expr)) eval(eval_expr, x_masked_env) } } ```Edit: recycling
eval_tidy()
Just for completeness, it turns out that it is also possible to simply swap the data and env roles and still use
eval_tidy()
. But this is purely an implementational issue that can be decided on later.Mockup implementation of `reverse_tidyeval()` that still uses `rlang::eval_tidy()`
```r reverse_tidyeval <- function(x, tbl = head(mtcars)) { # 1) Capture expression as quosure x_quo <- rlang::enquo(x) # 2) Extract quosure components x_expr <- rlang::quo_get_expr(x_quo) x_env <- rlang::quo_get_env(x_quo) # 3) Reverse `.data` and `.env` in expression x_expr_reversed <- eval(substitute(substitute( x_expr, rlang::exprs(.data = .env, .env = .data) ))) # 4) Reverse data and env roles data_as_env <- rlang::new_environment(data = tbl) ## env-as-data only grabs the variables used in expression env_as_data <- rlang::env_get_list( x_env, nms = all.names(x_expr, unique = TRUE), default = NULL, inherit = TRUE ) # 5) Eval rlang::eval_tidy(x_expr_reversed, env_as_data, data_as_env) } # New .data syntax for vars() reverse_tidyeval(.data$mpg) #> [1] 21.0 21.0 22.8 21.4 18.7 18.1 hp <- 0 # Default is environment scoping reverse_tidyeval(.data$mpg * hp) #> [1] 0 0 0 0 0 0 # .env pronoun makes this explicit reverse_tidyeval(.data$mpg * .env$hp) #> [1] 0 0 0 0 0 0 # .data to scope in data reverse_tidyeval(.data$mpg * .data$hp) #> [1] 2310.0 2310.0 2120.4 2354.0 3272.5 1900.5 # bang-bang to inline values defined in a different scope local({ hp <- -1 reverse_tidyeval(.data$mpg * !!hp) }) #> [1] -21.0 -21.0 -22.8 -21.4 -18.7 -18.1 ```