r-lib / rlang

Low-level API for programming with R
https://rlang.r-lib.org

Documentation question: how does one convert a string to a quosure? #116

Closed JohnMount closed 2 years ago

JohnMount commented 7 years ago

I have skimmed the dplyr/tidyeval/rlang documentation and tutorials, and I don't remember or see how to easily convert a string to a quosure. What I want to do (and I think it is an important use case) is take the name of a column as a string from some external source (say from colnames(), or from the YAML control block of an R Markdown document) and then use that string as a variable name. It looks like to do that you have to promote the string to a quosure, and that is the part I don't know how to do in pure tidyeval idiom. I've tried things like quo(), but I am missing something.

Below is a specific example with a work-around that shows the effect I want. The only question is how does one produce the variable varQ from the value stored in varName (again, assuming the value stored is a string and not known to the programmer)?

# devtools::install_github("tidyverse/dplyr")
library("dplyr")
## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
packageVersion("dplyr")
## [1] '0.5.0.9004'
library("wrapr")

# imagine this string comes from somewhere else
varName <- 'disp' 
# The following does not currently work as
# rlang/tidyeval/dplyr is expecting a 
# quosure to represent the variable name,
# not a string.
mtcars %>% 
  select(!!varName)
## Error: `"disp"` must resolve to integer column positions, not string
# How does one idiomatically 
# create a quosure varQ that refers
# to the name stored in varName?
# Here is a work-around using wrapr
# to show the desired effect:
stringToQuoser <- function(varName) {
  wrapr::let(c(VARNAME = varName), quo(VARNAME))
}
varQ <- stringToQuoser(varName)

# the code we want to run:
mtcars %>% 
  select(!!varQ) %>%
  head()
##                   disp
## Mazda RX4          160
## Mazda RX4 Wag      160
## Datsun 710         108
## Hornet 4 Drive     258
## Hornet Sportabout  360
## Valiant            225
lionel- commented 7 years ago

You can use sym() or syms(). You can unquote these symbols directly in capturing functions:

select(df, !! sym("foo"))
select(df, !!! syms(c("foo", "bar")))

or you can unquote them while creating quosures (which is actually the same mechanism):

quo(!! sym("foo"))
quo(list(!!! syms(letters)))
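
A minimal sketch of the two forms above, using only rlang and the built-in mtcars data. The names col, cols, and the use of eval_tidy() as a check are illustrative assumptions, not part of the original answer.

```r
library(rlang)

col  <- "disp"             # hypothetical column name arriving as a string
cols <- c("disp", "cyl")   # hypothetical vector of column names

col_sym  <- sym(col)       # a single symbol: disp
col_syms <- syms(cols)     # a list of symbols: disp, cyl

# Unquote while building a bare expression or a quosure
sel_expr <- expr(list(!!!col_syms))   # the call list(disp, cyl)
col_quo  <- quo(!!col_sym)            # a quosure wrapping the symbol disp

# Evaluate the quosure with mtcars as the data mask
disp_values <- eval_tidy(col_quo, data = mtcars)
```

Because the symbols are created from the string values at the time sym() and syms() run, the resulting expressions no longer mention col or cols.
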
JohnMount commented 7 years ago

Thanks, just a note: that doesn't work until you explicitly import rlang (dplyr seems to not share it).

library("dplyr") # installed today
packageVersion("dplyr")
# [1] ‘0.5.0.9004’
select(mtcars,  !! sym("disp"))
# Error in (function (x)  : could not find function "sym" 
library('rlang')
packageVersion("rlang")
# [1] ‘0.0.0.9018’
select(mtcars,  !! sym("disp"))
lionel- commented 7 years ago

You can qualify: rlang::sym()

JohnMount commented 7 years ago

Also I don't think varQ = quo(sym(varName)) works (and it is the bit I really need).

library("dplyr")
library("rlang")
varName <- 'disp'
varQ = quo(sym(varName))
varQ
# <quosure: global>
# ~sym(varName)
select(mtcars, varQ)
#  Error: `varQ` must resolve to integer column positions, not formula 

Even if that did work, it would be dangerous, as it looks like it is hanging on to a reference to varName (instead of taking the value at the current moment), meaning that if we later changed the value of varName the select could (due to lazy eval) produce a different result (I call this a dragging reference). This drives bugs sort of like this one: https://github.com/tidyverse/dplyr/issues/2455 .

Can you please re-open this issue until we find a working pure rlang solution?

hadley commented 7 years ago

But you don't need to create a quosure here; just pass in a symbol (which you can also create with as.name() if you don't want to import rlang)
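
A base-R sketch of the as.name() route, assuming nothing beyond the built-in mtcars data. The bquote() step is an illustrative way to check the symbol without dplyr; it is not taken from the thread.

```r
varName <- "disp"               # column name supplied as a string

col_sym <- as.name(varName)     # base R: string -> symbol
stopifnot(is.symbol(col_sym))

# Splice the symbol into a call with bquote(), then evaluate against mtcars
mean_call <- bquote(mean(.(col_sym)))   # the call mean(disp)
mean_disp <- eval(mean_call, mtcars)
```
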

lionel- commented 7 years ago

When you're programming with NSE functions, you are building an expression. Unquoting makes it possible to change parts of the expression and is what a programmer should focus on:

varName <- "disp"
varQ <- quo(!! sym(varName))  # unquoting the symbol
select(mtcars, !! varQ)  # unquoting the quosure

Note that the quosure is superfluous here as Hadley mentions, since you're not referring to symbols from the contextual environment.

it would be dangerous as it looks like it is hanging on a reference to varName

I don't understand what you mean. In the snippet above we build a quosure containing the value of sym(varName) which is the symbol disp. Note that you can also unquote literal values (i.e. vectors) just like you can unquote expressions (though in this case select_vars() expects column positions rather than actual columns; it evaluates expressions in an environment where column symbols evaluate to column positions).
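
To make the distinction concrete, here is a small sketch that inspects what each quosure stores. It uses rlang's quo_get_expr(), is_call() and is_symbol(); the variable names are illustrative.

```r
library(rlang)

varName <- "disp"

q_call <- quo(sym(varName))     # stores the unevaluated call sym(varName)
q_sym  <- quo(!!sym(varName))   # sym(varName) runs now; stores the symbol disp

call_inside <- quo_get_expr(q_call)
sym_inside  <- quo_get_expr(q_sym)

is_call(call_inside)     # TRUE
is_symbol(sym_inside)    # TRUE
```

Only the second quosure contains a column symbol, which is what select() can resolve to a column position.
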

MZLABS commented 7 years ago

John Mount here again (different account, sorry).

I see the later solutions do work, and thank you for that. I also understand answering questions is a volunteer activity, so I appreciate you working on this for me.

I had a typo in my attempt to use the answer, and I apologize for that. But variations on that idea do not work even if I type it "correctly":

library("dplyr")
library("rlang")
varName <- 'disp'
varQ = quo(sym(varName))
select(mtcars, !! varQ)
# Error: `sym(varName)` must resolve to integer column positions, not symbol

To answer some expressed concerns and set some context I have some follow-up, but all my questions are now answered.

As for the appearance of unbound variables and the CRAN check: if I had been submitting the above code to CRAN, I would have added the following line above the let-block:

VARNAME <- NULL # mark variable symbol as not an unbound reference

The reason I asked for "quosure" is: I thought that was all rlang accepted. I had tested it does not take strings and base-R formulas, so I was just asking for what I thought was the help I needed based on my state of knowledge.

I do not mind importing rlang or qualifying rlang::sym, I only mentioned that dplyr did not re-export rlang::sym to point out the difficulty in discovering that command starting from a dplyr task (i.e. working on a select()).

The latest solution does work (and thank you for it):

library("dplyr")
library("rlang")
varName <- 'disp'
varQ = quo(!! sym(varName))
mtcars %>%
  select(!! varQ) %>% 
  head()
#                  disp
#Mazda RX4          160
#Mazda RX4 Wag      160
#Datsun 710         108
#Hornet 4 Drive     258
#Hornet Sportabout  360
#Valiant            225

It also appears to not have the captured reference name issue. Since the captured reference issue is not in this variation (there is no visible reference to the variable name "varName" in "varQ") there isn't much point going more into it. But the rough idea is: if a quosure worked by capturing varName and varName changed between when we thought we set it and when we used the value we would not want the new value to enter into the calculation. The effect (albeit in another context) was discussed at length in my linked issue https://github.com/tidyverse/dplyr/issues/2455 which was supposed to be a worked example by analogy. But as I said, I don't think the current solutions hold a reference to the varName (they seem to properly go after the value) so we don't have to worry if we have this issue or not. Typically R doesn't have these issues due to its copy by value semantics, but anywhere names are directly used I worry about possible (unintended, and therefore undesirable) reference-like semantics leaking in.

Also I understand that sym() is not the form you would prefer and that as.name() can be used.

library("dplyr")
varName <- 'disp'
varQ = as.name(varName)
mtcars %>%
  select(!! varQ) %>% 
  head()
#                  disp
#Mazda RX4          160
#Mazda RX4 Wag      160
#Datsun 710         108
#Hornet 4 Drive     258
#Hornet Sportabout  360
#Valiant            225

Finally I did work on this before asking. I tried all of:

varQ = as.formula(~varName)
varQ = quo(!! varName)
varQ = quote(varName)
varQ = quote(!! varName)

And none of the above worked.

Frankly I was guessing, but the reason I was guessing is I did not find a worked example of converting a string to something rlang is willing to use as variable name. That is: guessing was not my first choice.

I had tried help(quo), help(UQ) (!!'s equiv) and neither of them mentions, links to, or has an example of sym() or as.name().

Anyway thank you very much for your solutions.

hadley commented 7 years ago

You can also use the .data pronoun to avoid both R CMD check notes and the need to convert to symbols:

varName <- "disp"
mtcars %>% select(.data[[varName]])

(Well, you will be able to once https://github.com/tidyverse/dplyr/pull/2718 is merged)
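
In released dplyr versions the pronoun is subset with `[[` (or `$`). A small sketch for a data-masking verb, assuming a reasonably recent dplyr; the summary name mean_value is an illustrative choice.

```r
library(dplyr)

varName <- "disp"   # column name supplied as a string

# .data[[varName]] looks the column up by name inside the data mask
out <- mtcars %>%
  summarise(mean_value = mean(.data[[varName]]))
```
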

lionel- commented 7 years ago

I worry about possible (unintended, and therefore undesirable) reference-like semantics leaking in.

It's not about reference semantics, it's about delayed evaluation. We're building an expression, sometimes in several steps, and if the value of some symbols changes before evaluation actually happens this could be a problem. To work around this, you can unquote values rather than symbols, or you could make sure the symbols are in read-only environments (e.g. by building an appropriate quosure).
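
A sketch of the delayed-evaluation effect, using only rlang and mtcars. The names delayed and frozen are illustrative.

```r
library(rlang)

varName <- "disp"
delayed <- quo(sym(varName))    # keeps the symbol varName; looked up at evaluation time
frozen  <- quo(!!sym(varName))  # value injected now: the quosure holds disp

varName <- "cyl"                # the variable changes before evaluation

late  <- eval_tidy(delayed)                 # runs sym(varName) now, yielding the symbol cyl
early <- eval_tidy(frozen, data = mtcars)   # still the disp column
```

Unquoting the value when the quosure is built is what protects the result from the later reassignment.
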

I had tested it does not take strings and base-R formulas

tidyeval works with pure expressions. The only adjustment we make is that quosures self-evaluate within their environments (with overscoped data attached). This is why the following expressions are completely equivalent:

select(mtcars, "cyl")

var <- "cyl"
select(mtcars, !! var)

This doesn't work because select() doesn't work with strings but with column positions (or with expressions evaluating to column positions). Hence the following works:

select(mtcars, cyl)

var <- sym("cyl")
select(mtcars, !! var)

Alternatively, you can also supply the values it understands (column positions):

select(mtcars, 1)

var <- 1
select(mtcars, !! var)
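
One way to see these equivalences is to build the calls with expr() and compare them, without evaluating anything. This is only a sketch; the object names are illustrative.

```r
library(rlang)

# expr() only builds the call; nothing is evaluated and dplyr is not needed here
with_string   <- expr(select(mtcars, !!"cyl"))       # select(mtcars, "cyl")
with_symbol   <- expr(select(mtcars, !!sym("cyl")))  # select(mtcars, cyl)
with_position <- expr(select(mtcars, !!1))           # select(mtcars, 1)
```
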
yeedle commented 7 years ago

It seems that if an expression (as opposed to a column name) is passed as a string the above solutions do not work. consider:

count(mtcars, !!! rlang::syms(c("2 * cyl", "am")))
#> Error in grouped_df_impl(data, unname(vars), drop): Column `2 * cyl` is unknown

While this works:

count(mtcars, !!! rlang::syms(c("cyl", "am")))
#> # A tibble: 6 x 3
#>     cyl    am     n
#>   <dbl> <dbl> <int>
#> 1     4     0     3
#> 2     4     1     8
#> 3     6     0     4
#> 4     6     1     3
#> 5     8     0    12
#> 6     8     1     2

I must be missing something but I'm not sure what.

hadley commented 7 years ago

mtcars doesn't have a column called 2 * cyl ?

yeedle commented 7 years ago

Right. It does not. But count (like group_by) does not require a column name

count(mtcars, 2 * cyl, am)
#> # A tibble: 6 x 3
#>   `2 * cyl`    am     n
#>       <dbl> <dbl> <int>
#> 1         8     0     3
#> 2         8     1     8
#> 3        12     0     4
#> 4        12     1     3
#> 5        16     0    12
#> 6        16     1     2

This even works with count_:

count_(mtcars, c("2 * cyl", "am"))
#> # A tibble: 6 x 3
#>   `2 * cyl`    am     n
#>       <dbl> <dbl> <int>
#> 1         8     0     3
#> 2         8     1     8
#> 3        12     0     4
#> 4        12     1     3
#> 5        16     0    12
#> 6        16     1     2

Can I get the same results if 2 * cyl is a string with tidyeval?

hadley commented 7 years ago

Yes, you need to parse it, not convert it to a symbol. I forget if rlang has an equivalent of parse(text = text)

lionel- commented 7 years ago

There are parse_expr() and parse_quo(), which parse a single expression, e.g. "foo(bar)", and the plural variants parse_exprs() and parse_quos(), which parse a list of expressions, e.g. "foo(bar); baz".
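
A short sketch of the singular and plural parsers applied to the strings from the question above, using rlang and mtcars only. The commented count() call is an assumption about how the result would be spliced with dplyr attached.

```r
library(rlang)

one  <- parse_expr("2 * cyl")        # a single call: 2 * cyl
many <- parse_exprs("2 * cyl; am")   # a list of two expressions

# Evaluate the parsed call with mtcars as the data mask
doubled <- eval_tidy(one, data = mtcars)

# With dplyr attached, the list could be spliced into a verb, for example:
# count(mtcars, !!!many)
```
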

shearerpmm commented 7 years ago

Here are a couple of SO answers that illustrate how one can use quosures and parse_quosure() to pass strings to dplyr, as we once did with the deprecated underscore verbs: https://stackoverflow.com/a/44594223/845800 https://stackoverflow.com/a/44593617/845800

garywhiteford commented 6 years ago

@hadley , including references to as.name() or sym() with examples in the dplyr programming vignette would be very helpful. I must have tried just about every permutation in the (0.7.3) vignette examples trying to get a code-generated string (in a Shiny app) to be a column name before I stumbled upon this thread. FWIW. (Will happily post this in whatever venue is more appropriate.)

lionel- commented 6 years ago

@gwhiteford-cwt we're working on a new vignette: http://rpubs.com/lionel-/programming-draft

lionel- commented 6 years ago

btw if you need a symbol don't use the parse_ functions, use sym() or syms().

tmastny commented 6 years ago

@lionel- Why? What's the difference?

For example:

> var <- "Species"
> new_expr <- parse_expr(var)
> rlang::is_symbol(new_expr)
[1] TRUE
> new_sym <- rlang::sym(var)
> rlang::is_expr(new_sym)
[1] TRUE
> identical(new_sym, new_expr)
[1] TRUE

Where can I read about the difference?

lionel- commented 6 years ago

sym() will create non-syntactic symbols while parse_expr() will give an error because it tries to interpret the string as an R expression rather than an R symbol:

parse_expr("foo+")
#> Error in parse(text = x) : <text>:2:0: unexpected end of input
#> 1: foo+
#>    ^

sym("foo+")
#> `foo+`
adamryczkowski commented 6 years ago

@lionel- I think you should also include a reference to rlang::parse_quo() and an example of how to un-deprecate the use of variable-held expressions in mutate_ or filter_. It is the third time I have lost many minutes by forgetting this function when modernizing (not that old) legacy dplyr code, like this one:

apply_filter<-function(db, filterstring) {
 return(dplyr::filter_(db, filterstring))
}

into this

apply_filter<-function(db, filterstring) {
     filterexpr<-rlang::parse_quosure(filterstring)
     return(dplyr::filter(db, !!filterexpr))
}
lionel- commented 6 years ago

Note that the old rlang::parse_quosure() function has been renamed to parse_quo() and now requires you to specify an environment. The old function would set the current env by default, which was a bug. It should at least be caller_env() (or parent.frame()), but if the string has been passed around several times you should pass the original environment alongside.
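
A hedged sketch of the function above rewritten with parse_quo() and an explicit environment argument. The env parameter and the threshold variable are illustrative assumptions.

```r
library(dplyr)
library(rlang)

apply_filter <- function(db, filterstring, env = parent.frame()) {
  # Parse the string into a quosure tied to the caller's environment
  filterexpr <- parse_quo(filterstring, env = env)
  filter(db, !!filterexpr)
}

threshold <- 30   # lives in the calling environment, not in mtcars
efficient <- apply_filter(mtcars, "mpg > threshold")
```

Because the quosure carries the caller's environment, threshold is found even though it is neither a column of mtcars nor defined inside apply_filter().
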

You are right we should document this process somewhere.

efh0888 commented 6 years ago

hey @lionel- noticed that there's no mention of sym() or syms() in the final vignette, so has the best practice changed? my use case is similar to @gwhiteford-cwt, i.e. allowing the user to choose which column to group_by() in a shiny app...

lionel- commented 6 years ago

If the symbols reference data frame columns, you can safely use sym() and syms(). Quosures are for complex expressions so dplyr can find where your functions are defined.

efh0888 commented 6 years ago

@lionel- thanks for clarifying!

RolandASc commented 6 years ago

edit: ok I guess I'm just repeating what was said a few comments above, although parse_expr seems simpler since it doesn't need env.

@lionel- I stumbled across this issue, so sorry for adding it here. I think it would be helpful if the programming vignette of dplyr could include an example illustrating something like below (i.e. a utility function that pastes together string conditions). This was straightforward with SE semantics. Apologies if you've covered this elsewhere.

cond <- "mass < 20"
dplyr::filter(dplyr::starwars, !!rlang::parse_expr(cond))

cond <- "mass > 1000"
dplyr::filter(dplyr::starwars, !!rlang::parse_expr(cond))

my_filter <- function(df, cond) {...}
lionel- commented 6 years ago

Agreed, filter() expressions should be a prominent example in the vignette. But it won't involve strings or parse_expr() because we believe meta-programming with strings is sloppy and not robust. Here is one way to do it with quasiquotation:

my_filter <- function(data, col, op, value) {
  op_sym <- sym(op)
  col_sym <- sym(col)

  cond <- expr((!!op_sym)(!!col_sym, !!value))
  filter(data, !!cond)
}
my_filter(starwars, "mass", "<", 20)
my_filter(starwars, "mass", ">", 1000)

This is compatible with op being any binary predicate function, not just a binary operator. There's a bug, though: because we didn't unquote a quosure, local functions won't be available (functions in the global env and on the search path are always available, though). To fix this (which is important if you want my_filter() to be usable within packages) you just wrap cond in a quosure, which you can do manually with:

cond <- new_quosure(cond, parent.frame())

just before the filter() call.

lionel- commented 6 years ago

Though it's better to take the environment as argument in case your function is called from another function rather than by the user. So the complete solution is:

my_filter <- function(data, col, op, value, env = parent.frame()) {
  op_sym <- sym(op)
  col_sym <- sym(col)

  cond <- expr((!!op_sym)(!!col_sym, !!value))
  cond <- new_quosure(cond, env)

  filter(data, !!cond)
}
RolandASc commented 6 years ago

Ok, thanks, that's a helpful example

lionel- commented 6 years ago

And you can also build your expression with call() instead of quasiquotation. Then op is automatically converted to a symbol:

cond <- call(op, sym(col), value)

That does make things a bit simpler:

my_filter <- function(data, col, op, value, env = parent.frame()) {
  cond <- call(op, sym(col), value)
  cond <- new_quosure(cond, env)

  filter(data, !!cond)
}
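
A usage sketch showing why the environment matters: the predicate below exists only inside a wrapper function, yet it is found because the quosure carries the caller's environment. The function is repeated so the block runs on its own; above and run_example are hypothetical names, and mtcars stands in for starwars.

```r
library(dplyr)
library(rlang)

my_filter <- function(data, col, op, value, env = parent.frame()) {
  cond <- call(op, sym(col), value)
  cond <- new_quosure(cond, env)
  filter(data, !!cond)
}

run_example <- function() {
  # A predicate defined only in this function's scope (hypothetical helper)
  above <- function(x, limit) x > limit
  my_filter(mtcars, "mpg", "above", 30)
}

res <- run_example()
```
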
RolandASc commented 6 years ago

yes I like that!

lionel- commented 2 years ago

Fixed by #1307. In particular, see:

shearerpmm commented 2 years ago

Wow, looks like my old patterns of UQ(sym(col)) and UQS(syms(cols)) are already obsolete. Awesome to watch the tidyverse's evolution towards a best-of-both-worlds interface combining non-standard eval and standard eval. I do hope it's going to stabilize soon though. We've already been through several eras in just a few years.

By the way, are !! and !!! safe to use these days? I remember a few years back, when I first read about them, there was a vague warning that R might mistake them for negation operators in unclear circumstances. That was enough to put me off them and instead use only UQ() and UQS(). NSE already feels risky enough without the potential for my expressions to be completely misunderstood by the interpreter.

lionel- commented 2 years ago

I do hope it's going to stabilize soon though. We've already been through several eras in just a few years.

It's been pretty much stable for a few years now. The new glue and {{ operators were added to tidy eval; they don't replace !!, which is still needed for expert usage. Embracing is now recommended for common usage since it lowers the entry bar for programming with tidyverse functions.
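
A minimal sketch of embracing and the glue syntax for result names, assuming a dplyr version that supports both. The helper mean_by() is hypothetical.

```r
library(dplyr)

# Hypothetical helper: group by one column and average another
mean_by <- function(data, group, var) {
  data %>%
    group_by({{ group }}) %>%
    summarise("mean_{{ var }}" := mean({{ var }}))
}

out <- mean_by(mtcars, cyl, disp)   # columns: cyl, mean_disp
```
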

On the other hand, the UQ() and UQS() variants from rlang 0.1.0 were deprecated a while ago. Though given the number of packages that still use these on CRAN, we may not ever remove them.

By the way, are !! and !!! safe to use these days? I remember a few years back, when I first read about them, there was a vague warning that R might mistake them for negation operators in unclear circumstances.

If they are used in the wrong place they'll be treated as negation operators, see https://rlang.r-lib.org/reference/topic-inject-out-of-context.html
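
A tiny sketch of the difference, using rlang's expr() as the injecting function. The variable names are illustrative.

```r
library(rlang)

x <- 0

outside <- !!x          # plain R: double negation, so FALSE rather than 0
inside  <- expr(!!x)    # inside an injecting function: the value 0 is inserted
```
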

NSE already feels risky enough without the potential for my expressions to be completely misunderstood by the interpreter.

In case that helps, see https://rlang.r-lib.org/reference/topic-data-mask-ambiguity.html for how to use the .data and .env pronouns to make NSE safer in production applications.

shearerpmm commented 2 years ago

Thanks as always for the incredibly impressive work on rlang. I never would have imagined we would have such a powerful framework for safely using both NSE and SE as appropriate in any possible situation.