Review documentation about quoting

r-lib / rlang

Low-level API for programming with R

https://rlang.r-lib.org

Review documentation about quoting #368

Closed vspinu closed 4 years ago

vspinu commented 6 years ago

Documentation of expr states:

        • ‘expr()’ returns its argument unevaluated. It is equivalent
          to ‘base::bquote()’.

        • ‘enexpr()’ takes an argument name and returns it unevaluated.
          It is equivalent to ‘base::substitute()’.

The "equivalence" is an overstatement. expr is not equivalent to bquote because it uses !! for unquoting instead of .() and it doesn't allow for an explicit environment. At most the doc should say "parallels" and provide further details.
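A minimal sketch of the two differences named above, assuming rlang is installed. The variable names (`a`, `b`, `env`) are made up for illustration; `where` is the environment argument of `base::bquote()`.

```r
library(rlang)

a <- 1

# rlang unquotes with `!!`; base R's bquote() unquotes with `.()`.
rlang_result <- expr(!!a + b)
base_result  <- bquote(.(a) + b)

# bquote() can look up unquoted values in an explicit environment
# through its `where` argument; expr() takes no environment argument.
env <- new.env()
assign("a", 10, envir = env)
base_in_env <- bquote(.(a) + b, where = env)

print(rlang_result)
print(base_result)
print(base_in_env)
```

Both functions build the same kind of call object here; the differences are the unquoting syntax and where the unquoted value is looked up.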

The second parallel is even more confusing. It looks like enexpr is designed to be used only for function arguments inside functions. substitute is more general than that. It evaluates symbols located in the supplied env which enexpr doesn't have.

> local({a <- 1; substitute(a + 1)})
1 + 1
>  local({a <- 1; enexpr(a + 1)})
Error in (function (x, strict = TRUE)  (from utils.R#89) : "x" must be an argument name
>

BTW, what's the rationale for the enquo and enexpr names in the first place? If these are explicitly designed for arguments, wouldn't argquo and argexpr make more sense? Also, the relationship between enexpr and expr is not particularly clear. How would one write enexpr by means of expr alone?

lionel- commented 6 years ago

substitute is more general than that. It evaluates symbols located in the supplied env which enexpr doesn't have.

The comparison to base function is more about the main use case for these functions. I disagree that "equivalent" is not the right term for bquote(). Firefox is equivalent to Chrome even if they don't have the same UI or features. For substitute() we should be more explicit that we are comparing only a subset of features, i.e. substitute(arg) to enexpr(arg). This documentation topic is not the place to teach people about the complicated substitute semantics, it only means to give some rough pointers wrt base function equivalence.

In any case, the documentation has been entirely updated in the dev version, would you mind checking it out?

vspinu commented 6 years ago

it only means to give some rough pointers wrt base function equivalence.

I think this does more harm; better none than half-baked and imprecise. A lot of people reading those docs already know substitute, quote, bquote and expression. A brief but exact comparison of base vs rlang quoting would be useful. Similar point for lispers. Rlang quoting should be a breeze for people familiar with lisps, but it's not (the reason being imprecise and loose language in the documentation). Might be good to have a half page doc "Rlang for lispers" btw.

A general note on terminology. You seem to use "intercept", "capture", "quote", "quasiquote" interchangeably. It would be good to make it clear that "capturing" in your terminology means "quoting". All quoting mechanisms in rlang provide unquoting so all of them are actually quasiquotes. Then why bother with "quasiquoting" name in the first place and not stick with simple "quote" vs "unquote"? It's an awful name, not adopted by all lisps and R users couldn't care less. To make it worse, rlang docs discuss all quasiquotes in the manual page "quoting" and all the "unquoting" in the manual page "quasiquoting".

Comments on the current "quoting" man page and (hopefully useful) alternative takes on it follow.

Quotation is a mechanism by which an expression supplied as argument is captured by a function.

Two issues. First, quotation need not have anything to do with function arguments. I can quote as I like within or without a function. Second, "expression" is an object in R, so "expression supplied as argument" literally means:

e <- expression(a+b)
some_fn(arg=e)

This is what I thought the first time I read this doc.

Instead of seeing the value of the argument, the function sees the recipe (the R code) to make that value.

"value of an argument"? recipe? R-code? All these are used in some specific sense which hasn't been defined and it's likely not the understanding that most people have. I infer that "argument" here means unevaluated expression associated with the argument's symbol and "value" means result of evaluation of that expression. My guess is that most people when they say 'argument' refer to the value of evaluation. Some folks might not even be aware that there is a 2 step process involved in argument evaluation and you can have access to both of those from within a function. R-code could mean a bunch of things; for me it's the plain text before it's even parsed. Recipe is something else completely.

Symbols represent the name that is given to an object in a particular context (an environment).

This is slightly misleading as symbols are not linked to environments. Same symbol can be resolved to different objects in different environments.

I am pretty sure it's possible to cleanly define all concepts and introduce the reader to read-eval loop within the same Description. Maybe something along the following lines:

Under normal circumstances the R interpreter parses R code (raw text) into data
structures (expressions) and then immediately evaluates those
expressions. This process is commonly called the "read-eval loop". Quotation
is a mechanism to prevent the evaluation of the parsed expressions.

R expressions are composed of calls and symbols:

  * Symbols represent names of objects. Evaluating a symbol results in that
  symbol being looked up in the enclosed environment.

  * Calls represent the action of calling a function with a set of
  arguments. Both call and arguments' names are symbols, but the arguments
  are arbitrary R expressions. Evaluating a call results in computation of a
  value as defined by the function. Unlike in most languages, evaluating a
  call in R doesn't involve immediate evaluation of its
  arguments. Evaluation of the arguments can be controlled from the inside
  of the function. The expression passed as arguments can be retrieved from
  within the function by an appropriate quotation.

There are two ways to create R expressions. By *building* calls and symbols
from parts and pieces (see ‘sym()’, ‘syms()’ and ‘call2()’). Or, by
intercepting (aka quoting or capturing) an expression instead of evaluating
it with _quotation_ .
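The proposed description can be illustrated with base R alone. This is only a sketch; `show_arg` and the variable names are hypothetical, and the rlang builders mentioned above (`sym()`, `call2()`) are deliberately replaced by their base counterparts `as.symbol()` and `as.call()`.

```r
# Quoting: the parsed expression is kept as data instead of being evaluated.
e <- quote(sum(x, 2))
print(is.call(e))         # the whole expression is a call
print(is.symbol(e[[1]]))  # the function position holds the symbol `sum`
print(is.symbol(e[[2]]))  # the first argument is the symbol `x`

# Building: assemble the same call from a symbol and its arguments.
built <- as.call(list(as.symbol("sum"), as.symbol("x"), 2))
print(identical(built, e))

# Evaluation happens only on request, here with `x` supplied by a list.
result <- eval(e, list(x = 40))
print(result)

# Arguments are lazy: a function can retrieve the unevaluated expression
# behind its argument instead of the argument's value.
show_arg <- function(arg) substitute(arg)
print(show_arg(a + b))
```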

User expressions versus your expressions:

This part is a bit allegorical. How about "Expressions from enclosed environments vs inlined expressions"? I just guessed that en stands for "enclosure".

The combination of quotation and unquotation is called quasiquotation

This part is confusing. Combination in what sense? I can use quasiquote without unquote just fine; quasiquote allows for unquoting but does not require it. And, as already said, there are no pure quotes in rlang so why bother with the distinction? Just call everything quoting and sell it as a super-base::quote and be done with it; no extra term - no confusion.

Unquotation provides a way to refer to variables during quotation. Variables are problematic when quoting because a captured

"Way to refer" by whom, user of the quotation or user of the function? What is meant by variable? Why are they "problematic"? This doc seems to have in mind a very specific use case of data.frame variable capturing. If I capture an expression and then evaluate it immediately "problems" don't occur, right? An alternative take might be:

Unquoting provides an escape from quoting of parts of the expression. When
you quote an expression some parts are still evaluated if marked as
unquotable (see [unquoting]).

lionel- commented 6 years ago

I think this does more harm; better none than half-baked and imprecise. A lot of people reading those docs already know substitute, quote, bquote and expression.

I disagree. Besides, more than 95% of people don't fully understand the semantics of substitute() or what expression vectors are about.

Might be good to have a half page doc "Rlang for lispers" btw.

Ideally yes but this kind of doc takes time to write and we're quite time-constrained already. It would also be good to write a paper on quosures and lexical scoping with fexprs.

It would be good to make it clear that "capturing" in your terminology means "quoting"

Capturing is about quoting the expression of your user (your function becomes quoting for the user) while quoting is about the expression that you supply (you are using a quoting function).
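The distinction can be sketched as follows, assuming rlang is installed; `capture_arg` is a hypothetical helper name used only for illustration.

```r
library(rlang)

# Quoting: you supply the expression yourself.
own <- expr(x + y)

# Capturing: your function quotes whatever its caller typed for `arg`,
# so the function itself becomes a quoting function for the user.
capture_arg <- function(arg) enexpr(arg)
captured <- capture_arg(x + y)

print(own)
print(captured)
```

In both cases the result is the same unevaluated call; what differs is who wrote the expression being quoted.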

Then why bother with "quasiquoting" name in the first place and not stick with simple "quote" vs "unquote"? It's an awful name, not adopted by all lisps and R users couldn't care less.

I think it's a fine name and it does underline the difference with other quoting functions that don't allow unquoting. I prefer quasiquoting to backquoting as it makes more sense to me. This terminology is used in the next edition of adv-r and is unlikely to change.

First, quotation need not have anything to do with the function arguments. I can quote as I like within or without a function.

I think this is wrong. Quotation in R is linked to arguments and in fact to lazy evaluation of arguments.

Second, "expression" is an object in R

We are deliberately ignoring expression vectors.

"value of an argument"? recipe? R-code? All these are used in some specific sense which hasn't been defined and it's likely not the understanding that most people have

After seeing some discussions on Community I'm fairly certain that this framing makes sense to R users. The doc surely could be improved but I don't think removing all colloquialisms is the right direction.

This is slightly misleading as symbols are not linked to environments.

A symbol only means something (has a value) in a particular lexical context. In elisp a symbol can mean something on its own but in elisp symbols are namespaced, can have attributes, etc. In R the meaning of symbols is almost entirely given by the environment in which you evaluate it (possibly a data mask).
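A short base R sketch of this point: the same symbol resolves to different objects depending on where it is evaluated. The environment names and values are made up for illustration, and the last call uses a plain list as a stand-in for a data mask.

```r
sym <- quote(x)

env_a <- new.env()
assign("x", "value in env_a", envir = env_a)

env_b <- new.env()
assign("x", 42, envir = env_b)

# One symbol, two environments, two different values.
in_a <- eval(sym, env_a)
in_b <- eval(sym, env_b)

# A list (or data frame) can also supply the value.
in_mask <- eval(sym, list(x = "value from a list"))

print(in_a)
print(in_b)
print(in_mask)
```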

"Way to refer" by whom, user of the quotation or user of the function? What is meant by variable?

It's about constants vs variables, which is the framing we have chosen to teach about quotation and quasiquotation. I like your alternative docs but we are trying to make the documentation as meaningful as possible to casual users, as opposed to technical users. Writing this kind of doc is hard and we will continue to improve them as we gain experience teaching about tidy eval to a large audience.

vspinu commented 6 years ago

I think this does more harm; better none than half-baked and imprecise. A lot of people reading those docs already know substitute, quote, bquote and expression.

more than 95% of people don't fully understand the semantics of substitute()

This is not a good reason to be sloppy with your terminology. Those 95% will never get clarified if you keep writing man pages for the casual user. Vignettes are fine, but man pages should be technical.

Might be good to have a half page doc "Rlang for lispers" btw.

Ideally yes but this kind of doc takes time

It's half a page and it's close to trivial in my view. I would be willing to contribute one if only you would not disagree on every single point I say :(

It would be good to make it clear that "capturing" in your terminology means "quoting"

Capturing is about quoting the expression of your user

Ok. This is interesting distinction. Please add it to the docs then.

So, are we on the same page that enquo is capturing and quo is quoting? And both are quasiquotes?

I think it's a fine name and it does underline the difference with other quoting functions that don't allow unquoting.

There are no "other" functions in rlang and from base R only quote doesn't allow for some form of unquoting. Both bquote and substitute do unquoting. I really think if you stick with quoting and simply discuss various forms of unquoting you will end up with a much simpler documentation.

The extra distinction is of very little practical importance and it's not worth the confusion. Most lisps would be just fine with backquote alone.

I prefer quasiquoting to backquoting as it makes more sense to me.

The current docs don't have a clear definition of what quasiquote means. In lisps quote, backquote, quasiquote etc. are special forms not some vague "combination of quoting and unquoting".

There is little intrinsic difficulty with quoting and unquoting, it's the inadequate docs that make it unclear for most people.

This terminology is used in the next edition of adv-r and is unlikely to change.

FYI, what you call quasiquoting is actually more closely related to "syntax-quoting". Racket and Clojure both have forms of it. Syntax quoting preserves some lexical information and this is what you do with quo in R.

It would be great if you guys could settle on common and minimal terminology before putting it in papers and books. @hadley's books are hugely influential. Once the confusing terminology starts spreading there will be no end to it.

First, quotation need not have anything to do with the function arguments. I can quote as I like within or without a function.

I think this is wrong. Quotation in R is linked to arguments and in fact to lazy evaluation of arguments.

I hope you don't mean this seriously and don't just disagree for the sake of disagreeing. I use quoting and unquoting all the time to construct expressions, functions and calls which have nothing to do with the argument substitution.

Inside functions unquoting works the way it works because of the lazy evaluation of arguments. It's almost a side effect, and selling it as the only use case of unquoting is unacceptable. It might be the most common use case, but it certainly doesn't lie at the core of the concept.

Second, "expression" is an object in R

We are deliberately ignoring expression vectors.

You do ignore them, but your reader knows about those and doesn't know that you ignore them. You are missing my main point. The docs operate with terms in some specific (your) sense which are not defined a-priori. My main suggestion regarding this documentation is to define all terms properly before using them.

I don't think being technical and rigorous automatically means "unappealing to casual users". You can try current version vs the version which I suggested and check which one "appeals" better for some casual R programmer.

The doc surely could be improved but I don't think removing all colloquialisms is the right direction.

I suggested removing undefined terms. You can use colloquialisms but should define them first. Those "colloquialisms" have different meaning for people with different backgrounds.

This is slightly misleading as symbols are not linked to environments.

A symbol only means something (has a value) in a particular lexical context.

This is exactly what I said, and it's the reason why the current definition of the symbol is incorrect or at least confusing.

"Way to refer" by whom, user of the quotation or user of the function? What is meant by variable?

I like your alternative docs but we are trying to make the documentation as meaningful as possible to casual users, as opposed to technical users.

Blogs, books and vignettes are for casual users. Man pages should be clear, technical and with as little ambiguity as possible.

vspinu commented 6 years ago

There are no "other" functions in rlang and from base R only quote doesn't allow for some form of unquoting. Both bquote and substitute do unquoting.

To clarify what I meant there:


local({a <- 1; quote(a + a + b)})      ## no unquoting supported
## a + a + b
local({a <- 1; bquote(.(a) + a + b)})  ## explicit unquoting
## 1 + a + b
local({a <- 1; substitute(a + a + b)}) ## implicit unquoting of symbols found in enclosed environment
## 1 + 1 + b

local({a <- 1; expr(a + a + b)})       ## no unquoting
## a + a + b
local({a <- 1; expr(!!a + a + b)})     ## explicit unquoting
## 1 + a + b

local({a <- 1; quo(a + a + b)})        ## same as expr but keeps the environment
## <quosure>
##   expr: ^a + a + b
##   env:  0x9f13c90
local({a <- 1; quo(!!a + a + b)})
## <quosure>
##   expr: ^1 + a + b
##   env:  0x84a4898

For someone who is familiar with base quoting mechanisms the above example should be sufficient to understand what rlang quoting does.

lionel- commented 6 years ago

This is not a good reason to be sloppy with your terminology.

It's no longer sloppy, enexpr(arg) is equivalent to substitute(arg): https://github.com/tidyverse/rlang/commit/34a7ac7e. Note that since rlang 0.2.0, enexpr() and enquo() capture forced arguments or evaluated objects just like substitute() does.

Vignettes are fine, but man pages should be technical.

That's the approach taken by base R doc but I think we want to be less technical in rlang (which does not mean we don't cover technical subjects, it's only about the way we cover it).

It's half a page and it's close to trivial in my view.

It's not trivial because it touches upon questions like macros vs fexprs and lexical hygiene.

Please add it to the docs then.

There's a whole section in ?quotation.

There are no "other" functions in rlang and from base R only quote doesn't allow for some form of unquoting.

~, match.call() and sys.call() don't perform unquoting. quote() is the fundamental quoting function in R and ~ the most well known by R users. Then you have to add all fexprs created with substitute() which are not quasiquoting functions. With substitute() only the developer can unquote (for some definition of unquoting), the user can't.
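The last point can be sketched like this, assuming rlang is installed. The function names `with_substitute` and `with_enexpr` and the symbol `cyl` are hypothetical.

```r
library(rlang)

# A quoting function built on substitute(): the user cannot unquote.
with_substitute <- function(x) substitute(x)

# A quasiquoting function built on enexpr(): the user can unquote with `!!`.
with_enexpr <- function(x) enexpr(x)

v <- quote(cyl)

kept     <- with_substitute(!!v)  # `!!` is kept as ordinary negation calls
unquoted <- with_enexpr(!!v)      # `!!v` is replaced by the value of `v`

print(kept)
print(unquoted)
```

With `substitute()` the caller's `!!v` is returned untouched as a call to `!`, whereas `enexpr()` processes it and returns the symbol stored in `v`.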

I think quasiquotation is a better term for comparison with non quasiquoting functions as well as for comparison with other languages where unquoting is the exception rather than the rule. As a user of a lisp macro you can't unquote anything, only the developer can (with the backquote). Note that user-unquoting would be no use in a macro-based language, it is only useful when you evaluate in a masking environment.

In any case that's the terminology used in the next edition of Hadley's book so changing it is unlikely without very good reasons.

Most lisps would be just fine with backquote alone.

I think the extra syntax helps the reader of lisp code: backquote means that something is unquoted. This help is generally not available in R because there's currently no great way to distinguish fexprs from functions, let alone quasiquoting fexprs. For this reason I'm kind of torn about whether to recommend users to use quote() instead of expr() when there's no unquoting.

Syntax quoting preserves some lexical information and this is what you do with quo in R.

I think this is just an implementation detail of hygienic macros, from the user point of view quasiquotation seems like the relevant term.

you guys could settle on common and minimal terminology before putting it in papers and books

Perhaps it doesn't show but we are spending a huge amount of time discussing terminology with Hadley.

I use quoting and unquoting all the time to construct expressions, functions and calls which have nothing to do with the argument substitution.

Just look at the R definitions of expr() and quo() or the C definition of quote(), substitute() and ~. Quoting in R is about arguments of function calls in a deep sense (pairlist nodes of actual arguments (as opposed to formal)). In addition match.call() and sys.call() also capture the CAR of the function call.

but your reader knows about those

Singular reader does seem appropriate ;) We are still figuring out the type nomenclature to be used in rlang. Cf recent switch from lang to call particle. type_of() is to be considered experimental and likely to change in the future. We are using the term "expression" for something else than the base R type and we're unlikely to change it at this point.

I don't know why there are expression vectors in R since they are not useful for anything right now. I'd be interested in some archeological digging to learn more about them.

Blogs, books and vignettes are for casual users. Man pages should be clear, technical and with as little ambiguity as possible.

I agree with clear and non-ambiguous. I'll take into account your suggestions at the next pass on the quotation doc.

vspinu commented 6 years ago

It's no longer sloppy, enexpr(arg) is equivalent to substitute(arg): 34a7ac7.

Small change can make a difference ;)

Note that since rlang 0.2.0, enexpr() and enquo() capture forced arguments or evaluated objects just like substitute() does.

Not the same but better. I assume 0.1.6.9003 is 0.2.0.

library(rlang)
tt <- function(a = 1 + 2, b = 3 + 4, c = 5 + 6) {
    force(b)
    c <- c
    list(substitute = substitute(a + b + c),
         enexpr = c(enexpr(a), enexpr(b), enexpr(c)))
}
tt()
#> $substitute
#> 1 + 2 + (3 + 4) + 11
#> 
#> $enexpr
#> $enexpr[[1]]
#> 1 + 2
#> 
#> $enexpr[[2]]
#> [1] 7
#> 
#> $enexpr[[3]]
#> [1] 11

Examples like this strike me (again) that enexpr is a sub-optimal name. It suggests similarity to expr but the semantics are very different (it accepts one symbol and substitutes its argument). Why then not go into full substitute and allow enexpr(a+b)? Or, why not call it argexpr to suggest that it's the expression of the argument that is expanded and returned? And yet again, what does this "en" stand for?

It's not trivial because it touches upon questions like macros vs fexprs and lexical hygiene.

It depends how much detail you want to provide of course. I don't see why lexical hygiene should be covered, nor do I think detailing too much on fexprs is useful because there are only a handful of (old?) lisps supporting fexprs. My personal difficulty with the docs was that I couldn't clearly see how the plethora of rlang names map to the simple lisp quoting, backquoting and unquoting.

There's a whole section in ?quotation.

"capturing" is never defined, nor the doc ever mentions the distinction (that emerged on this thread) between "capture" and "quoting". The doc uses "capture", "quote" and "intercept" interchangeably. For instance you have "names of arguments to capture without evaluation" and "For ‘exprs()’ and ‘quos()’, the expressions to capture unevaluated" in the description of "...". And then "You can capture the expressions that you supply."

~,

Yeah, I missed the elephant in the room :/

match.call() and sys.call() don't perform unquoting.

Sure, if you stretch the meaning of unquoting to "second-order" unquoting. What is this "second-order" unquoting called btw?

I think quasiquotation is a better term for comparison with non quasiquoting functions as well as for comparison with other languages where unquoting is the exception rather than the rule.

I see. Fair enough.

As a user of a lisp macro you can't unquote anything.

You would need to write another macro and it's usually a pain :(

Note that user-unquoting would be no use in a macro-based language, it is only useful when you evaluate in a masking environment.

Not sure what you mean here. There are plenty of uses in lisp when you would like to have programmatic control of what symbols/expression you pass as macro arguments.

there's currently no great way to distinguish fexprs from functions, let alone quasiquoting fexprs.

You use fexprs in a specific sense, which your reader has to infer from the context ;). To me all R functions are fexprs within which the programmer is free to choose whether to (un)evaluate arguments or not.

For this reason I'm kind of torn about whether to recommend users to use quote() instead of expr() when there's no unquoting.

For this reason "qquote" might be a better name than "expr" :P. People would automatically use quote when qquote is not needed. It would also not create unnecessary parallel between expr and expression.

I use quoting and unquoting all the time to construct expressions, functions and calls which have nothing to do with the argument substitution.

Quoting in R is about arguments of function calls in a deep sense (pairlist nodes of actual arguments (as opposed to formal)).

Sure, if you mean the arguments of the calls within the quoted expression (everything interesting is a call after all). I was under the impression that we are discussing arguments of the outer caller function, as per the very first sentence in the docs: "Quotation is a mechanism by which an expression supplied as argument is captured by a function."

Cf recent switch from lang to call particle.

"lang" was damn annoying. I almost opened an issue for that :) "call2" is not great though, but well, one has to make tradeoffs. I guess you have considered new_call or make_call already.

moodymudskipper commented 4 years ago

Might be relevant to this discussion and I'm not sure if it's a bug or a documentation issue, but enexpr() and substitute() don't behave the same in S3 methods, see example below :

foo <- function(x){
  print(substitute(x))
  print(rlang::enexpr(x))
  UseMethod("foo")
}

foo.default <- function(x){
  print(substitute(x))
  print(rlang::enexpr(x))
  invisible()
}

bar <- "baz"
foo(bar)
#> bar
#> bar
#> bar
#> [1] "baz"

Created on 2020-01-08 by the reprex package (v0.3.0)

Edit: reported in separate issue: https://github.com/r-lib/rlang/issues/884

vspinu commented 4 years ago

@moodymudskipper this is probably a bug and unrelated to this issue. Please open a new issue.

Regarding the docs, I see that there is a new "invention" in the docs: defusion. Looks like you guys really derive extra utility from using non-standard names for established concepts. Is "defused argument" really any clearer than the standard "quoted argument"?

lionel- commented 4 years ago

@vspinu Thanks for the astringent comments. Yes, we are exploring new vocabulary to help casual programmers understand NSE. It is extremely awkward to explain quotation to beginners.

vspinu commented 4 years ago

Yes, we are exploring new vocabulary to help casual programmers understand NSE. It is extremely awkward to explain quotation to beginners.

I don't see how a variety of freshly invented synonyms helps with this task. Beginners don't stay beginners, but communication becomes harder and harder because people start using different names for the same thing.