tidyverse / tidyeval

A guide to tidy evaluation
https://tidyeval.tidyverse.org

An idea: Introduce quoting as an expression/data transformation from the beginning #10

Open MilesMcBain opened 6 years ago

MilesMcBain commented 6 years ago

I did an initial sketch of this idea in this PR: https://github.com/tidyverse/tidyeval/pull/9

The premise is based purely on my own experience: I found quoting hard to understand until I encountered it explained as a 'code/data transformation'.

I think this idea could be used to add clarity to quoting in this new tidyeval documentation.

Take this first para introducing quoting functions:

On the other hand, a quoting function is not passed the value of an expression, it is passed the expression itself. We say the argument has been automatically quoted. The quoted expression might be evaluated a bit later or might not be evaluated at all. The simplest quoting function is quote(). It automatically quotes its argument and returns the quoted expression without any evaluation. Because only the expression passed as argument matters, none of these statements are equivalent:

It does not define quoting. It also says about quote:

It automatically quotes its argument and returns the quoted expression without any evaluation.

Which I found very confusing until I learned of the 'expert' definition of 'evaluate' it employs - which requires some appreciation of parser + evaluator model.

Using the code/data transform idea:

On the other hand, a quoting function is not passed the value of an expression, it is passed the expression itself as data. We call the process of converting an expression to data quoting. The quoted expression might be evaluated a bit later or might not be evaluated at all. The simplest quoting function is quote(). It automatically quotes its argument and returns the quoted expression as data. Because it is not the value of the expression, but its representation as data, that matters, none of these statements are equivalent:

It defines quoting explicitly as 'the process of converting an expression to data'. It avoids saying 'unevaluated', and in turn avoids the need to explain interpreter internals.
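To make the "expression as data" framing concrete, here is a minimal base R sketch. It only illustrates the idea and is not text proposed for the guide; `x` and `y` are placeholder names that are never defined, which is fine because nothing is evaluated.

```r
# Quote an expression; `x` and `y` do not need to exist.
e <- quote(x + y * 2)

# The result is an ordinary R object that can be inspected like data.
class(e)     # "call"
e[[1]]       # the function being called: `+`
e[[3]]       # the second argument: y * 2

# It can also be modified like data, here by swapping the function.
e[[1]] <- as.name("-")
deparse(e)   # "x - y * 2"
```

The point of the sketch is that every question a newcomer might ask about the quoted object ("what is in it?", "can I change it?") can be answered with tools they already use on lists.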

There are a couple of further ideas in the PR. Please accept this as a genuine suggestion based on a frame of explanation I found useful as a conceptual newcomer. A blog post I wrote exploring this idea was well received, with positive feedback from many independent readers. This gives me some confidence in its broader applicability and appeal.

batpigandme commented 6 years ago

Preamble: OK, so I think there's a lot of really good material in here, all of which (both in this PR/Issue, and in describing tidy eval writ large) we (the grand collective of humanity) are trying to break down into palatable nuggets of knowledge (made difficult by the fact that they're often conceptually intertwined).

Thing to keep in the back of our minds: In the event that someone is familiar with the base-R function quote(), there's some serious disambiguation ahead. @edwinth gave a really good talk on base NSE at satRday here, which reminded me of the mental model of "quotation" on "substitution" that presented a bit of an obstacle for me in understanding quasiquotation in LISP and in tidy eval.†

Code/data transformation: Really like this, and I think it's what we're conceptually working towards, and want to build on what initial footing users of other tidyverse packages might have. Discussion of clarifying/building on the differences between what glue does and what tidy eval does has been ongoing, and I think this could be really useful.

From the quasiquotation intro in Conrad Barski's Land of Lisp:

“Both the single quote and backquote in Lisp “flip” a piece of code into data mode, but only a backquote can also be unquoted using the comma character, to flip back into code mode.”

In the chapter summary he reduces this to:

“Quasiquoting is a technique that allows you to insert small bits of computer code into larger pieces of data.”
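For readers who know R better than Lisp, base R's bquote() offers a rough analogue of the backquote/comma pair. The sketch below assumes only base R; `score` and `threshold` are hypothetical names chosen for illustration.

```r
threshold <- 5

# quote() keeps everything in "data mode": `threshold` stays a symbol.
plain <- quote(score > threshold)
deparse(plain)     # "score > threshold"

# bquote() also quotes, but .() flips one piece back into "code mode",
# so the current value of `threshold` is inserted into the expression.
spliced <- bquote(score > .(threshold))
deparse(spliced)   # "score > 5"
```

Note that `score` is never defined: only the part wrapped in .() is computed.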

This image might not make a ton of sense outside the context of the wizard game you write in the book, but it expressly points to "code mode" and "data mode":

[Image: illustration from Land of Lisp pointing to "code mode" and "data mode"]

Postamble: I don't think I've actually clarified anything here, but I'm hoping we can ultimately offer/figure out different "levels" of understanding necessary/desirable for different users that (and I think this is key) share a common vocabulary, so as not to be misleading when one decides to go a bit deeper. ‡

† Also, as far as applicability and appeal go, I think Edwin's tidy eval most-common-actions post has also done well, so 👍👍 for trying to integrate useful extant material, which includes yours, of course, @MilesMcBain!

‡ I don't think anyone's doing anything confusing here; I'm just reiterating it as part of the discussion.

lionel- commented 6 years ago

We call the process of converting an expression to data quoting.

There is no conversion process. It's more of an interruption of the evaluation (computation) process. I just suggested "freezing" as an analogy to quotation.

Which I found very confusing until I learned of the 'expert' definition of 'evaluate' it employs - which requires some appreciation of parser + evaluator model.

I agree this could be improved. I think "computation" is more likely to be intuitive. Quotation is freezing the computation of an expression.

lionel- commented 6 years ago

About code as data, my first inclination would be to mention it in the quotation section of the glossary. I don't know if we want to introduce quotation as such. But it's worth thinking about.

batpigandme commented 6 years ago

About code as data, my first inclination would be to mention it in the quotation section of the glossary.

I think that's totally fine, and/or using this model in the glue-related section since we distinguish glue() and glue_data() already (I'm not sure if that will make it more or less confusing, TBH).

MilesMcBain commented 6 years ago

I'm not sure I have all the context to get the connection with glue. Other than I spose it's a kind of 'dual' of quoting in the message domain?

You made a good start at using this dual to add insight with the "" operator. I think the code/data riff in the PR enhances that, so that's a good sign it's compatible with that type of exposition.

@lionel- The wording can be whatever you like. What I was trying to do was to provide an abstract definition that is true (and useful) at the conceptual level, i.e.

This abstraction may in fact hold on a more technical level. Are the objects that are returned by quote the EXACT same as those used by the internal parser/evaluator? Or have they been dressed up a little to become first class language constructs? - if so this definition holds in the technical sense also.

My take on 'freezing' is that it's an improvement in the sense that it avoids the ambiguity of 'evaluate'. However, it's still a mystifying definition, literally, in the sense that it raises more questions than it answers:

The reason I am so fond of definitions involving a code/data transform is that they are demystifying. If you're learning to code in R, you are guaranteed to have some comfort with data! You know how to explore it and the types of things that can be done with it. So the questions arising from this type of definition are things the reader can easily answer themselves:

lionel- commented 6 years ago

There is no transformation, neither practically nor conceptually. The argument is directly returned, as is, by quote(). A transformation is an action/effect, and when you quote there's an absence of action/effect (unless you unquote something). Evaluation is a transformation; quotation is not, because it prevents that transformation.
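A small base R sketch of this distinction, under the assumption that "transformation" here means evaluation: quote() hands back the expression without computing anything, and the computation only happens when eval() is called later. The variable names are illustrative.

```r
x <- 10

# Nothing is computed here; quote() returns the call itself.
frozen <- quote(x + 1)
is.call(frozen)           # TRUE

# Evaluation is the step that turns the expression into a value.
first <- eval(frozen)     # 11

# The quoted expression did not capture the old value of `x`,
# so evaluating it again uses whatever `x` is now.
x <- 20
second <- eval(frozen)    # 21
```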

Hadley doesn't like "freezing" either. He suggested the following distinction:

Under quotation, code stops being an action and therefore is just a thing (object) you can do stuff with.

MilesMcBain commented 6 years ago

I think I can see where you are coming from.

You seem to imagine every bit of R code you type as the tokenised version emitted by the R parser. As if you cruise along thinking "Now I'm making a call", "and this here is a name", "and here is a literal numeric"? So from that perspective sure, the function got what you and the parser passed it (one and the same, apparently) and returned it.

Are you, in fact, a machine?

I can guarantee most people do not have the machine's perspective. They actually have no concept of the parser's representation of the code they type. When they pass an expression to quote(some_expression) they imagine some_expression is text. The text they typed.

The output of quote is therefore the text they typed transformed by the parser. Sure the transformation did not happen inside quote, but from their perspective it may as well have. That is the effect of the call.
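The gap between "the text they typed" and what quote() returns can be seen in a short base R sketch. It assumes nothing beyond base R; the oddly spaced call to mean() is just an example input.

```r
# The text as a user might type it, with irregular spacing.
typed <- "mean( x ,na.rm=TRUE )"

# What the parser makes of that text.
parsed <- parse(text = typed, keep.source = FALSE)[[1]]

# What quote() returns for the same code.
quoted <- quote(mean( x ,na.rm=TRUE ))

identical(parsed, quoted)   # TRUE: quote() returns the parsed object
is.character(quoted)        # FALSE: it is not text
deparse(quoted)             # "mean(x, na.rm = TRUE)": original spacing is gone
```

So both readings in this thread are visible at once: quote() itself does no work on the object, yet what comes back is no longer the characters that were typed.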

If you really hate the data transformation abstraction that much, I would say you should go the route of the LISP books and explain the parser + evaluator model, so it can be made clear what is being 'interrupted' and 'unevaluated'.

lionel- commented 6 years ago

I think quotation as absence of transformation lends itself very well to layperson's intuition. Instead of computing the expression that you typed, you get the description of the computation.

MilesMcBain commented 6 years ago

Instead of computing the expression that you typed, you get the description of the computation.

I like this a lot as a description of quote. Book it!

batpigandme commented 6 years ago

I'm not sure I have all the context to get the connection with glue. Other than I spose it's a kind of 'dual' of quoting in the message domain?

It's been pretty easy for people to understand what glue does. That makes it a useful launching point for distinguishing string/data interpolation from code/data switching. That's all.

Are you, in fact, a machine?

What an excellent question, and one that's been at the heart of such rich ontologies, at that! If you haven't read The Enigma by Andrew Hodges, I can't recommend it strongly enough. Turing's childhood letters and diaries aren't in any of the volumes of his papers that I've come across, but Hodges interweaves them beautifully with the narrative of his life.