I understand this error is one of the most common ones when using tidyverse inside a function. But I'm unsure I really understand it. Sorry I used GPT to make the writing clearer (not necessarily correct).
Title: Clarification Required: Tidy Evaluation's Need for Explicit "Linkage" Between Expression and Data Mask
Background: The confusion arises from the behavior of eval_tidy when evaluating an expression in the context of a provided data mask (e.g., a dataframe). Why is there a need for an explicit "linkage" between the expression and the data mask, even when the dataframe is provided directly as an argument?
Scenario: Given the functions subsample and subset2, when calling:
df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
subsample(df, x == 1)
The error thrown is: Error in eval_tidy(rows, data): object 'x' not found.
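The definitions of subsample and subset2 are not shown in this post. A minimal sketch that reproduces the error, assuming they follow the pattern implied by the names used here (data, rows, rows_val, cond), might look like the following. The function bodies are assumptions for illustration, not necessarily the original code.

```r
library(rlang)

# Assumed definition: quote `rows`, then evaluate it with `data` as the data mask.
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Assumed definition: forwards `cond` to subset2() without quoting it first.
subsample <- function(df, cond) {
  subset2(df, cond)
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)

# Calling subset2() directly works, because the call `x == 1` itself is captured.
direct <- subset2(df, x == 1)

# Going through subsample() fails: subset2() captures the symbol `cond`, and
# forcing `cond` evaluates `x == 1` in the caller's environment, outside the mask.
wrapped <- tryCatch(
  subsample(df, x == 1),
  error = function(e) conditionMessage(e)
)
```

Under these assumptions, direct holds the rows where x is 1, while wrapped holds the error message instead of a data frame.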
Inferred Understanding from the Error:
The error message indicates that the rows argument in the subset2 function does capture the expression x == 1.
Inside subset2, the rows value is directly associated with the expression x == 1 and its environment, rather than being an abstract placeholder.
Primary Concern:
Why does eval_tidy, inside subset2, require a quosure (with embedded environment information) to evaluate the rows expression correctly, even when the dataframe is provided directly?
Would traditional eval exhibit similar behavior?
Explanation:
Lazy Evaluation in R: R employs lazy evaluation for function arguments. When subsample(df, x == 1) is called, the expression x == 1 isn't evaluated right away. It is stored as a promise and evaluated only when cond is first forced, and at that point it is evaluated in the environment of the caller, not inside any data mask.
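The point about where a lazily evaluated argument gets its values can be seen with a small base R sketch. The helper show_arg is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: it defines its own local `x`, then forces its argument.
show_arg <- function(arg) {
  x <- "local to show_arg"
  arg  # the promise is forced here, but evaluated in the caller's environment
}

x <- "caller's x"
out <- show_arg(x)
# `out` holds the caller's value of `x`; the local `x` inside show_arg() is ignored
```

The same rule is why a forwarded argument such as cond cannot see the columns of a data frame that only exists as a data mask further down the call chain.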
Execution Inside subset2 and Role of Data in Quosure's Environment: The rows_val <- eval_tidy(rows, data) line is where cond (sent as rows) is actually evaluated.
Although the data frame (data) is passed to eval_tidy as the data mask, the mask only applies to symbols that appear in the expression being evaluated. Assuming subsample passes cond along unquoted, the quosure that subset2 builds contains the symbol cond rather than x == 1. eval_tidy finds cond in subsample's environment and forces it as an ordinary promise, so x == 1 is evaluated in the caller's environment, where the data mask does not apply and x does not exist.
The quosure captures both the expression and its associated environment, so that the expression can be evaluated in the right context. The data mask does not need to be part of that environment. When eval_tidy evaluates a quosure, it layers the data mask on top of the quosure's environment: symbols in the expression are looked up first in the data mask, then in the quosure's environment, and then in its parent environments. The error therefore comes from what the quosure contains, not from a missing link between the data and the quosure's environment.
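A short sketch of the lookup order, and of what a nested enquo actually captures, may help here. The names threshold, capture_rows, and wrapper are hypothetical and used only for illustration.

```r
library(rlang)

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
threshold <- 1  # defined next to the quosure, not inside df

# `x` resolves in the data mask; `threshold` resolves in the quosure's environment.
rows_val <- eval_tidy(quo(x == threshold), df)

# Hypothetical helpers showing what enquo() sees when an argument is forwarded.
capture_rows <- function(rows) quo_get_expr(enquo(rows))
wrapper <- function(cond) capture_rows(cond)

# The captured expression is the symbol `cond`, not the call `x == 1`.
captured <- wrapper(x == 1)
```

The data frame never has to live inside the quosure's environment for the first evaluation to work. The second part shows why the mask cannot help once the expression has been reduced to the symbol cond.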
Role of enquo & Immediate Unquoting: The enquo function captures the expression supplied to an argument, together with the environment it came from, into a quosure. In a corrected subsample function, calling enquo(cond) captures x == 1 before it is ever forced, and the immediate unquoting !! in the call to subset2 inserts that quosure in place of the symbol cond. subset2 then receives x == 1 itself, so eval_tidy can resolve x in the data mask.
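As a sketch of the quote-and-unquote pattern described above, again assuming a subset2 of the shape implied by this post (its body is an assumption, not the original code):

```r
library(rlang)

# Assumed definition of subset2(), as implied by the names used in the post.
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Assumed corrected wrapper: quote `cond` first, then unquote it when forwarding.
subsample <- function(df, cond) {
  cond <- enquo(cond)   # captures `x == 1` plus the caller's environment
  subset2(df, !!cond)   # subset2() now sees the quosure, not the symbol `cond`
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
fixed <- subsample(df, x == 1)
```

In recent versions of rlang, the embrace operator can express the same forwarding in one step, by writing the call as subset2(df, {{ cond }}) and dropping the separate enquo line.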
Comparison with Traditional eval: With traditional eval, the expression is evaluated in whatever data or environment is passed in, and no environment is embedded in the expression itself. A base R version built on substitute and eval would fail in the same way in this situation, because substitute would also capture the symbol cond rather than x == 1.
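To check the comparison with base R, here is a minimal sketch using substitute and eval. The functions subset_base and subsample_base are hypothetical analogues written for this illustration.

```r
# Hypothetical base R analogue of subset2(), using substitute() and eval().
subset_base <- function(data, rows) {
  rows <- substitute(rows)
  rows_val <- eval(rows, data, parent.frame())
  data[rows_val, , drop = FALSE]
}

# Hypothetical wrapper that forwards `cond` unquoted, like the failing subsample().
subsample_base <- function(df, cond) {
  subset_base(df, cond)
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)

# Direct use works: substitute() captures the call `x == 1`.
direct <- subset_base(df, x == 1)

# Wrapped use fails: substitute() captures the symbol `cond`, and forcing it
# evaluates `x == 1` in the caller's environment, where `x` is not defined.
wrapped <- tryCatch(
  subsample_base(df, x == 1),
  error = function(e) conditionMessage(e)
)
```

So the failure is not specific to eval_tidy. It follows from forwarding an unquoted argument to a function that quotes it.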