ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

Challenges in programming with tidyeval #518

Closed lorenzwalthert closed 5 years ago

lorenzwalthert commented 5 years ago

I wanted to use standard dplyr syntax within a drake workflow plan, but to my initial surprise, it did not work.

In the global environment, I do this:

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.1
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
vars_global <- syms(c("hp", "cyl"))
subset <- mtcars %>%
  select(!!!vars_global)

When I repeat this inside a drake workflow plan, I get this error:

library(drake)
#> Warning: package 'drake' was built under R version 3.5.1
drake_plan(
  vars_within_plan = syms(c("hp", "cyl")),
  subset = mtcars %>%
    select(!!!vars_within_plan)

)
#> Error in rlang::exprs(...): object 'vars_within_plan' not found

Created on 2018-09-25 by the reprex package (v0.2.1)

I then realized that drake itself uses !! and friends to unquote in the calling environment (here, the global environment). It does not defer evaluation of !!!vars_within_plan to within the workflow plan, which is what I wanted: I expected it to behave as dplyr does in the global environment, i.e. to evaluate to hp, cyl. This means that tidyeval cannot be used inside drake::drake_plan() the same way it is used outside, which seems like quite a heavy restriction.
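The mechanism can be reproduced without drake. The error message above comes from rlang::exprs(), so here is a minimal sketch, assuming drake captures the plan's commands with that function. The name vars_only_in_plan is hypothetical and must not exist in the session.

```r
library(rlang)

# Defined in the calling environment, so `!!!` can find it at capture time.
vars_global <- syms(c("hp", "cyl"))

# exprs() processes `!!!` immediately, while the expression is being captured.
captured <- exprs(subset = select(mtcars, !!!vars_global))
captured$subset
# The symbols are already spliced in: select(mtcars, hp, cyl)

# A name that exists only as a target inside the plan is not visible yet,
# so the same capture fails before any target is built.
result <- tryCatch(
  exprs(subset = select(mtcars, !!!vars_only_in_plan)),
  error = function(e) conditionMessage(e)
)
result

# base::quote() does no unquoting, so `!!!` survives for later evaluation.
deferred <- quote(select(mtcars, !!!vars_only_in_plan))
```

The last line shows the behaviour I was hoping for inside the plan: the expression is stored as written, and !!! is only resolved when the command is eventually evaluated.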

I wonder if there is a way to fix this, e.g. by using a different mechanism to put values from the calling environment into the workflow plan, although this would probably be a breaking API change. Could we use `!! !!`? That seems the most natural to me.

In terms of reproducibility, the current use of !! does not seem like something that should be encouraged anyway; I think it is better to define all variables within the workflow data frame. What do you think?

wlandau commented 5 years ago

drake_plan(tidy_evaluation = FALSE) delays the evaluation of !! and friends until make(). Does that help? Related: #200.
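Applied to the plan from the original report, that would look roughly like the sketch below. It assumes a drake version that still accepts the tidy_evaluation argument; as far as I know, later releases deprecate it in favour of tidy_eval, so adjust the argument name if your version complains. Note that make() writes a .drake cache to the working directory.

```r
library(drake)
library(dplyr)

# With tidy_evaluation = FALSE, `!!!` is kept in the command as written
# instead of being unquoted while the plan is created.
plan <- drake_plan(
  vars_within_plan = syms(c("hp", "cyl")),
  subset = mtcars %>%
    select(!!!vars_within_plan),
  tidy_evaluation = FALSE
)

# `!!!vars_within_plan` is only evaluated here, after the
# `vars_within_plan` target has been built.
make(plan)

# Load the built target from the cache.
result <- readd(subset)
names(result)
```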

lorenzwalthert commented 5 years ago

Seems exactly what I was looking for. Thanks :-)