I wanted to use standard dplyr syntax within a drake workflow plan, but to my (first) surprise, it did not work:
In the global environment, I do this:
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.1
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
vars_global <- syms(c("hp", "cyl"))
subset <- mtcars %>%
  select(!!!vars_global)
When I repeat this inside a drake workflow plan, I get this error:
library(drake)
#> Warning: package 'drake' was built under R version 3.5.1
drake_plan(
  vars_within_plan = syms(c("hp", "cyl")),
  subset = mtcars %>%
    select(!!!vars_within_plan)
)
#> Error in rlang::exprs(...): object 'vars_within_plan' not found
I then realized that drake itself uses `!!` and friends to unquote in the calling environment (here, the global environment) at the moment the plan is created. It does not defer the evaluation of `!!!vars_within_plan` until the target is built inside the workflow plan. Deferring it is what I wanted, because I expected the plan to behave like dplyr in the global environment, i.e. `!!!vars_within_plan` should expand to `hp, cyl`. In practice this means tidyeval cannot be used inside `drake_plan()` the same way it is used outside of it, which seems like quite a heavy restriction.
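The error message shows that the plan's arguments pass through `rlang::exprs()`. The following minimal sketch assumes only the rlang package and reproduces the same behaviour outside drake. `exprs()` resolves `!!`/`!!!` while it captures the expressions, looking the object up in the environment where `exprs()` is called. For that reason, an object defined only as a sibling argument of the same call does not exist yet. The names `captured`, `vars_within` and `res` are illustrative only.

```r
library(rlang)

# exprs() captures expressions, but processes `!!!` immediately,
# looking up `vars_global` in the calling (here: global) environment.
vars_global <- syms(c("hp", "cyl"))
captured <- exprs(subset = select(mtcars, !!!vars_global))
print(captured$subset)
# select(mtcars, hp, cyl)

# `vars_within` is only another argument of the same call, so it does
# not exist when the unquoting happens; this fails like drake_plan() did.
res <- tryCatch(
  exprs(
    vars_within = syms(c("hp", "cyl")),
    subset = select(mtcars, !!!vars_within)
  ),
  error = function(e) conditionMessage(e)
)
print(res)
```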
I wonder if there is a way to fix this, e.g. by using a different mechanism to insert values from the calling environment into the workflow plan, although this would probably be a breaking API change. Could we use `!! !!` for that? This seems the most natural option to me.
In terms of reproducibility, the current use of `!!` does not seem like something that should be encouraged anyway. I think it is better to define all variables within the workflow plan data frame. What do you think?
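A possible workaround in the spirit of keeping every variable inside the plan is sketched below. It is not an official drake pattern. The idea is to move the tidyeval into a helper function defined outside the plan, so that `drake_plan()` never sees a `!!!` in its arguments. The helper `select_vars()` is hypothetical, and storing the column names as a character vector (instead of `syms()`) is an assumption of this sketch.

```r
library(dplyr)
library(drake)

# Hypothetical helper: the `!!!` lives inside the function body, which
# drake_plan() does not unquote, so splicing happens only when called.
select_vars <- function(data, vars) {
  select(data, !!!rlang::syms(vars))
}

# The variables are now an ordinary target inside the plan; `subset`
# refers to it by name, like any other dependency.
plan <- drake_plan(
  vars_within_plan = c("hp", "cyl"),
  subset = select_vars(mtcars, vars_within_plan)
)
print(plan)
```

Calling `make(plan)` should then build `vars_within_plan` first and pass its value to `select_vars()`. That step is left out here because it writes a cache to disk.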
Created on 2018-09-25 by the reprex package (v0.2.1)