ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

Bang-bang (!!) in drake_plan() causes error during make() #1372

Closed: gorgitko closed this issue 3 years ago

gorgitko commented 3 years ago

Description

Using bang-bang (!!) in drake_plan() to insert a complex or empty object directly as a target's command causes an error during make().

Reproducible example

``` r
drk_cache1 <- drake::new_cache(tempfile())

cc <- c()
l <- list()
ll <- list(a = 1)
df <- data.frame()

plan1 <- drake::drake_plan(my_c = !!cc, my_l = !!l, my_ll = !!ll, my_df = !!df, my_mtcars = !!mtcars)
plan1
#> # A tibble: 5 x 2
#>   target    command
#>   <chr>     <expr_lst>
#> 1 my_c      NULL                                                               …
#> 2 my_l      list()                                                             …
#> 3 my_ll     list(1)                                                            …
#> 4 my_df     list()                                                             …
#> 5 my_mtcars list(c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2,  17.…

# Empty objects.
drake::make(plan1, "my_c", cache = drk_cache1, trigger = drake::trigger(condition = TRUE))
#> ▶ target my_c
#> x fail my_c
#> Error: target my_c failed.
#> diagnose(my_c)$error$message:
#>   argument is missing, with no default
#> diagnose(my_c)$error$calls:
#>
drake::make(plan1, "my_l", cache = drk_cache1, trigger = drake::trigger(condition = TRUE))
#> ▶ target my_l
#> x fail my_l
#> Error: target my_l failed.
#> diagnose(my_l)$error$message:
#>   argument is missing, with no default
#> diagnose(my_l)$error$calls:
#>
drake::make(plan1, "my_df", cache = drk_cache1, trigger = drake::trigger(condition = TRUE))
#> ▶ target my_df
#> x fail my_df
#> Error: target my_df failed.
#> diagnose(my_df)$error$message:
#>   argument is missing, with no default
#> diagnose(my_df)$error$calls:
#>

# Non-empty objects.
drake::make(plan1, "my_ll", cache = drk_cache1, trigger = drake::trigger(condition = TRUE))
#> ▶ target my_ll
#> x fail my_ll
#> Error: target my_ll failed.
#> diagnose(my_ll)$error$message:
#>   not that many frames on the stack
#> diagnose(my_ll)$error$calls:
#>
drake::make(plan1, "my_mtcars", cache = drk_cache1, trigger = drake::trigger(condition = TRUE))
#> ▶ target my_mtcars
#> x fail my_mtcars
#> Error: target my_mtcars failed.
#> diagnose(my_mtcars)$error$message:
#>   not that many frames on the stack
#> diagnose(my_mtcars)$error$calls:
#>

# But it works if !! is used for function parameters.
drk_cache2 <- drake::new_cache(tempfile())
plan2 <- drake::drake_plan(
  my_c = identity(!!cc),
  my_l = identity(!!l), my_ll = identity(!!ll),
  my_df = identity(!!df), my_mtcars = identity(!!mtcars)
)
plan2
#> # A tibble: 5 x 2
#>   target    command
#>   <chr>     <expr_lst>
#> 1 my_c      identity(NULL)                                                     …
#> 2 my_l      identity(list())                                                   …
#> 3 my_ll     identity(list(1))                                                  …
#> 4 my_df     identity(list())                                                   …
#> 5 my_mtcars identity(list(c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8,  …
drake::make(plan2, cache = drk_cache2, trigger = drake::trigger(condition = TRUE))
#> ▶ target my_ll
#> ▶ target my_c
#> ▶ target my_df
#> ▶ target my_l
#> ▶ target my_mtcars
```

Created on 2021-07-15 by the reprex package (v2.0.0)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.5 (2021-03-31)
#>  os       Gentoo/Linux
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US.UTF-8
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Prague
#>  date     2021-07-15
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version     date       lib source
#>  P backports     1.2.1       2020-12-09 [?] CRAN (R 4.0.2)
#>  P base64url     1.4         2018-05-14 [?] CRAN (R 4.0.2)
#>    cli           2.5.0       2021-04-26 [1] CRAN (R 4.0.5)
#>    crayon        1.4.1       2021-02-08 [1] CRAN (R 4.0.5)
#>  P digest        0.6.27      2020-10-24 [?] CRAN (R 4.0.2)
#>    drake       * 7.13.2.9000 2021-07-15 [1] Github (ropensci/drake@44dbd38)
#>    ellipsis      0.3.2       2021-04-29 [1] CRAN (R 4.0.5)
#>  P evaluate      0.14        2019-05-28 [?] CRAN (R 4.0.2)
#>    fansi         0.5.0       2021-05-25 [1] CRAN (R 4.0.5)
#>  P filelock      1.0.2       2018-10-05 [?] CRAN (R 4.0.2)
#>  P fs            1.5.0       2020-07-31 [?] CRAN (R 4.0.2)
#>  P glue          1.4.2       2020-08-27 [?] CRAN (R 4.0.2)
#>    highr         0.9         2021-04-16 [1] CRAN (R 4.0.5)
#>    hms           1.1.0       2021-05-17 [1] CRAN (R 4.0.5)
#>  P htmltools     0.5.1.1     2021-01-22 [?] CRAN (R 4.0.2)
#>  P igraph        1.2.6       2020-10-06 [?] CRAN (R 4.0.2)
#>    knitr         1.33        2021-04-24 [1] CRAN (R 4.0.5)
#>    lifecycle     1.0.0       2021-02-15 [1] CRAN (R 4.0.5)
#>  P magrittr      2.0.1       2020-11-17 [?] CRAN (R 4.0.2)
#>    pillar        1.6.1       2021-05-16 [1] CRAN (R 4.0.5)
#>  P pkgconfig     2.0.3       2019-09-22 [?] CRAN (R 4.0.2)
#>  P prettyunits   1.1.1       2020-01-24 [?] CRAN (R 4.0.2)
#>  P progress      1.2.2       2019-05-16 [?] CRAN (R 4.0.2)
#>  P purrr         0.3.4       2020-04-17 [?] CRAN (R 4.0.2)
#>  P R6            2.5.0       2020-10-28 [?] CRAN (R 4.0.2)
#>    reprex        2.0.0       2021-04-02 [1] CRAN (R 4.0.5)
#>    rlang         0.4.11      2021-04-30 [1] CRAN (R 4.0.5)
#>    rmarkdown     2.8         2021-05-07 [1] CRAN (R 4.0.5)
#>  P rstudioapi    0.13        2020-11-12 [?] CRAN (R 4.0.2)
#>  P sessioninfo   1.1.1       2018-11-05 [?] CRAN (R 4.0.2)
#>  P storr         1.2.5       2020-12-01 [?] CRAN (R 4.0.2)
#>    stringi       1.6.2       2021-05-17 [1] CRAN (R 4.0.5)
#>    stringr       1.4.0       2019-02-10 [1] CRAN (R 4.0.5)
#>    styler        1.5.1       2021-07-13 [1] CRAN (R 4.0.5)
#>    tibble        3.1.2       2021-05-16 [1] CRAN (R 4.0.5)
#>    tidyselect    1.1.1       2021-04-30 [1] CRAN (R 4.0.5)
#>    txtq          0.2.4       2021-03-27 [1] CRAN (R 4.0.5)
#>    utf8          1.2.1       2021-03-12 [1] CRAN (R 4.0.5)
#>    vctrs         0.3.8       2021-04-29 [1] CRAN (R 4.0.5)
#>    withr         2.4.2       2021-04-18 [1] CRAN (R 4.0.5)
#>    xfun          0.23        2021-05-15 [1] CRAN (R 4.0.5)
#>  P yaml          2.2.1       2020-02-01 [?] CRAN (R 4.0.2)
#>
#> [1] /mnt/foibe3/Users/novotnyj/projects/bioinfocz/scdrake/renv/library/R-4.0/x86_64-pc-linux-gnu
#> [2] /tmp/Rtmpc8ELTw/renv-system-library
#> [3] /usr/lib64/R/library
#>
#>  P ── Loaded and on-disk path mismatch.
```

Expected result

No errors should arise 🙂
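
For readers comparing the two plans in the reproducible example, the difference in the *shape* of the commands can be sketched in base R. This is only an illustration: `bquote()` with `.()` is used here as a base-R stand-in for `!!`, and nothing below reproduces drake's internals or the errors reported above. With a bare unquote the command is the object itself, whereas wrapping it in `identity()` yields an ordinary call that carries the object as an argument.

``` r
# Assumption: bquote()/.() stands in for !!; this is not drake's own code.
ll <- list(a = 1)

# Bare unquote: the resulting "command" is the list itself, not a call.
cmd_bare <- bquote(.(ll))

# Unquote inside a call: the command is a call with the list as its argument.
cmd_wrapped <- bquote(identity(.(ll)))

is.call(cmd_bare)     # FALSE: there is no function call to evaluate
is.call(cmd_wrapped)  # TRUE

# Evaluating the wrapped command returns the original object unchanged.
identical(eval(cmd_wrapped), ll)
```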

wlandau commented 3 years ago

Should be fixed in 1cefc2b0004e41d7650b645de68a6337dbf98688. However, please try to avoid !! on complicated objects: it is difficult for R to reconstruct the expression that produces the evaluated object you supply. For example, in versions of R from before deparse(control = "niceNames") was implemented, drake_plan(x = !!mtcars) removed the column names of the mtcars dataset.
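
A minimal base-R sketch of why turning an evaluated object back into code is fragile. The helper `round_trip()` is hypothetical and written only for this illustration; it is not how drake processes commands. It deparses an object to text and evaluates that text again, which works for a small named list but not for every object.

``` r
# Hypothetical helper: object -> code text -> object again.
round_trip <- function(x) eval(parse(text = deparse(x)))

# A small named list survives the trip with default deparse() options.
params <- list(a = 1, b = "x")
identical(round_trip(params), params)

# A double does not: by default deparse() writes fewer significant digits
# than are needed to restore the value exactly.
third <- 1 / 3
identical(round_trip(third), third)

# An environment has no code representation at all, only a placeholder
# string that cannot be parsed back into the object.
deparse(new.env())
```

Objects that hold environments, such as closures or fitted models, fall into the last category, which is one reason to pass them as upstream targets rather than inlining them.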

gorgitko commented 3 years ago

Thank you very much for the hotfix!

To explain my use case: I am currently building a drake workflow as a package for end users (loosely inspired by januz/drakepkg), so the function that generates the plan must insert values into the plan rather than names (the function's formals). These values are pipeline parameters passed in a list, and the list itself is also stored as a target for the sake of reproducibility.

EDIT: I will let you know when this package is deployed, because I think it could serve as another good example besides januz/drakepkg 🙂
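
The requirement described in this comment (values rather than formals in the generated commands) can be sketched in base R. The function `make_commands()` and the parameter `threshold` are hypothetical names, and `bquote()` again stands in for `!!`; a real plan generator would call drake_plan() instead. The point is that a command referring to a formal by name fails once it is evaluated outside the generating function, while a command built by value is self-contained.

``` r
# Hypothetical generator: builds one command by name and one by value.
make_commands <- function(params) {
  list(
    by_name  = quote(identity(params)),       # refers to the formal `params`
    by_value = bquote(identity(.(params)))    # embeds the value of `params`
  )
}

cmds <- make_commands(list(threshold = 0.5))

# Evaluate where `params` does not exist, as it would not at make() time.
env <- new.env(parent = baseenv())

eval(cmds$by_value, env)  # returns the embedded parameter list

# The by-name command cannot find `params` in this environment.
tryCatch(eval(cmds$by_name, env), error = function(e) conditionMessage(e))
```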