ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

Use the .by argument of combine in command #1340

Closed vkehayas closed 3 years ago

vkehayas commented 3 years ago

Question

I would like to use a grouping variable inside the command that builds a target. When I group with the .by argument of combine(), the variable is substituted into the command in a form that cannot be used directly.

Reproducible example

library(drake)
inputFrame = expand.grid(id = c("a", "b"),
                        n1 = c(10, 100))

spamPlan = drake_plan(
  x = target(runif(n1),
             transform = map(n1,
                             .data = !!inputFrame)),
  y = target(x*100,
              transform = map(x)),
  z = target(data.frame(s = sum(y),
                        id = id), # This is the crucial step that fails
             transform = combine(y,
                                 id,
                                 .by = id))
)

> spamPlan
# A tibble: 10 x 2
   target    command                                             
   <chr>     <expr>                                              
 1 x_a_10    runif(10)                                           
 2 x_b_10    runif(10)                                           
 3 x_a_100   runif(100)                                          
 4 x_b_100   runif(100)                                          
 5 y_x_a_10  x_a_10 * 100                                        
 6 y_x_b_10  x_b_10 * 100                                        
 7 y_x_a_100 x_a_100 * 100                                       
 8 y_x_b_100 x_b_100 * 100                                       
 9 z_a       data.frame(s = sum(y_x_a_10, y_x_a_100), id = `"a"`)
10 z_b       data.frame(s = sum(y_x_b_10, y_x_b_100), id = `"b"`)

> make(spamPlan)
▶ target x_b_10
▶ target x_a_10
▶ target x_b_100
▶ target x_a_100
▶ target y_x_b_10
▶ target y_x_a_10
▶ target y_x_b_100
▶ target y_x_a_100
▶ target z_b
x fail z_b
Error: target z_b failed.
diagnose(z_b)$error$message:
  object '"b"' not found
diagnose(z_b)$error$calls:
  1. └─base::data.frame(s = sum(y_x_b_10, y_x_b_100), id = `"b"`)
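
The backticks around `"b"` in the printed plan suggest that id was replaced by a symbol whose name contains the quote characters, rather than by the string "b". The following base R sketch reproduces the same kind of failure without drake. It rests on that reading of the printout; it is not taken from drake's source, and the variable names are illustrative only.

```r
# Base R only; drake is not needed for this sketch.
# Assumption: the backticks in the printed plan mean that `id` was replaced by
# a symbol whose name includes the double quotes, not by the string "b".
quoted_symbol <- as.symbol('"b"')

# Build the same kind of call that appears in the plan for target z_b.
failing_call <- bquote(data.frame(s = 1, id = .(quoted_symbol)))

# Evaluating it looks up an object literally named "b" (quotes included),
# so the error message is captured here instead of stopping the script.
failure <- tryCatch(
  eval(failing_call),
  error = function(e) conditionMessage(e)
)

# With a character value substituted instead, the call evaluates normally.
working_call <- bquote(data.frame(s = 1, id = .("b")))
working <- eval(working_call)
```

Printing failure and working side by side shows the difference between substituting a symbol and substituting a string into the command.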

I am not sure why this happens, but the problem is solved if I use as.character(quote(id)), as in this plan:

spamPlan = drake_plan(
  x = target(runif(n1),
             transform = map(n1,
                             .data = !!inputFrame)),
  y = target(x*100,
              transform = map(x)),
  z = target(data.frame(s = sum(y),
                        id = as.character(quote(id))), 
             transform = combine(y,
                                 id,
                                 .by = id))
)
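
One thing that may be worth checking with this workaround is the exact string it produces. If the substituted symbol is the one shown in the printed plan (an assumption based on the printout, not verified against a drake run), then in base R the conversion keeps the quote characters as part of the value. The sketch below shows this, along with a hypothetical cleanup step that is not part of drake.

```r
# Assumption: after drake substitutes the grouping variable, quote(id) holds
# the symbol shown in the printed plan, whose name includes the quotes.
substituted <- quote(`"a"`)

# Convert the symbol to a character string, as the workaround does.
label <- as.character(substituted)
nchar(label)  # counts the two quote characters as well as the letter

# Hypothetical cleanup step, not part of drake: drop the surrounding quotes.
clean_label <- gsub('^"|"$', "", label)
```

If the real targets show the same embedded quotes, wrapping the workaround in a cleanup like the gsub() call above would give plain labels.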

Would it be worth having drake convert the grouping variable to the appropriate class itself? The variable id is part of the plan, so I would expect it to be resolved to its value inside the command.

wlandau commented 3 years ago

Thanks, should be fixed now.