What is best practice for using drake with params in an .Rmd?

ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

GNU General Public License v3.0


Background

If I wanted to produce a number of reports from one template, without drake I would do something like:

library(rmarkdown)

# Create a vector of the elements to iterate over
species <- c("Adelie", "Chinstrap", "Gentoo")

# Render the template to HTML once per parameter value
# (walk() rather than map(), since render() is called for its side effect)
purrr::walk(
  .x = species,  # vector of param values
  .f = ~render(
    input = "doc/template.Rmd",  # R Markdown file path
    params = list(species = .x),  # iterated parameter value
    output_file = paste0("doc/", .x, ".html")  # iterated output path
  )
)

Using static branching in drake I can get something similar, with each report as a separate target (one row in the plan). But it feels a little hacky to build the paths first (since file_out() and friends only accept literal strings), and in the plan in the example below the target 'files' isn't connected to anything. Also, is it correct to have each report as a separate target, or should it be a single target, since all reports will need to be updated whenever the data changes anyway? In reality I'm going to have >150 R Markdown reports and potentially other outputs such as slide decks, which is why I'd like to keep only the templates in doc/ and put the rendered reports and slides in report/.

Example

This is my hack using static branching:

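library(drake)
library(dplyr)
library(rmarkdown)
library(palmerpenguins)

plan <- drake_plan(

  # Per-species min and max of every numeric column
  penguin_data = penguins %>%
    group_by(species) %>%
    summarise_if(is.numeric, list(min, max)) %>%
    mutate_at(vars(species), as.character),

  # Species/path lookup table (no other target depends on it)
  files = data.frame(
    species = penguin_data$species,
    path = paste0("report/report_", penguin_data$species, ".html")
  ),

  # `!!files$...` is evaluated when drake_plan() is called, so it reads a
  # `files` object that already exists in the calling environment,
  # not the `files` target above.
  report = target(
    render(
      input = knitr_in(input),
      output_file = file_out(output),
      params = list(species = p)
    ),
    transform = map(
      input = "doc/template.Rmd",
      output = !!files$path,
      p = !!files$species,
      .names = paste0("report_", !!files$species)
    )
  )
)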
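Because `!!` inside drake_plan() is evaluated when the plan is built, `!!files$path` reads whatever `files` object exists in the calling environment at that moment rather than the `files` target, which is why that target ends up disconnected. A minimal base-R sketch of building the lookup table before calling drake_plan(), assuming the species names are known up front (hard-coded here, as in the purrr example); `make_report_table()` is a hypothetical helper, not part of drake:

```r
# Hypothetical helper (not a drake function): build the species/path lookup
# table that the static map() transform reads through `!!files$...`.
make_report_table <- function(species, out_dir = "report") {
  data.frame(
    species = species,
    path = file.path(out_dir, paste0("report_", species, ".html")),
    stringsAsFactors = FALSE  # keep plain character columns on R < 4.0
  )
}

# Assumed to be known before drake_plan() is called
files <- make_report_table(c("Adelie", "Chinstrap", "Gentoo"))
files$path
```

With `files` defined this way before drake_plan(), the `files` target inside the plan could probably be dropped, since no other target refers to it.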

I have a drake project set up like this:

_drake.R
report.Rmd
packages.R
R/
├── functions.R
└── plan.R
doc/
└── template.Rmd
report/
├── report_Gentoo.html
├── report_Chinstrap.html
└── report_Adelie.html
data/
└── file1.csv

plan <- drake_plan(
  penguin_data = penguins %>%
    group_by(species) %>%
    summarise_if(is.numeric, list(min, max)) %>%
    mutate_at(vars(species), as.character),
  files = data.frame(
    species = penguin_data$species,
    path = paste0("report/report_", penguin_data$species, ".html")
  ),
  report = target({
    render(
      input = knitr_in("doc/template.Rmd"),
      output_file = files$path,
      params = list(species = files$species)
    )
    # Just underscoring here that the output path should be returned for format = "file".
    # rmarkdown::render() does that anyway.
    files$path
  },
  format = "file", # Track the returned output file path.
  dynamic = map(files) # Maps over the rows and makes one sub-target per row.
  )
)
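In the dynamic version, `dynamic = map(files)` creates one sub-target per row of `files`, so inside the command `files$path` and `files$species` are single values. A base-R illustration of that row-wise slicing, assuming a stand-in `fake_render()` (hypothetical) in place of `rmarkdown::render()` so it runs without drake or any .Rmd file; this is an analogy for the behaviour, not how drake implements branching:

```r
files <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  path = paste0("report/report_", c("Adelie", "Chinstrap", "Gentoo"), ".html"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for rmarkdown::render(): returns the output path
# without writing anything to disk.
fake_render <- function(input, output_file, params) {
  output_file
}

# One "sub-target" per row: each slice is a one-row data frame
rows <- split(files, seq_len(nrow(files)))
returned <- vapply(rows, function(row) {
  fake_render(
    input = "doc/template.Rmd",
    output_file = row$path,
    params = list(species = row$species)
  )
  row$path  # like the plan, return the file path so it can be tracked
}, character(1))

unname(returned)
```

With `format = "file"`, drake tracks each returned path, so (as I understand drake's documentation on target formats) an individual sub-target should rerun if its own HTML output is modified or deleted, while the others stay up to date.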

What is best practice for using drake with params in an .Rmd? #1348
