ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

How to deal with dynamic length changing #1362

Closed: jennysjaarda closed this issue 3 years ago

jennysjaarda commented 3 years ago

Question

I'm having an issue with dynamic branches whose length changes as the plan evolves. In the code below (it's not the most elegant, but it shows the problem), I create a target continent_names, a tibble with 5 rows. I then map over this target in the models target, as expected. The result of models is a tibble with 60 rows. I then want to map over this tibble in models_test, i.e. 60 sub-targets should run, but instead only 5 run, seemingly carried over from the length of models. Am I missing something? What is the best way to proceed?

Reproducible example


library(broom)
library(drake)
library(gapminder)
library(tidyverse)

make_continent_models <- function(gapminder, continent, year_window){

  # .env$continent is the function argument; a bare `continent` inside
  # filter() would refer to the data column and keep every row.
  gapminder_sub <- gapminder %>% filter(continent == .env$continent)
  min_position <- min(gapminder_sub$year)
  max_position <- max(gapminder_sub$year)

  chunk_starts <- seq(min_position, max_position, year_window)
  out <- numeric()

  for(i in seq_along(chunk_starts)){
    start <- chunk_starts[i]
    if(i != length(chunk_starts)){
      end <- chunk_starts[i+1]-1
    } else {
      # The last window runs to the final observed year.
      end <- max_position
    }
    out_i <- cbind(continent, start, end)
    out <- rbind(out, out_i)
  }

  as_tibble(out) %>% mutate_all(as.character)
}

test_models <- function(continent, start_pos, end_pos){
  cat(paste0("Running continent: ", continent, ".\n"))

  cat(paste0("Starting at year: ", start_pos, ".\n"))

  cat(paste0("Ending at year: ", end_pos, ".\n"))

}

plan <- drake_plan(
  continent_names = tibble(continent= as.character(unique(gapminder$continent))),
  models = target(make_continent_models(gapminder, continent_names$continent, year_window=5), 
                  dynamic = map(continent_names)),
  models_test = target({
    test_models(models$continent, models$start, models$end)
  }, dynamic = map(models)),
)
wlandau commented 3 years ago

models_test maps over the dynamic sub-targets of models, and there are only 5 of those (one per row of continent_names). To map over the rows of models instead, create a new static target that binds all the sub-targets together first, then map over that target. Sketch:

library(broom)
library(drake)
library(gapminder)
library(tidyverse)

make_continent_models <- function(gapminder, continent, year_window){

  # .env$continent is the function argument; a bare `continent` inside
  # filter() would refer to the data column and keep every row.
  gapminder_sub <- gapminder %>% filter(continent == .env$continent)
  min_position <- min(gapminder_sub$year)
  max_position <- max(gapminder_sub$year)

  chunk_starts <- seq(min_position, max_position, year_window)
  out <- numeric()

  for(i in seq_along(chunk_starts)){
    start <- chunk_starts[i]
    if(i != length(chunk_starts)){
      end <- chunk_starts[i+1]-1
    } else {
      # The last window runs to the final observed year.
      end <- max_position
    }
    out_i <- cbind(continent, start, end)
    out <- rbind(out, out_i)
  }

  as_tibble(out) %>% mutate_all(as.character)
}

test_models <- function(continent, start_pos, end_pos){
  data.frame(continent = unique(continent), number_of_rows = length(continent))
}

plan <- drake_plan(
  continent_names = tibble(continent= as.character(unique(gapminder$continent))),
  models = target(make_continent_models(gapminder, continent_names$continent, year_window=5), 
                  dynamic = map(continent_names)),
  models_agg = models, # Binds the sub-targets together by rows.
  models_test = target({
    test_models(models_agg$continent, models_agg$start, models_agg$end)
  }, dynamic = map(models_agg)),
)

make(plan, verbose = 0)

readd(models_test)
#>    continent number_of_rows
#> 1       Asia              1
#> 2       Asia              1
#> 3       Asia              1
#> 4       Asia              1
#> 5       Asia              1
#> 6       Asia              1
#> 7       Asia              1
#> 8       Asia              1
#> 9       Asia              1
#> 10      Asia              1
#> 11      Asia              1
#> 12      Asia              1
#> 13    Europe              1
#> 14    Europe              1
#> 15    Europe              1
#> 16    Europe              1
#> 17    Europe              1
#> 18    Europe              1
#> 19    Europe              1
#> 20    Europe              1
#> 21    Europe              1
#> 22    Europe              1
#> 23    Europe              1
#> 24    Europe              1
#> 25    Africa              1
#> 26    Africa              1
#> 27    Africa              1
#> 28    Africa              1
#> 29    Africa              1
#> 30    Africa              1
#> 31    Africa              1
#> 32    Africa              1
#> 33    Africa              1
#> 34    Africa              1
#> 35    Africa              1
#> 36    Africa              1
#> 37  Americas              1
#> 38  Americas              1
#> 39  Americas              1
#> 40  Americas              1
#> 41  Americas              1
#> 42  Americas              1
#> 43  Americas              1
#> 44  Americas              1
#> 45  Americas              1
#> 46  Americas              1
#> 47  Americas              1
#> 48  Americas              1
#> 49   Oceania              1
#> 50   Oceania              1
#> 51   Oceania              1
#> 52   Oceania              1
#> 53   Oceania              1
#> 54   Oceania              1
#> 55   Oceania              1
#> 56   Oceania              1
#> 57   Oceania              1
#> 58   Oceania              1
#> 59   Oceania              1
#> 60   Oceania              1

Created on 2021-03-22 by the reprex package (v1.0.0)
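To see the length difference without running drake, here is a minimal base-R sketch. It assumes each of the five sub-targets of models is a 12-row data frame (five-year windows from 1952 to 2007, matching the output above). The names sub_targets and starts are hypothetical, and base rbind() only stands in for drake's own aggregation of sub-targets; it is not drake's internal mechanism.

```r
# Hypothetical stand-ins for the five dynamic sub-targets of `models`:
# one data frame per continent, each with 12 five-year windows.
continents <- c("Asia", "Europe", "Africa", "Americas", "Oceania")
starts <- seq(1952, 2007, by = 5)

sub_targets <- lapply(continents, function(cont) {
  data.frame(
    continent = cont,
    start = starts,
    end = c(starts[-1] - 1, 2007),  # last window ends at the final year
    stringsAsFactors = FALSE
  )
})

# dynamic = map(models) iterates over sub-targets: 5 elements.
length(sub_targets)

# A static target that references `models` (like models_agg) sees the
# sub-targets combined by rows: 5 x 12 = 60 rows.
models_agg <- do.call(rbind, sub_targets)
nrow(models_agg)

# dynamic = map(models_agg) then gives each sub-target one row,
# e.g. the first slice:
models_agg[1, ]
```

The point of the sketch is only the counting: mapping over the dynamic target iterates over its 5 pieces, while mapping over the bound-together static target iterates over its 60 rows.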

jennysjaarda commented 3 years ago

Thanks a lot! I have one more question but I'll post a new thread.