diazrenata closed this issue 5 years ago
Can you check your version of Drake? I think version 6.2.1 might be required for the new implementation of mapping methods to datasets.
hm - that's the one I have?
Here's what I see for running the front half of `pipeline.R` ... can you see what might be different?
library(MATSS)
#> Please look at our data formats by running `vignette("data-formats")`
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(drake)
## Read in the maizuru community data from a csv file
get_maizuru_data <- function()
{
    data_path <- system.file("extdata", "Maizuru_dominant_sp.csv",
                             package = "MATSS", mustWork = TRUE)
    raw_data <- read.csv(data_path)
    list(abundance = dplyr::select(raw_data, -date_tag, -surf.t, -bot.t, -Y, -M, -D) %>%
             mutate_all(~round(. + 1e-10)),
         covariates = dplyr::select(raw_data, date_tag, surf.t, bot.t, Y, M, D))
}
## Get raw data
datasets_raw <- drake_plan(
    bbs_data_tables = rdataretriever::fetch("breed-bird-survey"),
    sdl_data_tables = rdataretriever::fetch("veg-plots-sdl"),
    mtquad_data_tables = rdataretriever::fetch("mapped-plant-quads-mt")
)
## Clean and transform the data into the appropriate format
datasets <- drake_plan(
    portal_data = get_portal_rodents(),
    maizuru_data = get_maizuru_data(),
    jornada_data = process_jornada_data(),
    sgs_data = process_sgs_data(),
    bbs_data = get_bbs_data(bbs_data_tables, region = 7),
    sdl_data = get_sdl_data(sdl_data_tables),
    mtquad_data = get_mtquad_data(mtquad_data_tables),
    bad_portal = portal_data[[1]]
)
## Analysis methods
methods <- drake_plan(
    lda = function(dataset) {run_LDA(dataset, max_topics = 6, nseeds = 20)}
)
## Define how results are collected
collect <- function(list_of_results, plan)
{
    names(list_of_results) <- all.vars(match.call()$list_of_results)
    list_of_results
}
## The combination of each method x dataset
analyses <- drake_plan(
    # expand out each `fun(data)``, where
    #   `fun` is each of the values in methods$target
    #   `data` is each of the values in datasets$target
    # note: tidyeval syntax is to get all the values from the previous plans,
    #       but keep them as unevaluated symbols, so that drake_plan handles
    #       them appropriately
    analysis = target(fun(data),
                      transform = cross(fun = !!rlang::syms(methods$target),
                                        data = !!rlang::syms(datasets$target))
    ),
    # create a list of the created `analysis` objects, grouping by the `fun`
    # that made them - this keeps the results from the different methods
    # separated, so that the reports/syntheses can handle the right outputs
    results = target(collect(analysis, ignore(analyses)),
                     transform = combine(analysis, .by = fun)),
    trace = TRUE
)
print(analyses)
#> # A tibble: 9 x 6
#> target command fun data analysis results
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 analysis_ld… lda(portal_data) lda porta… analysis_ld… <NA>
#> 2 analysis_ld… lda(maizuru_data) lda maizu… analysis_ld… <NA>
#> 3 analysis_ld… lda(jornada_data) lda jorna… analysis_ld… <NA>
#> 4 analysis_ld… lda(sgs_data) lda sgs_d… analysis_ld… <NA>
#> 5 analysis_ld… lda(bbs_data) lda bbs_d… analysis_ld… <NA>
#> 6 analysis_ld… lda(sdl_data) lda sdl_d… analysis_ld… <NA>
#> 7 analysis_ld… lda(mtquad_data) lda mtqua… analysis_ld… <NA>
#> 8 analysis_ld… lda(bad_portal) lda bad_p… analysis_ld… <NA>
#> 9 results_lda "collect(list(analysis_l… lda <NA> <NA> results…
Created on 2019-02-12 by the reprex package (v0.2.0).
Ooh, thanks!
So I see two things: 1) analyses is not correct in mine and 2) I get a message:
Warning message:
Converting double-quotes to single-quotes because the `strings_in_dots` argument is missing. Use the file_in(), file_out(), and knitr_in() functions to work with files in your commands. To remove this warning, either call `drake_plan()` with `strings_in_dots = "literals"` or use `pkgconfig::set_config("drake::strings_in_dots" = "literals")`.
I'm going to try the `pkgconfig` and see if that fixes it.
> library(MATSS)
Please look at our data formats by running `vignette("data-formats")`
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
> library(drake)
>
> ## Read in the maizuru community data from a csv file
> get_maizuru_data <- function()
+ {
+ data_path <- system.file("extdata", "Maizuru_dominant_sp.csv",
+ package = "MATSS", mustWork = TRUE)
+ raw_data <- read.csv(data_path)
+
+ list(abundance = dplyr::select(raw_data, -date_tag, -surf.t, -bot.t, -Y, -M, -D) %>%
+ mutate_all(~round(. + 1e-10)),
+ covariates = dplyr::select(raw_data, date_tag, surf.t, bot.t, Y, M, D))
+ }
>
> ## Get raw data
> datasets_raw <- drake_plan(
+ bbs_data_tables = rdataretriever::fetch("breed-bird-survey"),
+ sdl_data_tables = rdataretriever::fetch("veg-plots-sdl"),
+ mtquad_data_tables = rdataretriever::fetch("mapped-plant-quads-mt")
+ )
Warning message:
Converting double-quotes to single-quotes because the `strings_in_dots` argument is missing. Use the file_in(), file_out(), and knitr_in() functions to work with files in your commands. To remove this warning, either call `drake_plan()` with `strings_in_dots = "literals"` or use `pkgconfig::set_config("drake::strings_in_dots" = "literals")`.
>
> ## Clean and transform the data into the appropriate format
> datasets <- drake_plan(
+ portal_data = get_portal_rodents(),
+ maizuru_data = get_maizuru_data(),
+ jornada_data = process_jornada_data(),
+ sgs_data = process_sgs_data(),
+ bbs_data = get_bbs_data(bbs_data_tables, region = 7),
+ sdl_data = get_sdl_data(sdl_data_tables),
+ mtquad_data = get_mtquad_data(mtquad_data_tables),
+ bad_portal = portal_data[[1]]
+
+ )
>
> ## Analysis methods
> methods <- drake_plan(
+ lda = function(dataset) {run_LDA(dataset, max_topics = 6, nseeds = 20)}
+ )
>
> ## Define how results are collected
> collect <- function(list_of_results, plan)
+ {
+ names(list_of_results) <- all.vars(match.call()$list_of_results)
+ list_of_results
+ }
>
> ## The combination of each method x dataset
> analyses <- drake_plan(
+ # expand out each `fun(data)``, where
+ # `fun` is each of the values in methods$target
+ # `data` is each of the values in datasets$target
+ # note: tidyeval syntax is to get all the values from the previous plans,
+ # but keep them as unevaluated symbols, so that drake_plan handles
+ # them appropriately
+ analysis = target(fun(data),
+ transform = cross(fun = !!rlang::syms(methods$target),
+ data = !!rlang::syms(datasets$target))
+ ),
+ # create a list of the created `analysis` objects, grouping by the `fun`
+ # that made them - this keeps the results from the different methods
+ # separated, so that the reports/syntheses can handle the right outputs
+ results = target(collect(analysis, ignore(analyses)),
+ transform = combine(analysis, .by = fun)),
+ trace = TRUE
+ )
> print(analyses)
# A tibble: 3 x 3
target command transform
<chr> <chr> <chr>
1 analysis fun(data) "cross(fun = list(lda), data = list(por…
2 results collect(analysis, ig… combine(analysis, .by = fun)
3 trace TRUE NA
>
Alas, no. It got rid of the warning message, but the rest of the output stayed the same.
Stepping through the code to make the `analysis` section, I get a couple of kinds of error:
analysis = target(fun(data),
transform = cross(fun = !!rlang::syms(methods$target),
data = !!rlang::syms(datasets$target))
)
`rlang`: For example:
> fun_test = !!rlang::syms(methods$target)
Error in !rlang::syms(methods$target) : invalid argument type
> data_test = !!rlang::syms(datasets$target)
Error in !rlang::syms(datasets$target) : invalid argument type
Googling led me to updating everything (R, all tidyverse packages, rlang), which didn't work. If I understand correctly, this line is trying to create lists `fun` and `data` (or in this case, `fun_test`) of all the things in `methods$target` and `datasets$target`, but as symbols rather than strings-in-quotes. I can accomplish this if I remove the `!!`:
> fun_test = rlang::syms(methods$target)
> data_test = rlang::syms(datasets$target)
> fun_test
[[1]]
lda
> data_test
[[1]]
portal_data
[[2]]
maizuru_data
(cut off for length)
I don't know if this is an OK solution or if the issue is specific to my setup? I'd explore further, but....
1. `cross`: This happens:
> transform = cross(rlang::syms(methods$target),
+ rlang::syms(datasets$target))
Error in cross(rlang::syms(methods$target), rlang::syms(datasets$target)) : could not find function "cross"
or
> transform = cross(fun_test,
+ data_test)
Error in cross(fun_test, data_test) : could not find function "cross"
Which package is `cross` coming from in this case?
I tried some detective work (see below) but am coming up confused. I think (rather than necessarily wading through what I've tried) it would help me if I could see what `analysis`, the `transform` pieces, and `results` look like when the pipeline is working properly? I.e. I'm not sure when the expansion is supposed to happen vs. passing around the command to do the expansion, if that makes sense...
`cross` efforts (these don't lead to any resolution, so probably a good time for tl;dr):
Googling got me to `purrr::cross`, which doesn't work with this syntax:
> transform = purrr::cross(fun_test,
+ data_test)
Error in .l[[j]][[index]] : object of type 'symbol' is not subsettable
but `purrr::cross2` does:
> transform = purrr::cross2(fun_test, data_test)
> str(transform)
List of 8
$ :List of 2
..$ : symbol lda
..$ : symbol portal_data
$ :List of 2
..$ : symbol lda
..$ : symbol maizuru_data
(again cut off for length)
As I understand, this still isn't quite what we want. We want `transform` to be like
> transform_goal = list(list(fun = rlang::sym('lda'), data = rlang::sym('portal_data')))
> str(transform_goal)
List of 1
$ :List of 2
..$ fun : symbol lda
..$ data: symbol portal_data
But even this doesn't work, because `transform` is getting quoted as a string instead of evaluated and expanded to populate `analysis` or `analyses`.
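For reference, `purrr::cross()` takes a single list of lists rather than two separate arguments, which is probably why the two-argument call above fails. The following is a minimal sketch, assuming purrr and rlang are installed, with `fun_test` and `data_test` rebuilt from literal names (illustrative stand-ins for the real plans). Note that `purrr::cross()` is deprecated in recent purrr releases, and that this only reproduces the named structure of `transform_goal`. It is not how drake itself expands a plan.

```r
# Stand-ins for rlang::syms(methods$target) and rlang::syms(datasets$target)
fun_test  <- rlang::syms("lda")
data_test <- rlang::syms(c("portal_data", "maizuru_data"))

# purrr::cross() expects ONE list whose elements are the sets to combine;
# the names of that outer list become the names inside each combination.
# (Deprecated in newer purrr versions, so it may emit a warning.)
pairs <- purrr::cross(list(fun = fun_test, data = data_test))

# Each element is a list with a `fun` symbol and a `data` symbol
str(pairs)
```

Even with this structure in hand, `cross()` inside `transform =` in the plan above is not a purrr call, so building the combinations by hand does not replace the expansion that `drake_plan()` is expected to do.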
Ok, sorry, this might be my bad. I think you might need version 6.2.1.9002+ of `drake`, as that implements the new syntax for specifying complex plans. Can you reinstall drake from github and try again?
That's it! Thanks!
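A quick way to check for this mismatch before running the pipeline is to compare the installed drake version against the one named above. This is a sketch only: `drake_is_new_enough()` is a hypothetical helper, not part of drake or MATSS, and the install command in the message assumes the development version lives in the `ropensci/drake` GitHub repository and that the remotes package is available.

```r
# Minimum version named in this thread for the new plan syntax
required <- package_version("6.2.1.9002")

# Hypothetical helper: TRUE if the installed drake is at least `min_version`,
# FALSE if it is older or not installed at all.
drake_is_new_enough <- function(min_version = required) {
  if (!requireNamespace("drake", quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion("drake") >= min_version
}

if (!drake_is_new_enough()) {
  message("drake is missing or older than ", format(required),
          "; consider reinstalling the development version, e.g.")
  message('  remotes::install_github("ropensci/drake")')
}
```

Note that the CRAN release 6.2.1 compares as older than 6.2.1.9002, which matches the symptom reported here.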
Also, welcome to the land of rlang and NSE 🙀
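On the `invalid argument type` errors earlier in the thread: `!!` is only treated as an unquote operator inside a function that supports quasiquotation. At the top level R parses `!!x` as a double logical negation, which is not defined for a list of symbols. A minimal sketch, assuming rlang is installed and using literal names as stand-ins for `datasets$target`:

```r
library(rlang)

# Stand-in for datasets$target
targets <- c("portal_data", "maizuru_data")

# Top level: `!!x` is just !(!x), and `!` cannot negate a list of symbols,
# so this fails. The error message is captured instead of stopping the script.
top_level <- tryCatch(!!syms(targets), error = function(e) conditionMessage(e))
print(top_level)

# Inside a quoting function such as rlang::expr(), `!!` unquotes:
# the list of symbols is spliced into the captured call as a single argument.
quoted <- expr(cross(data = !!syms(targets)))
print(quoted)
```

The second form is consistent with the `cross(fun = list(lda), ...` text in the unexpanded plan printed above: the older drake evaluated the `!!` and stored the resulting call, but did not go on to expand it into targets.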
I'm struggling to get the pipeline to run on my computer, using the most up-to-date version of `master`. (I'm working in `rmd-debug-pipeline`, but it's a copy of master at the moment). Here is the highest-level error:
and what the `analysis` part of the pipeline looks like:
I've done some digging, but before going into an extended narrative of that, I think it would help a lot if somebody whose pipeline is working (@ha0ye?) could confirm that this is the correct type of content for `analysis`? Thanks!