ha0ye closed this issue 4 years ago.
You can access the value of fun (or any other trace column) from within your target command if you include it in your transform; in this case, combine(analysis, fun, .by = fun):
library(drake)
plan <- drake_plan(
  analysis = target(
    fun(data),
    transform = cross(fun = c(str, names),
                      data = c(mtcars, iris),
                      .id = data)
  ),
  results = target(list(analysis, fun = fun),
                   transform = combine(analysis, fun, .by = fun))
)
make(plan)
#> target analysis_iris
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> target analysis_mtcars
#> 'data.frame': 32 obs. of 11 variables:
#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
#> $ disp: num 160 160 108 258 360 ...
#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
#> $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
#> $ qsec: num 16.5 17 18.6 19.4 17 ...
#> $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
#> $ am : num 1 1 1 0 0 0 0 0 0 0 ...
#> $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
#> $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
#> target analysis_iris_2
#> target analysis_mtcars_2
#> target results_str
#> target results_names
readd(results_names)
#> [[1]]
#> [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
#> [11] "carb"
#>
#> [[2]]
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
#>
#> $fun
#> function (x) .Primitive("names")
readd(results_str)
#> [[1]]
#> NULL
#>
#> [[2]]
#> NULL
#>
#> $fun
#> function (object, ...)
#> UseMethod("str")
#> <bytecode: 0x55716920ca78>
#> <environment: namespace:utils>
Created on 2020-03-19 by the reprex package (v0.3.0)
Since your .by variable is a function, the stored value prints as the whole function definition, which isn't very legible. You can quote() it if you only need the name.
library(drake)
plan <- drake_plan(
  analysis = target(
    fun(data),
    transform = cross(fun = c(str, names),
                      data = c(mtcars, iris),
                      .id = data)
  ),
  results = target(list(analysis, fun = quote(fun)),
                   transform = combine(analysis, fun, .by = fun))
)
make(plan)
#> In drake, consider r_make() instead of make(). r_make() runs make() in a fresh R session for enhanced robustness and reproducibility.
#> target analysis_iris
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> target analysis_mtcars
#> 'data.frame': 32 obs. of 11 variables:
#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
#> $ disp: num 160 160 108 258 360 ...
#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
#> $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
#> $ qsec: num 16.5 17 18.6 19.4 17 ...
#> $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
#> $ am : num 1 1 1 0 0 0 0 0 0 0 ...
#> $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
#> $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
#> target analysis_iris_2
#> target analysis_mtcars_2
#> target results_str
#> target results_names
readd(results_names)
#> [[1]]
#> [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
#> [11] "carb"
#>
#> [[2]]
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
#>
#> $fun
#> names
readd(results_str)
#> [[1]]
#> NULL
#>
#> [[2]]
#> NULL
#>
#> $fun
#> str
Created on 2020-03-19 by the reprex package (v0.3.0)
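The only difference between the two results lists is what the fun element holds. In the first reprex it is the function object itself; in the second it is the unevaluated symbol. The base R sketch below shows the same distinction without drake. The list names with_fun and with_sym are hypothetical, chosen only for illustration.

```r
# Hypothetical stand-ins for the `fun` element of results_names
with_fun <- list(fun = names)         # stores the function object itself
with_sym <- list(fun = quote(names))  # stores only the unevaluated symbol

is.function(with_fun$fun)  # TRUE: printing it shows the function definition
is.name(with_sym$fun)      # TRUE: printing it shows just the name
deparse(with_sym$fun)      # "names", a plain character string

# If a later step needs to call it, the symbol can be turned back into the function
f <- eval(with_sym$fun)
f(c(a = 1, b = 2))         # "a" "b"
```

Storing the symbol keeps the printed results short without losing the ability to recover the function.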
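I agree with @brendanf that this sort of tracking is best handled with prospective labeling in custom commands and functions. deparse() and quote() work really well together here. The transform substitutes symbols such as nrow and mtcars into each command, quote() keeps them unevaluated, and deparse() turns them into character strings.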
library(drake)
library(tibble)
plan <- drake_plan(
  analysis = target(
    tibble(
      value = fun(data),
      fun = deparse(quote(fun)),
      data = deparse(quote(data))
    ),
    transform = cross(
      fun = c(nrow, ncol),
      data = c(mtcars, iris)
    )
  )
)
drake_plan_source(plan)
#> drake_plan(
#> analysis_nrow_mtcars = tibble(value = nrow(mtcars), fun = deparse(quote(nrow)), data = deparse(quote(mtcars))),
#> analysis_ncol_mtcars = tibble(value = ncol(mtcars), fun = deparse(quote(ncol)), data = deparse(quote(mtcars))),
#> analysis_nrow_iris = tibble(value = nrow(iris), fun = deparse(quote(nrow)), data = deparse(quote(iris))),
#> analysis_ncol_iris = tibble(value = ncol(iris), fun = deparse(quote(ncol)), data = deparse(quote(iris)))
#> )
make(plan)
#> ▶ target analysis_ncol_mtcars
#> ▶ target analysis_nrow_iris
#> ▶ target analysis_nrow_mtcars
#> ▶ target analysis_ncol_iris
readd(analysis_ncol_mtcars)
#> # A tibble: 1 x 3
#> value fun data
#> <int> <chr> <chr>
#> 1 11 ncol mtcars
Created on 2020-03-20 by the reprex package (v0.3.0)
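Every analysis_* target now carries its own fun and data labels, so gathering the results needs no name parsing: stack the rows and filter on the columns. In a plan, this would typically be one more target using transform = combine(analysis). For tibbles, dplyr::bind_rows() is the usual way to stack them.

The sketch below is a base R illustration, not output from the reprex. It mimics that step with plain data frames standing in for the four targets.

```r
# Stand-ins for analysis_nrow_mtcars, analysis_ncol_mtcars,
# analysis_nrow_iris and analysis_ncol_iris (plain data frames, base R only)
rows <- list(
  data.frame(value = nrow(mtcars), fun = "nrow", data = "mtcars"),
  data.frame(value = ncol(mtcars), fun = "ncol", data = "mtcars"),
  data.frame(value = nrow(iris),   fun = "nrow", data = "iris"),
  data.frame(value = ncol(iris),   fun = "ncol", data = "iris")
)

# Stack the one-row results into a single table
results <- do.call(rbind, rows)

# Look results up by their labels instead of parsing target names
results[results$fun == "ncol", ]
```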
Prework
drake's code of conduct.

Proposal
In my use case, I am constructing large plans using cross() to get all {analysis} x {data} combinations. Because some datasets and analysis targets already have underscores in their names, the default target names are difficult to parse (e.g. with a regex) to recover the underlying {analysis} and {data} names.

Possible Solution 1
Expand the syntax for the .id argument, e.g. to use the glue package or something similar:

Current work-around
My current work-around is to specify .id = data and then pass .id_chr into my function that collects the results:

where collect() extracts the {data} name from its first argument and the {analysis} name from its second argument (making use of .id_chr).

Possible Solution 2
This suggests an alternative solution: expand the plan information that is accessible to commands when building targets. For example, if it were possible to access the (hidden) columns fun and data that show up when the plan is constructed with trace = TRUE, that would also make "metadata" from the plan visible to commands.
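For reference, the hidden columns can be inspected directly. This is a minimal sketch assuming a drake version whose drake_plan() accepts trace = TRUE. With that option, each transform's grouping variables become extra columns of the plan alongside target and command. The fun and data values are borrowed from the reprexes above.

```r
library(drake)

# trace = TRUE keeps each transform's grouping variables as plan columns
plan <- drake_plan(
  analysis = target(
    fun(data),
    transform = cross(fun = c(nrow, ncol), data = c(mtcars, iris))
  ),
  trace = TRUE
)

# One row per target; the fun and data columns record which values it received
plan[, c("target", "fun", "data")]
```

These columns live in the plan data frame, not in the environment where commands run. The combine(analysis, fun, .by = fun) answer above is one way to pass such a value into a command.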