Closed: edgBR closed this issue 3 years ago
tibbles are great data structures for pipelines. For a single dynamic target, all the sub-targets are automatically combined.
library(drake)
library(tibble)
plan <- drake_plan(
  index = seq_len(3),
  data1 = target(tibble(x = index), dynamic = map(index)),
  data2 = data1
)
make(plan)
#> ▶ target index
#> ▶ dynamic data1
#> > subtarget data1_0b3474bd
#> > subtarget data1_b2a5c9b8
#> > subtarget data1_71f311ad
#> ■ finalize data1
#> ▶ target data2
# Sub-targets automatically combined in exploratory data analysis.
readd(data1)
#> # A tibble: 3 x 1
#>       x
#>   <int>
#> 1     1
#> 2     2
#> 3     3
# Sub-targets automatically combined in downstream targets.
readd(data2)
#> # A tibble: 3 x 1
#>       x
#>   <int>
#> 1     1
#> 2     2
#> 3     3
Created on 2020-10-01 by the reprex package (v0.3.0)
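Conceptually, the combined value of data1 is what you would get by row-binding the per-element tibbles by hand. A minimal sketch of that result in plain dplyr, outside of drake (an illustration only, not a description of how drake aggregates sub-targets internally; the names pieces and combined are made up for this example):

```r
library(tibble)
library(dplyr)

# One small tibble per element of the index, mimicking the three sub-targets.
pieces <- lapply(seq_len(3), function(i) tibble(x = i))

# Row-bind the pieces into a single tibble with one row per sub-target.
combined <- bind_rows(pieces)
combined
```
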
For dynamic branching within static branching, you can use combine() to bring everything together.
library(drake)
library(tibble)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
plan <- drake_plan(
  index_dynamic = seq_len(2),
  data_mapped = target(
    tibble(static = index_static, dynamic = index_dynamic),
    transform = map(index_static = c(1, 2)),
    dynamic = map(index_dynamic)
  ),
  data_combined = target(
    bind_rows(data_mapped),
    transform = combine(data_mapped)
  )
)
plot(plan)
make(plan)
#> ▶ target index_dynamic
#> ▶ dynamic data_mapped_1
#> > subtarget data_mapped_1_0b3474bd
#> > subtarget data_mapped_1_b2a5c9b8
#> ■ finalize data_mapped_1
#> ▶ dynamic data_mapped_2
#> > subtarget data_mapped_2_0b3474bd
#> > subtarget data_mapped_2_b2a5c9b8
#> ■ finalize data_mapped_2
#> ▶ target data_combined
readd(data_combined)
#> # A tibble: 4 x 2
#>   static dynamic
#>    <dbl>   <int>
#> 1      1       1
#> 2      1       2
#> 3      2       1
#> 4      2       2
Created on 2020-10-01 by the reprex package (v0.3.0)
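Applied to the accuracy question in this thread, the same pattern might look roughly like the sketch below. Everything specific here is an assumption made for illustration: accuracy_tbl() is a hypothetical stand-in for the real accuracy function, model_id and fold are invented grouping variables, and the rmse column is a placeholder rather than a real metric. The only point is that each sub-target returns a one-row tibble, so that bind_rows() inside combine() yields a single table.

```r
library(drake)
library(tibble)
library(dplyr)

# Hypothetical stand-in for the real accuracy function:
# returns a one-row tibble per (model_id, fold) pair.
accuracy_tbl <- function(model_id, fold) {
  tibble(model_id = model_id, fold = fold, rmse = NA_real_)  # placeholder metric
}

plan <- drake_plan(
  fold = seq_len(2),
  accuracy = target(
    accuracy_tbl(model_id, fold),
    transform = map(model_id = c(1, 2)),  # static branching
    dynamic = map(fold)                   # dynamic branching
  ),
  accuracy_all = target(
    bind_rows(accuracy),                  # one tibble across all static branches
    transform = combine(accuracy)
  )
)

make(plan)
readd(accuracy_all)
```

Because the combined target is an ordinary tibble, any downstream target (for example, one that picks the best model per group) can depend on accuracy_all directly.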
Comments:
targets, the long-term successor of drake. drake is never going away, but it is also never going to have built-in cloud support. But targets can seamlessly upload and track target return values on Amazon S3: https://wlandau.github.io/targets-manual/cloud.html
Prework
Hi again, drake colleagues. My implementation now seems to be working: I can train my models, evaluate them properly on train and test data, and save the models and accuracy metrics to S3! Prework here:
Original workflow: #1293
Modified workflow to correct the DAG: #1294
First attempt to rewrite the plan using dynamic targets: #1311
Linking dynamic targets: #1314
Linking multiple dynamic targets to a target: #1321
Description
The small beast now looks like this:
And my plan looks as follows:
Desired result
My objective now is to evaluate which model is best for a specific counter combination, as I call it, and then refit that model on the whole historical data. The function that I wrote to do this is as follows:
The problem is the way I added this to my plan, which was as follows:
This is clearly not the right approach, as the target does not run after saveModels:
My idea now is to return a data frame in accuracy. I have modified the function as follows:
So the idea is now to join all of the sub-target tibbles into one, but honestly I am a bit lost on how to do this. I looked into #685, but I don't know whether it applies to my case, as I have dynamic and static branching mixed together.
Any suggestions?
BR /Edgar