Closed by wlandau 4 years ago
Registering dynamic sub-targets requires us to modify config objects, specifically the layout, graph, and priority queue. Because of the way the internals are currently structured, it would be best to modify these objects by reference. We already do this with the priority queue, and it is straightforward enough to use an environment instead of a list for the layout. But we may have to wrap the graph in an environment of its own. Added some action items.
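For anyone unfamiliar with why an environment helps here, below is a minimal base-R sketch of the difference. It is not drake's internal code: `register_in_list()`, `register_in_env()`, `add_edge()`, and the toy layout and graph structures are all hypothetical stand-ins.

```r
# Lists are copied on modification, so a change made inside a function
# never reaches the caller's copy.
register_in_list <- function(layout, name) {
  layout[[name]] <- list(dynamic = TRUE)
  invisible(layout)
}

# Environments are modified in place, so every holder sees the change.
register_in_env <- function(layout, name) {
  assign(name, list(dynamic = TRUE), envir = layout)
  invisible(layout)
}

# Hypothetical toy layout: one entry per target.
layout_list <- list(x = list(dynamic = FALSE))
layout_env <- list2env(layout_list, envir = new.env(parent = emptyenv()))

register_in_list(layout_list, "y_0b3474bd")
register_in_env(layout_env, "y_0b3474bd")

in_list <- "y_0b3474bd" %in% names(layout_list)
in_env <- exists("y_0b3474bd", envir = layout_env, inherits = FALSE)

# An object without reference semantics (a stand-in for the graph)
# can be wrapped in an environment to get the same behavior.
graph_box <- new.env(parent = emptyenv())
graph_box$graph <- list(x = character(0))

add_edge <- function(box, from, to) {
  # Record `from` as a dependency of `to` inside the boxed graph.
  box$graph[[to]] <- c(box$graph[[to]], from)
  invisible(box)
}

add_edge(graph_box, "x", "y_0b3474bd")
```

Here `in_list` is `FALSE` because the list was copied, while `in_env` is `TRUE` and `graph_box$graph` gains the new entry, since both live in environments.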
Unfortunately, dynamic branching is currently slower than static branching when it comes to actually building targets: about 82 seconds versus 36 seconds elapsed for 10,000 targets in the benchmark below.
library(drake)
plan_dynamic <- drake_plan(
  x = seq_len(1e4),
  y = target(x, dynamic = map(x))
)
plan_static <- drake_plan(
  z = target(w, transform = map(w = !!seq_len(1e4)))
)
cache_dynamic <- storr::storr_rds(tempfile())
cache_static <- storr::storr_rds(tempfile())
system.time(
  config_dynamic <- drake_config(
    plan_dynamic,
    cache = cache_dynamic,
    verbose = 0L
  )
)
#> user system elapsed
#> 0.026 0.003 0.030
system.time(
  config_static <- drake_config(
    plan_static,
    cache = cache_static,
    verbose = 0L
  )
)
#> user system elapsed
#> 1.904 0.004 1.910
system.time(
  suppressWarnings( # different issue
    make(config = config_dynamic)
  )
)
#> user system elapsed
#> 78.014 3.630 81.767
system.time(
  suppressWarnings(
    make(config = config_static)
  )
)
#> user system elapsed
#> 32.712 3.195 36.049
Created on 2019-11-02 by the reprex package (v0.3.0)
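The good news is that make() is much faster to initialize. Because we have smaller plans, drake_config() runs super quickly (about 0.03 seconds versus 2 to 3.5 seconds for the static plan in these benchmarks). And for subsequent make() calls, it is faster to check if everything is up to date (about 3.4 seconds versus 14.3 seconds below).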
library(drake)
library(profile)
library(jointprof)
plan_dynamic <- drake_plan(
  x = seq_len(1e4),
  y = target(x, dynamic = map(x))
)
plan_static <- drake_plan(
  z = target(w, transform = map(w = !!seq_len(1e4)))
)
cache_dynamic <- storr::storr_rds(tempfile())
cache_static <- storr::storr_rds(tempfile())
system.time(
  config_dynamic <- drake_config(
    plan_dynamic,
    cache = cache_dynamic,
    verbose = 0L
  )
)
#> user system elapsed
#> 0.027 0.003 0.032
system.time(
  config_static <- drake_config(
    plan_static,
    cache = cache_static,
    verbose = 0L
  )
)
#> user system elapsed
#> 3.525 0.004 3.530
Rprof(filename = "dynamic.rprof")
suppressWarnings(
  system.time(make(config = config_dynamic), gcFirst = FALSE)
)
#> user system elapsed
#> 99.096 3.656 102.928
Rprof(NULL)
data <- read_rprof("dynamic.rprof")
write_pprof(data, "dynamic.pprof")
Rprof(filename = "static.rprof")
suppressWarnings(
  system.time(make(config = config_static), gcFirst = FALSE)
)
#> user system elapsed
#> 52.112 3.708 55.916
Rprof(NULL)
data <- read_rprof("static.rprof")
write_pprof(data, "static.pprof")
suppressWarnings(
  system.time(make(config = config_dynamic), gcFirst = FALSE)
)
#> user system elapsed
#> 3.239 0.164 3.418
suppressWarnings(
  system.time(make(config = config_static), gcFirst = FALSE)
)
#> user system elapsed
#> 13.847 0.472 14.347
file.copy("dynamic.pprof", "~/Downloads")
#> [1] TRUE
file.copy("static.pprof", "~/Downloads")
#> [1] TRUE
I used those pprof files at the bottom to generate the flame graphs below. The one on the left is from static branching, and the one on the right is from dynamic branching.
It looks like the main hangup is loading sub-target dependencies and registering sub-targets. Not too surprising. Speeding this up is going to be another slow-going long-term project. If you have more examples that demonstrate slowness, please post them. It took a long time to get static branching as fast as it is now, and I expect the same for dynamic branching.
The implementation in #1042 is different from https://github.com/ropensci/drake/issues/685#issuecomment-541460119. In particular, the flowchart in https://user-images.githubusercontent.com/1580860/66722470-27ede180-eddc-11e9-97ea-930c5a93d287.png is out of date.
The procedure for sub-targets is actually simpler than I had originally planned.
make().
Each dynamic target has its own value alongside the values of the sub-targets. We recompute this value if
Why (2)? Because in some situations, we already have all the sub-targets, but we use fewer of them.
library(drake)
plan <- drake_plan(
  x = seq_len(3),
  y = target(x, dynamic = map(x))
)
make(plan)
#> target x
#> subtarget y_0b3474bd
#> subtarget y_b2a5c9b8
#> subtarget y_71f311ad
# readd() and loadd() understand dynamic targets.
readd(y)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
# But a dynamic target is really just a vector of hashes.
cache <- drake_cache()
cache$get("y")
#> [1] "3908fe5069df3c28" "16b3cb68bd4872ed" "1a3b3c0d06147d80"
#> attr(,"class")
#> [1] "drake_dynamic"
# What if we shorten y?
plan <- drake_plan(
  x = seq_len(2),
  y = target(x, dynamic = map(x))
)
# y needs to change, but we leave the sub-targets alone.
make(plan)
#> target x
# readd() and loadd() understand dynamic targets.
readd(y)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
# But a dynamic target is really just a vector of hashes.
cache$get("y")
#> [1] "3908fe5069df3c28" "16b3cb68bd4872ed"
#> attr(,"class")
#> [1] "drake_dynamic"
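To make the "vector of hashes" idea concrete, here is a small base-R sketch. It is only an illustration, not how drake or storr actually store things: the environment `store` is a hypothetical stand-in for the cache, `new_dynamic()` and `resolve_dynamic()` are made-up helpers, and I am assuming each hash keys the stored value of one sub-target. The hash strings are copied from the output above.

```r
# Hypothetical stand-in for the cache: sub-target values keyed by hash.
store <- new.env(parent = emptyenv())
assign("3908fe5069df3c28", 1L, envir = store)
assign("16b3cb68bd4872ed", 2L, envir = store)
assign("1a3b3c0d06147d80", 3L, envir = store)

# A dynamic target holds only the hashes of the sub-targets it uses.
new_dynamic <- function(hashes) {
  structure(hashes, class = "drake_dynamic")
}

# Turn the hashes back into values, in the spirit of readd().
resolve_dynamic <- function(dynamic, store) {
  lapply(unclass(dynamic), function(hash) get(hash, envir = store))
}

y_long <- new_dynamic(
  c("3908fe5069df3c28", "16b3cb68bd4872ed", "1a3b3c0d06147d80")
)

# Shortening y only rewrites the vector of hashes.
# Nothing in the store is rebuilt or deleted.
y_short <- new_dynamic(unclass(y_long)[1:2])

values_long <- resolve_dynamic(y_long, store)
values_short <- resolve_dynamic(y_short, store)
```

This is why the second make() above rebuilds nothing except x: the value of y changes from three hashes to two, while all three stored sub-target values stay valid.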
The sub-target names are ugly (e.g. y_71f311ad) but incredibly useful. We could have used more readable names such as y_1, y_2, etc. (In fact, that is what I originally proposed in https://github.com/ropensci/drake/issues/685#issuecomment-541460119.) But if we did that, we would invalidate y_2 every time we insert an element in the middle of x. With hashes, we do not have this problem: the sub-targets of y can be in any order and still remain valid.
library(drake)
plan <- drake_plan(
  x = c("a", "b"),
  y = target(x, dynamic = map(x))
)
make(plan)
#> In drake, consider r_make() instead of make(). r_make() runs make() in a fresh R session for enhanced robustness and reproducibility.
#> target x
#> subtarget y_89ca58a1
#> subtarget y_38e75e51
plan <- drake_plan(
  x = c("a", "inserted_element", "b"),
  y = target(x, dynamic = map(x))
)
# Only one sub-target needs to build.
make(plan)
#> target x
#> subtarget y_06d53fef
# Permute x.
plan <- drake_plan(
  x = c("inserted_element", "b", "a"),
  y = target(x, dynamic = map(x))
)
# All sub-targets are still up to date!
make(plan)
#> target x
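The reason insertion and permutation are harmless can be shown with a few lines of base R. This is a rough sketch and not drake's naming code: `hash_value()` and `subtarget_names()` are hypothetical helpers, and the md5-based suffixes will not match the ones in the output above.

```r
# Hypothetical helper: hash one R object using base tools only.
hash_value <- function(value) {
  path <- tempfile()
  on.exit(unlink(path))
  writeBin(serialize(value, connection = NULL), path)
  substr(unname(tools::md5sum(path)), 1, 8)
}

# Name each sub-target after the hash of the element it maps over,
# not after the position of that element.
subtarget_names <- function(target, values) {
  hashes <- vapply(
    seq_along(values),
    function(i) hash_value(values[[i]]),
    character(1)
  )
  paste(target, hashes, sep = "_")
}

names_before <- subtarget_names("y", c("a", "b"))
names_after <- subtarget_names("y", c("a", "inserted_element", "b"))
names_permuted <- subtarget_names("y", c("inserted_element", "b", "a"))

# Sub-targets that have no counterpart yet and would need to build.
to_build_after_insert <- setdiff(names_after, names_before)
to_build_after_permute <- setdiff(names_permuted, names_after)
```

After the insertion, `to_build_after_insert` has exactly one element, and after the permutation `to_build_after_permute` is empty. With positional names like y_1 and y_2, the same insertion would have changed what y_2 refers to.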
Implemented in #1042.
Also noteworthy: mapping over rows: https://github.com/ropensci/drake/pull/1042#issuecomment-549096614
New chapter in the manual: https://ropenscilabs.github.io/drake-manual/dynamic.html
One source of overhead I overlooked: computing the hashes of sub-values that go into the names of sub-targets. Unavoidable, but not terrible.
Dynamic parent targets are already vectors of hashes, so we can avoid this overhead if the dynamic dependency is itself dynamic: 5a07f675b1d0b648d6d61b6fa4cba2465c7bc941. Otherwise, we need to compute the hashes of all the sub-values.
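As a rough illustration of that shortcut (not the code in the commit above), the base-R sketch below reuses the hashes when the dependency already has class "drake_dynamic" and computes them otherwise. `hash_value()` and `dependency_hashes()` are hypothetical names, and md5 on a serialized value is just a convenient stand-in for whatever hash drake uses.

```r
# Hypothetical helper: hash one R object using base tools only.
hash_value <- function(value) {
  path <- tempfile()
  on.exit(unlink(path))
  writeBin(serialize(value, connection = NULL), path)
  unname(tools::md5sum(path))
}

# Return one hash per element of a dynamic dependency.
dependency_hashes <- function(dep) {
  if (inherits(dep, "drake_dynamic")) {
    # The parent is itself dynamic, so it is already a vector of hashes.
    return(as.character(unclass(dep)))
  }
  # Otherwise every sub-value has to be hashed, which is the overhead.
  vapply(
    seq_along(dep),
    function(i) hash_value(dep[[i]]),
    character(1)
  )
}

dynamic_parent <- structure(
  c("3908fe5069df3c28", "16b3cb68bd4872ed"),
  class = "drake_dynamic"
)

reused <- dependency_hashes(dynamic_parent)
computed <- dependency_hashes(c("a", "b"))
```

In the first call no hashing happens at all, so chains of dynamic targets only pay the hashing cost once, at the first non-dynamic dependency.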
Update: dynamic branching just got a huge speed boost in #1089 thanks to help from @billdenney and @eddelbuettel. With improvements both in development drake and development digest, dynamic branching is now about 33% faster than static branching overall. Benchmarking workflow: https://github.com/wlandau/drake-examples/blob/master/overhead/dynamic.R vs https://github.com/wlandau/drake-examples/blob/master/overhead/static.R.
We want to declare targets and modify the dependency graph while make() is running. Sometimes, we do not know what the targets should be until we see the values of previous targets. The following plan sketches the idea.
Issues:
How will outdated() work now? Do we have to read the targets back into memory to check if the downstream stuff is up to date?
drake has faced. Hopefully the work will migrate to the workers package.