Closed jennysjaarda closed 3 years ago
Just a quick comment. I think I used to have an issue with duplicate targets as I have a comment in my script:
# this should be identical to targetXX above, but needs to be a different name
Perhaps an update allowed for duplicate targets in one project?
I would recommend this:
whole_plan <- drake_plan(
continents = gapminder_continents(),
model = target(fit_model(continents), dynamic = map(continents))
)
make(whole_plan, targets = "continents")
make(whole_plan)
This avoids potential unintended consequences of omitting targets from the plan itself.
By the way, targets::tar_make()
and targets::tar_make_clustermq()
support an analogous names
argument. Unlike drake
, targets
supports tidyselect
syntax, e.g. tar_make(names = starts_with("abc"))
.
Thanks for the suggestion. It's not entirely clear to me. So the targets
argument specifies explicitly the targets that you want to run? What if the plan2
as defined as above has many more targets, such that it wouldn't be very practical to specify explicitly the targets? For example:
plan1 <- drake_plan(
data_geno = load_geno(),
data_pheno = load_pheno(),
munge_geno = geno_munge(data_geno),
munge_pheno = pheno_munge(data_pheno),
analysis1 = simple_analysis(munge_pheno, munge_geno)
)
make(plan1,
parallelism = "clustermq", template = list(cpus = 1))
plan2 <- drake_plan(
data_geno = load_geno(),
data_pheno = load_pheno(),
munge_geno = geno_munge(data_geno),
munge_pheno = pheno_munge(data_pheno),
analysis2 = computationally_intense_analysis(munge_pheno, munge_geno)
)
make(plan2,
parallelism = "clustermq", template = list(cpus = 8))
Would the above work? Sorry if I'm missing something obvious.
By the way, I just noticed the new {targets} package! (I am just returning from being away on parental leave for several months which is why I missed it.) Looking forward to trying it out. My current project is quite embedded with the {drake} infrastructure so I guess it's best to keep it with drake. If I migrated I suppose all targets would be outdated and need to be re-run?
So the targets argument specifies explicitly the targets that you want to run?
Yes, plus anything upstream that those targets depend on. That is the issue. Upstream targets need to be processed before downstream ones can start. If some targets in plan2
have dependencies in plan1
, then make(plan2)
will not work as expected. That is why I do not encourage this pattern. The following is safer and achieves the same goals.
plan1 <- drake_plan(...)
plan2 <- drake_plan(...)
whole_plan <- bind_plans(plan1, plan2)
make(whole_plan, targets = plan1$target, parallelism = "clustermq", template = list(cpus = 1))
make(whole_plan, targets = plan2$target, parallelism = "clustermq", template = list(cpus = 8))
My current project is quite embedded with the {drake} infrastructure so I guess it's best to keep it with drake.
Yes, old projects can keep using drake
, and new projects should use targets
. As you probably saw, drake
is not deprecated, just superseded. I will keep it running indefinitely so old projects do not break, but I do not plan to add significant enhancements.
Prework
Description
Hello, I've seen you mention before that target names within a project should be unique (e.g. here). However, I have multiple plans within one project because my plans have different computing requirements (I am using the
clustermq
backend rather thanfuture
asclustermq
seems to be more reliable and much faster). If I replicate two targets exactly, I don't seem to have any problems. See an example from the book below.Reproducible example
Desired result
The output is as expected. Is the above bad practice? Do you see any issue with proceeding as I have above?
Thanks a lot in advance!