Closed matthiasgomolka closed 3 years ago
The purpose of those checksums is to account for files that live on different drives on high-performance computing systems. Does that file you mentioned live on an external drive?
Thanks for the quick response.
The file lives on a network drive under SVN version control. (Don't ask why both network drive and version control, but it makes sense in this specific setting)
I thought so. I suspect the worker is has trouble accessing that file. You can test with something like future::future(custom_check_file())
with the same future::plan()
you use with drake
.
Not sure if I understand correctly. Is this what you mean?
> future::plan("multisession", workers = 4)
> future::future(file.exists("my/file"))
MultisessionFuture:
Label: ‘<none>’
Expression:
file.exists("my/file")
Lazy evaluation: FALSE
Asynchronous evaluation: TRUE
Local evaluation: TRUE
Environment: <environment: R_GlobalEnv>
Capture standard output: TRUE
Capture condition classes: ‘condition’
Globals: <none>
Packages: <none>
L'Ecuyer-CMRG RNG seed: <none> (seed = FALSE)
Resolved: TRUE
Value: <not collected>
Conditions captured: <none>
Early signaling: FALSE
Owner process: b58884ed-e90b-e732-4c46-eedcac8309a1
Class: ‘MultisessionFuture’, ‘ClusterFuture’, ‘MultiprocessFuture’, ‘Future’, ‘environment’
Is that of any help?
I was thinking you could compare
digest::digest("my/file", algo = "sha256", file = TRUE)
to
future::plan("your_future_plan", ...)
out <- future::future(digest::digest("my/file", algo = "sha256", file = TRUE))
value(out)
But since you are using a multisession plan, and not a future.batchtools
plan or an alternative cluster, those workers should be on the same node as the main process.
What if you run the original pipeline with "future" parallelism and future::plan("sequential")
?
What a coincidence: My R version got updated right now (to 4.0.2) and now it works with the initial settings. Sorry for bothering you!
I forgot to remove hpc = FALSE
for the target, so my statement above is not correct. Please ignore. I will comment more tomorrow.
I was thinking you could compare
digest::digest("my/file", algo = "sha256", file = TRUE)
to
future::plan("your_future_plan", ...) out <- future::future(digest::digest("my/file", algo = "sha256", file = TRUE)) value(out)
But since you are using a multisession plan, and not a
future.batchtools
plan or an alternative cluster, those workers should be on the same node as the main process.What if you run the original pipeline with "future" parallelism and
future::plan("sequential")
?
Both attempts yield the same hash value.
With future::plan("sequential")
it works fine.
(This time I double checked that hpc = TRUE
was not set.)
I find it surprising that you would see this particular warning from drake with a multisession plan and not a sequential plan. Both worker processes exist on your local machine, and drake is doing the same thing in both cases. I’m not sure what else I can do at the moment since I cannot reproduce this.
Yes, it's strange. I wasn't able to create a minimal reprex either. I guess we can close this since there exists an easy workaround with hpc = FALSE
for this target.
Good (or bad?) news: I was finally able to reproduce the error:
library(drake)
#> Warning: package 'drake' was built under R version 3.6.3
plan <- drake_plan(
some_target = target(
getOption("some_option"),
format = "file"
)
)
future::plan("multisession", workers = 2L)
# the option set in 'prework' is not set for the workers, if format = "file"
make(
plan,
parallelism = "future",
prework = options("some_option" = "OPTION VALUE")
)
#> > target some_target
#> Warning: package 'drake' was built under R version 3.6.3
#> Warning: No checksum available for target some_target.
#> x fail some_target
#> Error: target some_target failed.
#> diagnose(some_target)$error$message:
#> future worker terminated before target could complete.
#> diagnose(some_target)$error$calls:
#>
plan_no_file <- drake_plan(
some_target = target(
getOption("some_option")
# without format = "file"
)
)
# now it works
make(
plan_no_file,
parallelism = "future",
prework = options("some_option" = "OPTION VALUE")
)
#> > target some_target
Created on 2020-10-27 by the reprex package (v0.3.0)
It seems as if the option set in prework
is not set for the future
workers, when the target's format
is "file"
.
Please note that I got the problem also with another target, where no format
was set.
Thanks, that helps. The problem is that prework
needs to be a language object, expression, or character string. The R code you supply gets evaluated right away. If you supply quote(options("some_option" = "OPTION VALUE"))
, the example works.
library(drake)
file.create("tmp")
#> [1] TRUE
plan <- drake_plan(
some_target = target(getOption("some_option"), format = "file")
)
future::plan("multisession", workers = 2L)
make(
plan,
parallelism = "future",
prework = quote(options("some_option" = "tmp")) # Needs to be a language object.
)
#> ▶ target some_target
Created on 2020-10-27 by the reprex package (v0.3.0)
Sorry for bothering you so frequently in the recent time! But drake is so good that I use it on a daily basis and therefore stumble upon some problems.
Prework
Description
This only occurs if I use
r_make()
withparallelism = "future"
.parallelism = "loop"
works fine.I have a simple target with
format = "file"
. When Ir_make()
the plan containing the target withparallelism = "future"
, it fails. The log file shows to following:When I keep everything identical but set
parallelism = "loop"
, it works fine.I already tried
clean(md, garbage_colletion = TRUE)
, but this did not solve the problem.EDIT: As the problem is related to parallelism, I could resolve it specifying
hpc = FALSE
for targetmd
. Still, I would like to understand to problem.Reproducible example
I could not reproduce the problem in a minimal reprex. Could it be related to some problem in the cache?See comment below: https://github.com/ropensci/drake/issues/1338#issuecomment-717180597
Desired result
It should not make a difference wether I
r_make()
a plan sequential or in parallel usingfuture
.Session info
End the reproducible example with a call to
sessionInfo()
in the same session (e.g.reprex(si = TRUE)
) and include the output.