ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0
1.34k stars 128 forks source link

Warning: No checksum available for target ... #1338

Closed matthiasgomolka closed 3 years ago

matthiasgomolka commented 3 years ago

Sorry for bothering you so frequently in the recent time! But drake is so good that I use it on a daily basis and therefore stumble upon some problems.

Prework

Description

This only occurs if I use r_make() with parallelism = "future". parallelism = "loop" works fine.

I have a simple target with format = "file". When I r_make() the plan containing the target with parallelism = "future", it fails. The log file shows to following:

... | md | fail
... |    | Warning: No checksum available for target md.

When I keep everything identical but set parallelism = "loop", it works fine.

I already tried clean(md, garbage_colletion = TRUE), but this did not solve the problem.

EDIT: As the problem is related to parallelism, I could resolve it specifying hpc = FALSE for target md. Still, I would like to understand to problem.

Reproducible example

I could not reproduce the problem in a minimal reprex. Could it be related to some problem in the cache?

See comment below: https://github.com/ropensci/drake/issues/1338#issuecomment-717180597

Desired result

It should not make a difference wether I r_make() a plan sequential or in parallel using future.

Session info

End the reproducible example with a call to sessionInfo() in the same session (e.g. reprex(si = TRUE)) and include the output.

wlandau commented 3 years ago

The purpose of those checksums is to account for files that live on different drives on high-performance computing systems. Does that file you mentioned live on an external drive?

matthiasgomolka commented 3 years ago

Thanks for the quick response.

The file lives on a network drive under SVN version control. (Don't ask why both network drive and version control, but it makes sense in this specific setting)

wlandau commented 3 years ago

I thought so. I suspect the worker is has trouble accessing that file. You can test with something like future::future(custom_check_file()) with the same future::plan() you use with drake.

matthiasgomolka commented 3 years ago

Not sure if I understand correctly. Is this what you mean?

> future::plan("multisession", workers = 4)
> future::future(file.exists("my/file"))
MultisessionFuture:
Label: ‘<none>’
Expression:
file.exists("my/file")
Lazy evaluation: FALSE
Asynchronous evaluation: TRUE
Local evaluation: TRUE
Environment: <environment: R_GlobalEnv>
Capture standard output: TRUE
Capture condition classes: ‘condition’
Globals: <none>
Packages: <none>
L'Ecuyer-CMRG RNG seed: <none> (seed = FALSE)
Resolved: TRUE
Value: <not collected>
Conditions captured: <none>
Early signaling: FALSE
Owner process: b58884ed-e90b-e732-4c46-eedcac8309a1
Class: ‘MultisessionFuture’, ‘ClusterFuture’, ‘MultiprocessFuture’, ‘Future’, ‘environment’

Is that of any help?

wlandau commented 3 years ago

I was thinking you could compare

digest::digest("my/file", algo = "sha256", file = TRUE)

to

future::plan("your_future_plan", ...)
out <- future::future(digest::digest("my/file", algo = "sha256", file = TRUE))
value(out)

But since you are using a multisession plan, and not a future.batchtools plan or an alternative cluster, those workers should be on the same node as the main process.

What if you run the original pipeline with "future" parallelism and future::plan("sequential")?

matthiasgomolka commented 3 years ago

What a coincidence: My R version got updated right now (to 4.0.2) and now it works with the initial settings. Sorry for bothering you!

matthiasgomolka commented 3 years ago

I forgot to remove hpc = FALSE for the target, so my statement above is not correct. Please ignore. I will comment more tomorrow.

matthiasgomolka commented 3 years ago

I was thinking you could compare

digest::digest("my/file", algo = "sha256", file = TRUE)

to

future::plan("your_future_plan", ...)
out <- future::future(digest::digest("my/file", algo = "sha256", file = TRUE))
value(out)

But since you are using a multisession plan, and not a future.batchtools plan or an alternative cluster, those workers should be on the same node as the main process.

What if you run the original pipeline with "future" parallelism and future::plan("sequential")?

Both attempts yield the same hash value.

With future::plan("sequential") it works fine.

(This time I double checked that hpc = TRUE was not set.)

wlandau commented 3 years ago

I find it surprising that you would see this particular warning from drake with a multisession plan and not a sequential plan. Both worker processes exist on your local machine, and drake is doing the same thing in both cases. I’m not sure what else I can do at the moment since I cannot reproduce this.

matthiasgomolka commented 3 years ago

Yes, it's strange. I wasn't able to create a minimal reprex either. I guess we can close this since there exists an easy workaround with hpc = FALSE for this target.

matthiasgomolka commented 3 years ago

Good (or bad?) news: I was finally able to reproduce the error:

library(drake)
#> Warning: package 'drake' was built under R version 3.6.3

plan <- drake_plan(
  some_target = target(
    getOption("some_option"),
    format = "file"
  )
)

future::plan("multisession", workers = 2L)

# the option set in 'prework' is not set for the workers, if format = "file"
make(
  plan,
  parallelism = "future",
  prework = options("some_option" = "OPTION VALUE")
)
#> > target some_target
#> Warning: package 'drake' was built under R version 3.6.3
#> Warning: No checksum available for target some_target.
#> x fail some_target
#> Error: target some_target failed.
#> diagnose(some_target)$error$message:
#>   future worker terminated before target could complete.
#> diagnose(some_target)$error$calls:
#> 

plan_no_file <- drake_plan(
  some_target = target(
    getOption("some_option")
    # without format = "file"
  )
)

# now it works
make(
  plan_no_file,
  parallelism = "future",
  prework = options("some_option" = "OPTION VALUE")
)
#> > target some_target

Created on 2020-10-27 by the reprex package (v0.3.0)

It seems as if the option set in prework is not set for the future workers, when the target's format is "file".

Please note that I got the problem also with another target, where no format was set.

wlandau commented 3 years ago

Thanks, that helps. The problem is that prework needs to be a language object, expression, or character string. The R code you supply gets evaluated right away. If you supply quote(options("some_option" = "OPTION VALUE")), the example works.

library(drake)
file.create("tmp")
#> [1] TRUE
plan <- drake_plan(
  some_target = target(getOption("some_option"), format = "file")
)
future::plan("multisession", workers = 2L)

make(
  plan,
  parallelism = "future",
  prework = quote(options("some_option" = "tmp")) # Needs to be a language object.
)
#> ▶ target some_target

Created on 2020-10-27 by the reprex package (v0.3.0)