Closed drejom closed 10 months ago
Ok, so downgrading {targets} to 1.2.2 solved things for now and I can run my analysis.
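For anyone wanting to apply the same workaround, one way to pin the version is sketched below. It assumes the {remotes} package is installed, and `pin_targets()` is only a hypothetical helper name used for illustration.

```r
# Sketch: pin {targets} to the release reported to work above (1.2.2).
# Assumes {remotes} is installed; install_version() fetches an archived
# release from the configured CRAN mirror.
pin_targets <- function(version = "1.2.2") {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("The {remotes} package is needed for this sketch.")
  }
  remotes::install_version("targets", version = version)
}

# Usage (needs network access, so it is not run automatically here):
# pin_targets()
```

Restarting R after the install makes sure the downgraded version is the one that gets loaded.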
However, I see a number of changes in 1.3.0 which I suspect account for the error.
I can run the targets-minimal pipeline without issue, but when I include the following to run it on SLURM, I get errors.
nodename <- Sys.info()["nodename"]
singularity_exec <- glue::glue("cd {here::here()} \\
/{base_dir}/easy-build/software/singularity/3.7.0/bin/singularity exec \\
--env R_LIBS_USER=~/R/bioc-3.17 \\
--env R_LIBS_SITE=/{base_dir}/singularity/shared_cache/rbioc/rlibs/bioc-3.17 \\
-B /{base_dir}/singularity,/ref_genomes,/scratch \\
/{base_dir}/singularity/shared_cache/rbioc/vscode-rbioc_3.17.sif \\")
slurm <- crew.cluster::crew_controller_slurm(
host = nodename,
script_lines = singularity_exec)
tar_option_set(
controller = slurm,
resources = tar_resources(
crew = tar_resources_crew(seconds_timeout = 3)
)
)
targets::tar_make()
▶ dispatched target raw_data_file
▶ completed pipeline [6.776 seconds]
Error:
! Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
target NA error: 'errorValue' int 5 | Timed out
Last error traceback:
tryCatch(withCallingHandlers({ NULL saveRDS(do.call(do.call, c(readRDS("...
tryCatchList(expr, classes, parentenv, handlers)
tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), na...
doTryCatch(return(expr), name, parentenv, handler)
tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
tryCatchOne(expr, names, parentenv, handlers[[1L]])
doTryCatch(return(expr), name, parentenv, handler)
withCallingHandlers({ NULL saveRDS(do.call(do.call, c(readRDS("/tmp/Rtmp...
saveRDS(do.call(do.call, c(readRDS("/tmp/RtmpgCy87w/callr-fun-15f2fb7d6d...
do.call(do.call, c(readRDS("/tmp/RtmpgCy87w/callr-fun-15f2fb7d6d7032"), ...
(function (what, args, quote = FALSE, envir = parent.frame()) { if (!is....
(function (targets_function, targets_arguments, options, envir = NULL, s...
tryCatch(out <- withCallingHandlers(targets::tar_callr_inner_try(targets...
tryCatchList(expr, classes, parentenv, handlers)
tryCatchOne(expr, names, parentenv, handlers[[1L]])
doTryCatch(return(expr), name, parentenv, handler)
withCallingHandlers(targets::tar_callr_inner_try(targets_function = targ...
targets::tar_callr_inner_try(targets_function = targets_function, target...
do.call(targets_function, targets_arguments)
(function (pipeline, path_store, names_quosure, shortcut, reporter, seco...
crew_init(pipeline = pipeline, meta = meta_init(path_store = path_store)...
self$run_crew()
self$iterate()
self$conclude_worker_task()
tar_assert_all_na(result$error, msg = paste("target", result$name, "erro...
tar_throw_validate(msg %|||% default)
tar_error(message = paste0(...), class = c("tar_condition_validate", "ta...
rlang::abort(message = message, class = class, call = tar_empty_envir)
signal_abort(cnd, .file)
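The stored error messages that the output above points to can be pulled from the local data store with the call it suggests. This is a minimal sketch, assuming it is run from the pipeline's project root where the `_targets/` store lives; `show_target_errors()` is a hypothetical wrapper name.

```r
# Sketch: list error messages recorded in the targets metadata.
# Assumes the working directory is the project root of the pipeline.
show_target_errors <- function() {
  # Same call that the tar_make() error output recommends.
  targets::tar_meta(fields = error, complete_only = TRUE)
}

# Usage, after a failed tar_make():
# show_target_errors()
```

Since the error above is reported for target `NA`, the metadata may not contain a matching row, so an empty result would not be surprising.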
If I remove the resources section from tar_option_set():
resources = tar_resources(
crew = tar_resources_crew(seconds_timeout = 3)
)
I get no error, but the pipeline never progresses beyond dispatching the first target:
targets::tar_make()
▶ dispatched target raw_data_file
/
Apologies if I'm missing something obvious, but are you able to provide any insight?
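In case it helps with diagnosis, the text that reaches `script_lines` can be printed before it is passed to the controller. As far as I understand the glue documentation, a backslash at the end of a line inside `glue()` is treated as a line continuation, so the rendered string may look different from the source layout. The sketch below uses shortened, hypothetical placeholder paths rather than my real ones, and `render_script_lines()` is an illustrative name only.

```r
# Hypothetical placeholder values; the real ones come from the cluster setup script.
base_dir <- "opt"
project_dir <- "/path/to/project"

# Build a shortened version of the script text with the same layout
# (each source line ends in a backslash, as in the original snippet).
render_script_lines <- function(base_dir, project_dir) {
  glue::glue("cd {project_dir} \\
/{base_dir}/bin/singularity exec \\
/{base_dir}/image.sif \\")
}

# Print the rendered text as it would be handed to script_lines.
if (requireNamespace("glue", quietly = TRUE)) {
  rendered <- render_script_lines(base_dir, project_dir)
  cat(as.character(rendered), "\n")
}
```

Comparing the printed text with the intended shell command shows whether the line breaks and trailing backslashes survive as expected.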
Prework
crew.cluster package itself and not a user error, known limitation, or issue from another package that crew.cluster depends on.
Description
After updating a slew of packages recently, my SLURM-enabled targets pipeline has stopped running, with errors about seconds_timeout. I have a rather elaborate script to set up cluster operations, but I think I've narrowed it down to crew_controller_slurm(), so I only post that here:
Reproducible example
Apologies @wlandau, my initial example was not in fact reproducible, but while I see if I can make a minimal example, does this {targets} error give any clues as to what's going on? It occurs with or without seconds_timeout set in crew.cluster::crew_controller_slurm().
Diagnostic information