mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

Clear error message for invalid templates #174

Closed (wlandau closed 4 years ago)

wlandau commented 4 years ago

Q() struggles to fill in the template when we supply a nested list as iterated data. On an SGE cluster:

.sge_template <- glue::glue(
"# From https://github.com/mschubert/clustermq/wiki/SGE
#$ -N {{ job_name }}               # job name
#$ -t 1-{{ n_jobs }}               # submit jobs as array
#$ -j y                            # combine stdout/error in one file
#$ -o {{ log_file | /dev/null }}   # output file
#$ -cwd                            # use pwd as work dir
#$ -V                              # use environment variable
#$ -pe smp 1                       # request 1 core per job
# module load {.r_version}         # censored
ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker(\"{{ master }}\")'"
)

template_file <- tempfile()
writeLines(.sge_template, con = template_file)

options(
  clustermq.scheduler = "sge",
  clustermq.template = template_file
)

add_list_elements <- function(.xy_list) {
  purrr::reduce(.xy_list, sum) # development purrr
  .xy_list
}

super_list <- list(
  list(x = 1, y = 2, z = 3),
  list(x = 3, y = 5, z = 2),
  list(x = 1, y = 1, z = 54)
)

clustermq::Q(
  fun = add_list_elements, 
  .xy_list = super_list, 
  n_jobs = 3
)
#> Submitting 3 worker jobs (ID: 6998) ...
#> Error in unlist(values)[keys[upd]]: invalid subscript type 'list'

traceback()
#> 6: private$fill_template(opts)
#> 5: (function (...) 
#>    {
#>        opts = private$fill_options(...)
#>        private$job_id = opts$job_name
#>        filled = private$fill_template(opts)
#>        success = system("qsub", input = filled, ignore.stdout = TRUE)
#>        if (success != 0) {
#>            print(filled)
#>            stop("Job submission failed with error code ", success)
#>        }
#>    })(n_jobs = 3)
#> 4: do.call(qsys$submit_jobs, template)
#> 3: workers(n_jobs, data = data, reuse = FALSE, template = template, 
#>        log_worker = log_worker, verbose = verbose)
#> 2: Q_rows(fun = fun, df = df, const = const, export = export, pkgs = pkgs, 
#>        seed = seed, memory = memory, template = template, n_jobs = n_jobs, 
#>        job_size = job_size, rettype = rettype, fail_on_error = fail_on_error, 
#>        workers = workers, log_worker = log_worker, chunk_size = chunk_size, 
#>        timeout = timeout, max_calls_worker = max_calls_worker, verbose = verbose)
#> 1: clustermq::Q(fun = add_list_elements, .xy_list = super_list, 
#>        n_jobs = 3)

wlandau commented 4 years ago

Possible reason: sapply() returns an empty list for keys. Maybe vapply() would be safer. Not sure if this is the only thing that needs to be done internally.

https://github.com/mschubert/clustermq/blob/b1bbb49a6d38a33cf3fc70024ba2e242464248f0/R/qsys.r#L246
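
To illustrate the suspicion above, here is a minimal base-R sketch of how `sapply()` on zero-length input can lead to the reported error. It is not the actual code in `qsys.r`; `values`, `no_matches`, and the anonymous functions are stand-ins made up for this example.

```r
# Stand-ins (assumptions): option values, and an empty set of placeholder matches
values <- list(job_name = "cmq6998", n_jobs = 3)
no_matches <- character(0)

# sapply() cannot simplify zero-length input, so it returns an empty list
keys_sapply <- sapply(no_matches, function(m) trimws(m))
is.list(keys_sapply)

# Subsetting an atomic vector with a list raises the error from the report
err <- tryCatch(
  unlist(values)[keys_sapply],
  error = function(e) conditionMessage(e)
)
print(err)

# vapply() always returns the declared type, here a zero-length character vector
keys_vapply <- vapply(no_matches, function(m) trimws(m), character(1))
picked <- unlist(values)[keys_vapply]
length(picked)
```

With `vapply()` the subsetting step no longer errors, but the result is silently empty, so an explicit check for missing keys would probably still be needed.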

mschubert commented 4 years ago

I'm very confused by this issue:

  1. I cannot reproduce this with the current CRAN version (default Slurm template; works fine).
  2. Iterated data should be completely independent of template filling.

mschubert commented 4 years ago

OK, I see the issue: glue treats double curly braces as an escape, so it replaces every {{ ... }} with single curly braces { ... }.

The template filler does not recognize these, because it needs double braces. So this has nothing to do with iterated data, right?

However, the error message could be improved.
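
For reference, a small sketch of the glue behavior described above and two possible ways around it. It assumes the glue package is installed; `.r_version` and its value are placeholders for the censored module name, and the variable names are made up for this example.

```r
library(glue)

.r_version <- "R/4.0.0"  # hypothetical value for the censored module name

# Default delimiters: "{{" and "}}" are escapes and collapse to single braces,
# so the placeholder that clustermq needs is lost
broken <- glue("#$ -N {{ job_name }}\n# module load {.r_version}")

# Option 1: quadruple the braces so that double braces survive glue
fixed_doubled <- glue("#$ -N {{{{ job_name }}}}\n# module load {.r_version}")

# Option 2: use different glue delimiters, leaving curly braces untouched
fixed_delims <- glue(
  "#$ -N {{ job_name }}\n# module load <<.r_version>>",
  .open = "<<", .close = ">>"
)

print(broken)
print(fixed_doubled)
print(fixed_delims)
```

Either variant should keep `{{ job_name }}` intact in the file written for `clustermq.template`, while still interpolating `.r_version`.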

mschubert commented 4 years ago

Proposed change, after:

https://github.com/mschubert/clustermq/blob/b1bbb49a6d38a33cf3fc70024ba2e242464248f0/R/qsys.r#L241-L247

If there are no matches for the required keys, fail with a clean error message.
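
A rough sketch of what such a check could look like, in base R. This is not clustermq's implementation: `fill_template_sketch()`, its arguments, and the choice of `n_jobs` and `master` as required keys are all assumptions made for illustration.

```r
# Hypothetical helper; NOT clustermq's actual fill_template() implementation
fill_template_sketch <- function(template, values,
                                 required = c("n_jobs", "master")) {
  template <- paste(template, collapse = "\n")

  # Find every "{{ key }}" or "{{ key | default }}" placeholder
  pattern <- "\\{\\{[^{}]*\\}\\}"
  found <- regmatches(template, gregexpr(pattern, template, perl = TRUE))[[1]]
  placeholders <- unique(found)

  # Strip the braces, then split each placeholder into key and optional default
  inner <- trimws(gsub("^\\{\\{|\\}\\}$", "", placeholders, perl = TRUE))
  keys <- trimws(sub("\\|.*$", "", inner))
  has_default <- grepl("|", inner, fixed = TRUE)
  defaults <- ifelse(has_default, trimws(sub("^[^|]*\\|", "", inner)),
                     NA_character_)

  # Fail early and readably if a required key is not in the template
  missing_required <- setdiff(required, keys)
  if (length(missing_required) > 0) {
    stop("Template is missing required placeholder(s): ",
         paste0("{{ ", missing_required, " }}", collapse = ", "),
         ". Check that the double curly braces were not altered.",
         call. = FALSE)
  }

  # Substitute supplied values, falling back to the in-template default
  for (i in seq_along(placeholders)) {
    value <- values[[keys[i]]]
    if (is.null(value)) value <- defaults[i]
    if (is.na(value)) {
      stop("No value or default for template key '", keys[i], "'",
           call. = FALSE)
    }
    template <- gsub(placeholders[i], as.character(value), template,
                     fixed = TRUE)
  }
  template
}

good <- "#$ -N {{ job_name }}\n#$ -t 1-{{ n_jobs }}\n#$ -o {{ log_file | /dev/null }}\nworker(\"{{ master }}\")"
opts <- list(job_name = "cmq", n_jobs = 3, master = "tcp://host:1234")
filled <- fill_template_sketch(good, opts)
cat(filled, "\n")

# A template whose double braces were collapsed now fails with a readable message
bad <- gsub("}}", "}", gsub("{{", "{", good, fixed = TRUE), fixed = TRUE)
msg <- tryCatch(fill_template_sketch(bad, opts),
                error = function(e) conditionMessage(e))
print(msg)
```

The point of the sketch is only the ordering: validate the required keys before any subsetting happens, so the user sees which placeholder is missing instead of a subscript error.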

dlowe-lilly commented 4 years ago

Great catch. Sorry for the confusion. Once I changed my code to use glue correctly, it worked like a charm. Thanks!