mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

error when running with targets #221

Closed liutiming closed 3 years ago

liutiming commented 3 years ago

I am not sure where this error report belongs, @wlandau @mschubert, but I am posting it here first because I am running on the Sanger Institute's LSF farm and @mschubert may be more familiar with LSF there.

Steps to reproduce:

  1. using targets-minimal
  2. add line options(clustermq.scheduler = "lsf", clustermq.template = "sge.tmpl") after line 12
  3. replace sge.tmpl with the following for running on Wellcome Sanger Institute farm
BSUB-J {{ job_name }}[1-{{ n_jobs }}]  # name of the job / array jobs
BSUB-n {{ cores | 1 }}                 # number of cores to use per job
BSUB-o {{ log_file | ~/clustermq_log/ }}      # stdout + stderr; %I for array index
BSUB-M {{ memory | 4096 }}             # Memory requirements in Mbytes
BSUB-R "span[hosts=1] select[mem>4096] rusage[mem=4096]"  # Memory requirements in Mbytes
BSUB-q normal                        # name of the queue (uncomment)
##BSUB-W {{ walltime | 6:00 }}          # walltime (uncomment)
BSUB-G team281

ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
  4. run tar_make_clustermq()
  5. error message:
> tar_make_clustermq()
● run target raw_data_file
● run target raw_data
\
Returning output by mail is not supported on this cluster.
Please use the -o option to write output to disk.
Request aborted by esub. Job not submitted.
[1] "BSUB-J cmq8253[1-1]  # name of the job / array jobs\nBSUB-n 1                 # number of cores to use per job\nBSUB-o ~/clustermq_log/      # stdout + stderr; %I for array index\nBSUB-M 4096             # Memory requirements in Mbytes\nBSUB-R \"span[hosts=1] select[mem>4096] rusage[mem=4096]\"  # Memory requirements in Mbytes\nBSUB-q normal                        # name of the queue (uncomment)\n##BSUB-W 6:00          # walltime (uncomment)\nBSUB-G team281\n\nulimit -v $(( 1024 * 4096 ))\nCMQ_AUTH=jhlte R --no-save --no-restore -e 'clustermq:::worker(\"tcp://farm5-head2:8253\")'\n"
Error in self$run_clustermq() : attempt to apply non-function
In addition: Warning message:
Strategy 'multiprocess' is deprecated in future (>= 1.20.0). Instead, explicitly specify either 'multisession' or 'multicore'. In the current R session, 'multiprocess' equals 'multicore'. 
Error: callr subprocess failed: attempt to apply non-function
Type .Last.error.trace to see where the error occured
> .Last.error.trace

 Stack trace:

 Process 10337:
 1. targets:::tar_make_clustermq()
 2. targets:::callr_outer(targets_function = tar_make_clustermq_inner,  ...
 3. targets:::trn(is.null(callr_function), callr_inner(target_script_path(),  ...
 4. base:::do.call(callr_function, prepare_callr_arguments(callr_function,  ...
 5. (function (func, args = list(), libpath = .libPaths(), repos = default_repo ...
 6. callr:::get_result(output = out, options)
 7. throw(newerr, parent = remerr[[2]])

 x callr subprocess failed: attempt to apply non-function 

 Process 20117:
 19. (function (targets_script, targets_function, targets_arguments)  ...
 20. base:::do.call(targets_function, targets_arguments)
 21. (function (pipeline, names_quosure, reporter, workers, log_worker)  ...
 22. clustermq_init(pipeline = pipeline, names = names, queue = "parallel",  ...
 23. self$run_clustermq()
 24. base:::.handleSimpleError(function (e)  ...
 25. h(simpleError(msg, call))

 x attempt to apply non-function 
wlandau commented 3 years ago

Can you reproduce the error with just clustermq?

options(clustermq.scheduler = "lsf", clustermq.template = "your_template.tmpl")
library(clustermq)
f <- function(x) x * 2
Q(f, x = seq_len(2), n_jobs = 1)
liutiming commented 3 years ago

Thanks a lot, @wlandau! It fails with clustermq alone too, so I think the issue might be my clustermq configuration?

Submitting 1 worker jobs (ID: cmq8734) ...

Returning output by mail is not supported on this cluster.
Please use the -o option to write output to disk.
Request aborted by esub. Job not submitted.
[1] "BSUB-J cmq8734[1-1]  # name of the job / array jobs\nBSUB-n 1                 # number of cores to use per job\nBSUB-o ~/clustermq_log/1.log      # stdout + stderr; %I for array index\nBSUB-M 4096             # Memory requirements in Mbytes\nBSUB-R \"span[hosts=1] select[mem>4096] rusage[mem=4096]\"  # Memory requirements in Mbytes\nBSUB-q normal                        # name of the queue (uncomment)\n##BSUB-W 6:00          # walltime (uncomment)\nBSUB-G team281\n\nulimit -v $(( 1024 * 4096 ))\nCMQ_AUTH=nswxv R --no-save --no-restore -e 'clustermq:::worker(\"tcp://farm5-head2:8734\")'\n"
Error in (function (n_jobs, ..., log_worker = FALSE, verbose = TRUE)  : 
  Job submission failed with error code 255
liutiming commented 3 years ago

I did specify BSUB-o {{ log_file | ~/clustermq_log/ }} in the .tmpl file, as shown above, so I am not sure why LSF complains that the -o option is missing.

These are the only four lines that produce the error above:

options(clustermq.scheduler = "lsf", clustermq.template = "sge.tmpl")
library(clustermq)
f <- function(x) x * 2
Q(f, x = seq_len(2), n_jobs = 1)
mschubert commented 3 years ago

I think your

BSUB-o {{ log_file | ~/clustermq_log/ }}

is not valid because (1) it references a directory instead of a file, and (2) I'm not sure LSF expands ~ correctly.

And isn't the # missing at the beginning of each line?

liutiming commented 3 years ago

Thank you, @mschubert! After adding back the # and changing the log path to an absolute path, the job was submitted without problems.

#BSUB-J {{ job_name }}[1-{{ n_jobs }}]  # name of the job / array jobs
#BSUB-n {{ cores | 1 }}                 # number of cores to use per job
#BSUB-o {{ log_file | /nfs/users/nfs_t/tl11/analysis/w3/report/target_multicore/targets-minimal_null }}      # stdout + stderr; %I for array index
#BSUB-M {{ memory | 4096 }}             # Memory requirements in Mbytes
#BSUB-R "span[hosts=1] select[mem>4096] rusage[mem=4096]"  # Memory requirements in Mbytes
#BSUB-q normal                        # name of the queue (uncomment)
##BSUB-W {{ walltime | 6:00 }}          # walltime (uncomment)
#BSUB-G team281

ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'

Could you give a bit more guidance on how to modify the template? I do not quite understand how the curly-bracket syntax works, so if there is documentation on that (I could not find any on the pkgdown site), a pointer would be great!

mattwarkentin commented 3 years ago

@liutiming The placeholder names inside the double curly brackets get filled in when clustermq submits jobs using this template. The values after the vertical bar (|) are fallback defaults.

Take #BSUB-M {{ memory | 4096 }} as an example: if you do not pass a memory value when calling clustermq (either directly or indirectly via targets), it will use 4096 MB as the memory for your jobs.
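
For intuition, the substitution can be sketched in a few lines of base R. This is only an illustration of the `{{ key | default }}` idea: `fill_placeholders` is a hypothetical helper written for this comment, not clustermq's actual code, and the real package may handle edge cases differently.

```r
# Hypothetical helper (NOT part of clustermq): replace each
# `{{ key | default }}` field with values[[key]] if supplied,
# otherwise with the default given after the vertical bar.
fill_placeholders <- function(template, values = list()) {
  m <- gregexpr("\\{\\{[^}]*\\}\\}", template)
  fields <- regmatches(template, m)[[1]]
  filled <- vapply(fields, function(field) {
    inner <- substr(field, 3, nchar(field) - 2)      # drop the braces
    parts <- strsplit(inner, "|", fixed = TRUE)[[1]] # key, then optional default
    key <- trimws(parts[1])
    if (!is.null(values[[key]])) {
      as.character(values[[key]])
    } else if (length(parts) > 1) {
      trimws(paste(parts[-1], collapse = "|"))
    } else {
      stop("no value or default for placeholder: ", key)
    }
  }, character(1), USE.NAMES = FALSE)
  regmatches(template, m) <- list(filled)
  template
}

line <- "#BSUB-M {{ memory | 4096 }}"
print(fill_placeholders(line))                       # "#BSUB-M 4096"
print(fill_placeholders(line, list(memory = 1000)))  # "#BSUB-M 1000"
```

A field without a default, such as {{ auth }}, has nothing to fall back on, so in this sketch it raises an error unless a value is passed.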

Here is how you would control memory via targets:

tar_option_set(
  resources = list(memory = 1000) # this will replace {{ memory }}
)

tar_make_clustermq()

As for the issue you were having with BSUB-o {{ log_file | ~/clustermq_log/ }}: you can either specify the path to a log file using log_file, or it will fall back to the default. However, your default pointed to a directory, not a file.

Lastly, the convention for specifying LSF directives using #BSUB-o rather than just BSUB-o is a matter of how the submission script is parsed to find the job configuration options.
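
To make the parsing point concrete, here is a small base-R sketch, assuming the scheduler only treats lines that start with #BSUB as embedded options. `bsub_directives` is a hypothetical helper for illustration (it is not an LSF or clustermq function), and /path/to/worker.log is a made-up path.

```r
# Hypothetical helper: return the options found on lines starting with "#BSUB".
# Lines without the leading "#" (or with "##") are not matched.
bsub_directives <- function(script) {
  lines <- strsplit(script, "\n", fixed = TRUE)[[1]]
  trimws(sub("^#BSUB", "", grep("^#BSUB", lines, value = TRUE)))
}

# Placeholder scripts; the log path is invented for this example.
broken <- "BSUB-o /path/to/worker.log\nBSUB-q normal\nR --no-save"
fixed  <- "#BSUB-o /path/to/worker.log\n#BSUB-q normal\nR --no-save"

print(bsub_directives(broken))  # character(0): no -o option is found
print(bsub_directives(fixed))   # "-o /path/to/worker.log" "-q normal"
```

Under that assumption, the original template never supplied an -o option at all, which would be consistent with the "Please use the -o option to write output to disk" message. It also suggests why the ##BSUB-W line stays inactive until one # is removed.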

Hope this is helpful.

liutiming commented 3 years ago

@mattwarkentin Immensely helpful. Thank you!