mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

jobs won't run from R Jupyter notebooks in VS Code, for SGE #319

Closed atyakhtmpg closed 7 months ago

atyakhtmpg commented 8 months ago

I am attempting to use clustermq from an R notebook in Visual Studio Code. The "multicore" scheduler works fine for my code. However, when I switch to "sge" with a template provided, the worker jobs apparently do not run, and their logs show:

R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-conda-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

clustermq:::worker("")
code for methods in class “Rcpp_ZeroMQ_raw” was not checked for suspicious field assignments (recommended package ‘codetools’ not available?)
code for methods in class “Rcpp_ZeroMQ_raw” was not checked for suspicious field assignments (recommended package ‘codetools’ not available?)
2023-11-08 18:44:13.528617 | Master:
Error in private$zmq$connect(address, socket_type, sid) : Invalid argument
Calls: -> -> -> .External
Execution halted

Is clustermq overall supposed to work in the described scenario?

atyakhtmpg commented 8 months ago

Update: I managed to start a simple toy example with the setup described above and in a clean Conda environment, but the generated jobs on cluster nodes tend to hang, and tracing shows that they are apparently stuck attempting to allocate memory. The R Kernel in VS Code crashes subsequently.

Example:

clustermq_setup(scheduler = c("sge"))
fx <- function(x) x * 2
Q(fx, x = 1:100, n_jobs = 5)

Tracing: strace -f -T -p {PID}:

...
[pid 16831] mbind(0xffffffffffffffff, 134217728, MPOL_PREFERRED, NULL, 0, 0 <unfinished ...>
[pid 16839] <... munmap resumed>) = 0 <0.000228>
[pid 16838] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000227>
[pid 16837] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000223>
[pid 16836] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000218>
[pid 16835] <... mmap resumed>) = 0x146859760000 <0.000214>
[pid 16834] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000206>
[pid 16833] mmap(NULL, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
[pid 16832] mmap(NULL, 134225920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 16831] <... mbind resumed>) = -1 EINVAL (Invalid argument) <0.000204>
[pid 16839] mmap(NULL, 134225920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
...

luwidmer commented 8 months ago

@atyakhtmpg Did you see that the new version (0.9.0 / 0.9.1) on CRAN fixes an issue with allocating sufficient memory for R to start under SGE? See https://cran.r-project.org/web/packages/clustermq/news/news.html (and also #298).
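
To check whether the installed package already includes that fix, something like the following could be run in the notebook. This is only a minimal sketch: `has_min_version` is a hypothetical helper name, and the 0.9.1 threshold is simply the newer of the two versions mentioned above.

```r
# Hypothetical helper: TRUE if `pkg` is installed at version `minimum` or newer.
has_min_version <- function(pkg, minimum) {
  # system.file() returns "" when the package is not installed,
  # which avoids the error packageVersion() would otherwise raise
  if (!nzchar(system.file(package = pkg))) return(FALSE)
  utils::packageVersion(pkg) >= minimum
}

if (!has_min_version("clustermq", "0.9.1")) {
  message("clustermq is missing or older than 0.9.1; consider updating from CRAN")
}
```

Note that the worker jobs start their own R process on the cluster nodes, so the same check may be worth running in the environment the jobs actually use.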

mschubert commented 7 months ago

I'll assume this is solved by adding the missing memory allocation in the submission template used.

Please reopen if that's not the case.
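
For anyone landing here with a custom template, a rough sketch of what "adding the memory allocation" could look like follows. It is not the template shipped with the package: the directives are modeled on the general SGE template layout from the clustermq documentation, the resource name `mem_free` and the 4096 MB default are assumptions that depend on the cluster's configuration, and the placeholders may differ between clustermq versions.

```r
# Sketch of an SGE submission template that requests memory explicitly.
# ASSUMPTIONS: resource name (`mem_free`), unit suffix (M) and default (4096)
# are site-specific; check the scheduler setup and the installed clustermq version.
sge_template <- c(
  "#$ -N {{ job_name }}",                # job name
  "#$ -j y",                             # merge stdout and stderr
  "#$ -o {{ log_file | /dev/null }}",    # log file, discarded by default
  "#$ -cwd",                             # run in the current working directory
  "#$ -V",                               # export the submitting environment
  "#$ -t 1-{{ n_jobs }}",                # submit the workers as an array job
  "#$ -l mem_free={{ memory | 4096 }}M", # the memory request
  "",
  "CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker(\"{{ master }}\")'"
)

# Written to a temporary file for illustration; a permanent path is more practical.
template_path <- file.path(tempdir(), "sge_memory.tmpl")
writeLines(sge_template, template_path)

# Point clustermq at the SGE scheduler and the custom template.
options(
  clustermq.scheduler = "sge",
  clustermq.template = template_path
)
```

With a template like this, the `memory` placeholder can be filled per call through the `template` argument of `Q()`, for example `Q(fx, x = 1:100, n_jobs = 5, template = list(memory = 8192))`.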