Jobs won't run from R Jupyter notebooks in VS Code with SGE #319

atyakhtmpg commented 8 months ago

I am trying to use clustermq from an R notebook in Visual Studio Code. The "multicore" mode works fine for this code. However, when I switch to "sge" (with a template provided), the worker jobs apparently don't run, and their logs show:

```
clustermq:::worker("")
code for methods in class “Rcpp_ZeroMQ_raw” was not checked for suspicious field assignments (recommended package ‘codetools’ not available?)
code for methods in class “Rcpp_ZeroMQ_raw” was not checked for suspicious field assignments (recommended package ‘codetools’ not available?)
2023-11-08 18:44:13.528617 | Master:
Error in private$zmq$connect(address, socket_type, sid) : Invalid argument
Calls: -> -> -> .External
Execution halted
```

Is clustermq supposed to work in this scenario at all?

atyakhtmpg commented 8 months ago

Update: I managed to start a simple toy example with the setup described above in a clean Conda environment, but the worker jobs on the cluster nodes tend to hang, and tracing shows that they are apparently stuck trying to allocate memory. Afterwards, the R kernel in VS Code crashes.


```r
clustermq_setup(scheduler = c("sge"))
fx <- function(x) x * 2
Q(fx, x = 1:100, n_jobs = 5)
```

Tracing with `strace -f -T -p {PID}`:

```
...
[pid 16831] mbind(0xffffffffffffffff, 134217728, MPOL_PREFERRED, NULL, 0, 0 <unfinished ...>
[pid 16839] <... munmap resumed>) = 0 <0.000228>
[pid 16838] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000227>
[pid 16837] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000223>
[pid 16836] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000218>
[pid 16835] <... mmap resumed>) = 0x146859760000 <0.000214>
[pid 16834] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000206>
[pid 16833] mmap(NULL, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
[pid 16832] mmap(NULL, 134225920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 16831] <... mbind resumed>) = -1 EINVAL (Invalid argument) <0.000204>
[pid 16839] mmap(NULL, 134225920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
...
```

luwidmer commented 8 months ago

@atyakhtmpg, did you see that the new version 0.9.0/0.9.1 on CRAN fixes an issue with allocating enough memory for R to start under SGE (see also #298)?

mschubert commented 7 months ago

I'll assume this was solved by adding the missing memory allocation to the submission template being used.

Please reopen if that's not the case.
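As a minimal sketch (not from the thread itself), the toy example could request memory explicitly, assuming the SGE template in use contains a memory line built from clustermq's `{{ memory }}` placeholder, for example `#$ -l h_vmem={{ memory | 4096 }}M`. The `h_vmem` resource name and the MB units are assumptions that vary between SGE sites, and the template path below is hypothetical. The value passed through `Q(..., template = list(memory = ...))` is substituted into that placeholder.

```r
library(clustermq)

# Hypothetical path to an SGE template that includes a memory request using
# the {{ memory }} placeholder, e.g.:  #$ -l h_vmem={{ memory | 4096 }}M
options(
  clustermq.scheduler = "sge",
  clustermq.template  = "/path/to/sge.tmpl"
)

fx <- function(x) x * 2

# Fill {{ memory }} with 4096 (MB under the assumed template line above);
# without this, the default after "|" in the template is used.
res <- Q(fx, x = 1:100, n_jobs = 5, template = list(memory = 4096))
```

If workers still fail to allocate memory, raising the value (or checking which memory resource your site's SGE actually enforces) is the next thing to try.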