Closed atyakhtmpg closed 7 months ago
Update: I managed to run a simple toy example with the setup described above in a clean Conda environment, but the jobs generated on the cluster nodes tend to hang, and tracing shows that they are apparently stuck trying to allocate memory. The R kernel in VS Code subsequently crashes.
Example:
clustermq_setup(scheduler = c("sge"))
fx <- function(x) x * 2
Q(fx, x = 1:100, n_jobs = 5)
Tracing: strace -f -T -p {PID}:
...
[pid 16831] mbind(0xffffffffffffffff, 134217728, MPOL_PREFERRED, NULL, 0, 0 <unfinished ...>
[pid 16839] <... munmap resumed>) = 0 <0.000228>
[pid 16838] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000227>
[pid 16837] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000223>
[pid 16836] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000218>
[pid 16835] <... mmap resumed>) = 0x146859760000 <0.000214>
[pid 16834] <... mmap resumed>) = -1 ENOMEM (Cannot allocate memory) <0.000206>
[pid 16833] mmap(NULL, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
[pid 16832] mmap(NULL, 134225920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 16831] <... mbind resumed>) = -1 EINVAL (Invalid argument) <0.000204>
[pid 16839] mmap(NULL, 134225920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
...
@atyakhtmpg did you see that the new version 0.9.0 / 0.9.1 on CRAN fixes an issue with allocating sufficient memory for R to start under SGE? See https://cran.r-project.org/web/packages/clustermq/news/news.html (and also #298).
I'll assume this is solved by adding the missing memory allocation to the submission template in use.
Please reopen if that's not the case.
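For anyone landing here with the same symptom, a minimal sketch of requesting memory through the template follows. It assumes the SGE template in use contains a {{ memory }} placeholder; the template path, the placeholder name, and the value 4096 are illustrative assumptions, not taken from this thread, and the unit of the value depends on how the template consumes it.

```r
library(clustermq)

# Assumption: a site-specific SGE template that contains a {{ memory }}
# placeholder; the file name below is hypothetical.
options(
    clustermq.scheduler = "sge",
    clustermq.template = "~/sge_clustermq.tmpl"
)

fx <- function(x) x * 2

# Entries in `template` fill the matching {{ ... }} placeholders in the
# template file. The unit of `memory` is whatever the template expects
# (assumed here to be MB).
res <- Q(fx, x = 1:100, n_jobs = 5, template = list(memory = 4096))
```

If the workers still hang, it may be worth checking whether the template applies a virtual memory limit (for example via ulimit) that is too low for R to start.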
I am attempting to use clustermq from an R notebook in Visual Studio Code. The "multicore" mode works fine for the code. However, when I switched to "sge" with a template provided, the worker jobs apparently won't run and their logs show:
Is clustermq overall supposed to work in the described scenario?