Closed statquant closed 5 years ago
Hi,
Thank you for your interest in the package. It would be great if you could include a minimal example that reproduces the behaviour you see.
There is a unique task, when I use %do% it works but using %dopar% comes back with this error.
A lot of code might run fine with %do% but not with %dopar%, for instance when you access global objects that you forgot to export. A better check would be to compare the clustermq %dopar% with a SOCKcluster:
doParallel::registerDoParallel(parallel::makePSOCKcluster(2))
# then run your function and see if you get the same error
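To illustrate the kind of failure described here, the following is a minimal sketch of code that works with %do% but typically not with %dopar% on a fresh SOCK cluster. The names offset and add_offset are hypothetical and not taken from the reporter's code; the sketch assumes the foreach and doParallel packages are installed.

```r
library(foreach)
library(doParallel)

# Two local PSOCK workers stand in for the cluster
cl <- parallel::makePSOCKcluster(2)
registerDoParallel(cl)

offset <- 10                           # hypothetical global object
add_offset <- function(x) x + offset   # hypothetical function that relies on it

# %do% runs in the current session, so `offset` is found
res_do <- foreach(i = 1:3) %do% add_offset(i)

# %dopar% on fresh workers: `offset` is not mentioned in the loop body,
# so it is not exported automatically and the call is expected to fail.
# tryCatch keeps the script running and stores the error message instead.
res_fail <- tryCatch(
  foreach(i = 1:3) %dopar% add_offset(i),
  error = function(e) conditionMessage(e)
)

# Exporting the object explicitly makes it available on the workers
res_ok <- foreach(i = 1:3, .export = "offset") %dopar% add_offset(i)

parallel::stopCluster(cl)
```

If the same error shows up with the SOCK cluster, the problem is likely in the function or its exports rather than in the backend.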
Error in if (sum(nchar(x)) > breakAt) sep <- "\n"
This code is not part of clustermq. Are you sure this is not an error in your function? (I cannot check since you did not provide it.)
Another question: say I send 5 tasks on Slurm and I get an error on one worker. I expect to get back 4 results and some error object, am I correct?
The default template uses array jobs, not tasks. I am not sure how requesting more than one task will behave on Slurm.
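On the error-handling part of the question, foreach itself has an .errorhandling argument that controls what happens when one iteration fails. The sketch below uses the sequential %do% operator and a hypothetical risky_task function to show the "pass" setting, which returns the error object in place of the failed result. Whether a given parallel backend honours this setting in the same way is an assumption that would need to be checked against that backend.

```r
library(foreach)

# Hypothetical task: fails for one input, succeeds for the others
risky_task <- function(i) {
  if (i == 3) stop("task ", i, " failed")
  i^2
}

# "pass" keeps going after an error and stores the condition object
# in the result list at the position of the failed iteration
res <- foreach(i = 1:5, .errorhandling = "pass") %do% risky_task(i)

# Separate the failed iterations from the successful ones
failed <- vapply(res, inherits, logical(1), what = "error")
which(failed)         # index of the failed iteration
length(res[!failed])  # number of successful results
```

The other documented settings are "stop" (the default, which aborts the whole call) and "remove" (which drops the failed results from the list).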
Hello @mschubert, this was part of my code, sorry about this. With regard to array jobs as opposed to tasks, I will test and get back to you. I expect I just need to comment out
#SBATCH --array=1-{{ n_jobs }}
and replace it with
#SBATCH --ntasks=1
By the way, I noticed that you do not specify
#SBATCH --cpus-per-task={{ n_cpu_per_task }}
Is that expected?
--cpus-per-task is specified in the latest default template: https://github.com/mschubert/clustermq/blob/master/inst/SLURM.tmpl
Ah great, I took it from https://mschubert.github.io/clustermq/articles/userguide.html, that's why. Will be back soon, thanks.
I have now also updated this in the user guide
Hello, thank you for this nice package, I love the no-file approach. I am running a job on a Slurm cluster using clustermq as a foreach backend. There is a unique task, when I use %do% it works but using %dopar% comes back with this error. Note that the log file is clean (as in no error on the worker), so I think the error is on the master. I also tried a toy example that works fine:
foreach(task=1:10) %dopar% Sys.getpid()
Error on the R console (for a "real" job) is:
Another question: say I send 5 tasks on Slurm and I get an error on one worker. I expect to get back 4 results and some error object, am I correct?