mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

some progress with future.clustermq #282

Closed by michaelmayer2 2 years ago

michaelmayer2 commented 2 years ago

I have been experimenting with future.clustermq lately and am making progress. I can now use it to launch more than one worker on Slurm, which is great, but I am stuck at https://github.com/michaelmayer2/future.clustermq/blob/master/R/ClusterMQFuture-class.R#L233

workers$receive_data() reports token: "not set", after which the code runs workers$send_common_data(). This eventually leads to success=NULL, and the code stops.

I would be curious whether there are any pointers on how to transfer the token to the workers. That would at least get me to a state where the workers are up and running and ready to take some work.

Thanks in advance,

Michael.

michaelmayer2 commented 2 years ago

In a debugging session I can see the Slurm jobs running:

me@future.clustermq$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            5512_1       all  cmq6336       mm  R       0:04      1 all-st-rstudio-1
            5512_2       all  cmq6336       mm  R       0:04      1 all-st-rstudio-1

The workers also report everything correctly except the token:

Browse[1]> workers$receive_data()
$id
[1] "WORKER_READY"

$auth
[1] "tlfzv"

$pkgver
[1] ‘0.8.95.3’

$token
[1] "not set"

michaelmayer2 commented 2 years ago

It looks like I filed this in the wrong GitHub repo. I am opening another issue at https://github.com/HenrikBengtsson/future.clustermq/issues/3