mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

Authentication provided by worker does not match #127

Closed pat-s closed 5 years ago

pat-s commented 5 years ago

I have been getting this error recently and have no clue where the problem might be. I am executing from the master node.

Q(fx, x=1:3, n_jobs=1, template = list(n_cpus = 1, log_file = "log.txt"))

Submitting 1 worker jobs (ID: 6206) ...
Running 3 calculations (1 calls/chunk) ...
Error in qsys$receive_data(timeout = timeout) :
  Authentication provided by worker does not match

Template:

#!/bin/sh
#SBATCH --job-name={{ job_name }}
#SBATCH --partition=normal
#SBATCH --output={{ log_file | /dev/null }} # you can add .%a for array index
#SBATCH --error={{ log_file | /dev/null }}
#SBATCH --cpus-per-task={{ n_cpus }}
#SBATCH --array=1-{{ n_jobs }}

R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'

Worker log:

WARNING: ignoring environment value of R_HOME

R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Warning message:
package 'methods' was built under R version 3.5.2
During startup - There were 12 warnings (use warnings() to see them)
> clustermq:::worker("tcp://gisc:6206")
Master: tcp://gisc:6206
WORKER_UP to: tcp://gisc:6206
slurmstepd: error: *** JOB 405 ON c0 CANCELLED AT 2019-02-28T11:56:35 ***

pat-s commented 5 years ago

This only happens when R is running in packrat mode. What might be clashing here? The only difference is the R package libraries, I think.

(Btw I had a working SSH setup before which worked with the packrat libs. So I'm a bit confused now).

mschubert commented 5 years ago

I added a simple socket authentication mechanism (#125) in version 0.8.6. It is supposed to be checked only if the CMQ_AUTH variable is set in the template, and ignored otherwise (with a warning that you are not using authentication).

In my tests this worked flawlessly with CMQ_AUTH set (no warning, but error if tokens do not match) and not set (warning but no error).
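
To make the intended behaviour concrete, here is a rough sketch of the decision logic described above, written as a small shell function. It is purely illustrative and is not clustermq's actual code: the function name `check_auth`, its arguments, and the warning text are hypothetical. Only the error message is taken from this issue.

```shell
#!/bin/sh
# Hypothetical illustration of the behaviour described above;
# this is NOT how clustermq implements the check internally.
# Usage: check_auth MASTER_TOKEN WORKER_TOKEN
check_auth() {
    master_token=$1
    worker_token=$2

    # CMQ_AUTH not set in the template: warn, but do not fail
    if [ -z "$worker_token" ]; then
        echo "warning: worker is not using authentication" >&2
        return 0
    fi

    # CMQ_AUTH set: tokens must match, otherwise fail
    if [ "$worker_token" != "$master_token" ]; then
        echo "error: Authentication provided by worker does not match" >&2
        return 1
    fi

    return 0
}
```

With this sketch, `check_auth abc abc` succeeds silently, `check_auth abc ""` succeeds with a warning, and `check_auth abc xyz` fails with the error from this issue.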

Can you try (1) making sure both your master and worker use the same version of clustermq and (2) changing your template line to the one below?

CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
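
For anyone unsure what the `CMQ_AUTH=...` prefix on that line does: in sh, an assignment placed directly before a command sets the variable in the environment of that one command only. The small sketch below shows this with a placeholder value (`token123` is made up; in the real template `{{ auth }}` is filled in for you) and a child `sh` standing in for the R worker, which presumably reads the variable from its environment.

```shell
#!/bin/sh
# Start from a clean state so the demonstration is unambiguous
unset CMQ_AUTH

# "token123" is a placeholder, not a real clustermq token.
# The child process (standing in for the R worker) sees the variable ...
seen_by_child=$(CMQ_AUTH=token123 sh -c 'printf "%s" "${CMQ_AUTH:-unset}"')

# ... but the calling shell (the job script itself) does not keep it
seen_by_shell=${CMQ_AUTH:-unset}

echo "child process: $seen_by_child"
echo "calling shell: $seen_by_shell"
```

So the token only reaches the worker if the assignment sits on the same line as the `R` call, as in the template line above.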

edit: Do you have any idea why the master doesn't show you this warning?

pat-s commented 5 years ago

Ah!

I remember getting the warning message. Then I inserted it as suggested and faced other errors. I did not realize that this error was caused by CMQ_AUTH and assumed other issues to be the reason. I did not even notice the update, as I installed the package on a new machine.

It works now - thanks!