mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

All jobs expire with ssh cluster #205

Closed: riccardopinosio closed this issue 5 years ago

riccardopinosio commented 5 years ago

Hello,

I have been trying the batchtools package but I am running into some difficulties. I am trying to spin up some workers on a remote Ubuntu 16.04 server from my local Mac. First of all, I am able to start a cluster using the parallel package with a command of the type:

    cl <- makePSOCKcluster(servername, user = user, rscript = rscript, homogeneous = FALSE)

This works, and I can run computations on the remote Ubuntu box. I tried to achieve the same using batchtools. I do:

    reg <- makeRegistry()
    worker1 <- Worker$new(nodename = servername, ncpus = 2)
    cl <- makeClusterFunctionsSSH(list(worker1))
    reg$cluster.functions <- cl

This seems to terminate correctly (debugme also says it's all fine). Then I do:

    ex_vec <- c(1, 2, 3)
    batchMap(function(x) { x + 1 }, ex_vec)
    submitJobs()

Now jobs seem to be submitted, but when I do getStatus() they all appear as expired, and I am not sure how to debug this. Could it be something to do with the fact that Rscript is not found on the remote workers? Is there an option to pass in the Rscript command?
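One way to get more information about expired jobs is to ask the registry which jobs it considers expired and whether any log files were written for them. The following is a minimal sketch, not an official debugging recipe: `inspect_expired` is a hypothetical helper name, while `findExpired()`, `getJobTable()` and `getLog()` are batchtools functions. It assumes the registry from the post above is still the default registry.

```r
# Sketch: collect what batchtools recorded about expired jobs.
# `inspect_expired` is a hypothetical helper, not part of batchtools.
inspect_expired <- function(reg = batchtools::getDefaultRegistry()) {
  expired <- batchtools::findExpired(reg = reg)
  if (nrow(expired) == 0L) {
    message("No expired jobs found.")
    return(invisible(expired))
  }

  # Submission metadata for the expired jobs
  print(batchtools::getJobTable(ids = expired, reg = reg))

  # A log file may be missing if the job never started on the worker,
  # so report the error message instead of stopping.
  for (id in expired$job.id) {
    log <- tryCatch(
      batchtools::getLog(id, reg = reg),
      error = function(e) paste("no log available:", conditionMessage(e))
    )
    cat(sprintf("--- job %s ---\n", id))
    cat(log, sep = "\n")
  }
  invisible(expired)
}
```

Calling `inspect_expired()` after `submitJobs()` prints whatever was recorded. If no log exists at all, that suggests the job never started on the remote machine, which points at the SSH or Rscript setup more than at the job function itself. Running `testJob(1, external = TRUE)` locally can help rule out the function as the cause.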

Thanks!

mllg commented 5 years ago

> Could it be something to do with the fact that Rscript is not found on the remote workers? Is there an option to pass in the Rscript command?

Rscript should be in the PATH on all remote machines. I'd suggest adjusting the PATH in your .bashrc or .bash_profile. If this does not work on your system, I can also make the path an argument so that you can override it.

Another common pitfall is passwordless login. Can you ssh [servername] without a password? Is there any login message that may confuse batchtools?
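Both questions can be checked from the local R session. The sketch below is only an illustration under stated assumptions: `ssh_args`, `ssh_run` and `has_extra_output` are hypothetical helper names, and the marker string `batchtools-ok` is arbitrary. It relies on the OpenSSH option `BatchMode=yes`, which makes ssh fail instead of prompting for a password.

```r
# Build the argument vector for a non-interactive ssh call.
# BatchMode=yes makes ssh fail rather than ask for a password.
ssh_args <- function(host, command) {
  c("-o", "BatchMode=yes", host, shQuote(command, type = "sh"))
}

# Run a command on the host and return the combined stdout/stderr lines.
ssh_run <- function(host, command) {
  suppressWarnings(
    system2("ssh", ssh_args(host, command), stdout = TRUE, stderr = TRUE)
  )
}

# TRUE if the output is anything other than exactly the marker line,
# e.g. a login banner, text printed by a shell startup file, or an error.
has_extra_output <- function(lines, marker) {
  length(lines) != 1L || lines[1] != marker
}

# Usage (servername is a placeholder for your host):
# out <- ssh_run(servername, "echo batchtools-ok")
# has_extra_output(out, "batchtools-ok")
```

If `has_extra_output()` returns `TRUE`, printing `out` shows either the ssh error (for example a failed key authentication) or the additional text that the remote side emits on every connection.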

riccardopinosio commented 5 years ago

Hi,

No, the password is not the issue: I made sure that ssh [servername] just works, using the .ssh/config file on my Mac. Also, if it were a password issue, the parallel package would fail as well.

Moreover, when I log onto the remote machine Rscript seems to be on the PATH, i.e. I get the help message if I run Rscript. Any further ideas about how to debug such an issue?
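An interactive login and a command executed through ssh do not necessarily see the same PATH, because the shell reads its startup files differently in the two cases. On many Ubuntu setups, for instance, the default ~/.bashrc returns early for non-interactive shells. The sketch below compares the two situations; it assumes bash is the remote shell, and `rscript_probe_commands` and `probe_rscript` are hypothetical helper names.

```r
# Commands to run remotely: the first uses the plain non-interactive
# environment, the second forces a login shell that reads the profile files.
rscript_probe_commands <- function() {
  c(
    noninteractive = "command -v Rscript",
    login_shell    = "bash -lc 'command -v Rscript'"
  )
}

# Run each probe over ssh; NA means the command printed nothing,
# i.e. Rscript was not found in that environment.
probe_rscript <- function(host) {
  vapply(rscript_probe_commands(), function(cmd) {
    out <- suppressWarnings(
      system2("ssh", c(host, shQuote(cmd, type = "sh")),
              stdout = TRUE, stderr = TRUE)
    )
    if (length(out) == 0L) NA_character_ else paste(out, collapse = "\n")
  }, character(1))
}

# Usage (servername is a placeholder for your host):
# probe_rscript(servername)
```

If the `login_shell` entry shows a path but `noninteractive` is `NA`, Rscript is only on the PATH for login shells, and the PATH would need to be set somewhere that non-interactive sessions also read.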

mllg commented 5 years ago

> Any further ideas about how to debug such an issue?

Sorry, no. Remotely debugging stuff like this is almost impossible. :disappointed:

riccardopinosio commented 5 years ago

Fair enough :) I will try again and try to debug what's going on, maybe I'll get lucky.