mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0
146 stars 27 forks

Question about the R session on the worker #276

Closed ImNotaGit closed 2 years ago

ImNotaGit commented 2 years ago

I have clustermq set up with an SGE template, and parallel computation with the Q function works well: I see multiple jobs being submitted successfully, and the jobs exit with results returned properly once the computation is finished. I do not have a clear understanding of how it works internally, but for certain use cases I am trying to make things more interactive, i.e. to keep an R session constantly running on a persistent worker until I manually kill it, and to interactively send objects to this remote R session, perform computation, and receive objects back from it. For this, I tried the following:

w <- workers(n_jobs=1, template=list(P="short", mem=16)) # start a single worker on a remote HPC compute node, using my customized template
msg <- w$receive_data()
msg$id # "WORKER_READY"

Then I explored using w$send_call to run code interactively, which works fine within each call, e.g. this toy example:

n <- 10
w$send_call({
  library(data.table)
  set.seed(1)
  a <- rnorm(n)
  b <- data.table(a)
}, env=list(n=n))
msg <- w$receive_data()
msg$result # results returned correctly -- the value of `b` is returned

However, there is something I missed: I had imagined that a persistent R session was running on the worker, but that seems not to be the case. When I subsequently tried to retrieve a, b, or n, none were found, and ls() returned character(0), e.g.:

w$send_call(a)
msg <- w$receive_data()
msg$result # "Error in eval(msg$expr, envir = msg$env) : object 'a' not found\n"

Is this because each call actually starts a new independent R session which exits when the call completes? Is there a workaround to achieve my aim of getting a fully interactive and persistent R session on a remote worker?

mschubert commented 2 years ago

Can you explain a bit what the underlying problem is you are trying to solve?

If you want to have a remote-controlled worker on an HPC system, something like rmote may be a better fit than clustermq. I did not intend w$send_call() to be suitable for interactive work, but rather for developing packages that need to interact with workers (the logic is a bit complicated, and the objects generated in a call will not persist in the worker environment).
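For context: the error message in your example shows that the worker evaluates each call with eval(msg$expr, envir = msg$env), so the evaluation environment is rebuilt from the env= argument for every call instead of being carried over. Roughly (a simplified sketch, not the literal worker code):

# Simplified sketch of per-call evaluation on the worker (not the
# literal implementation): each call gets a fresh environment built
# from `env=`, which is discarded once the result is sent back.
run_call <- function(msg) {
  envir <- list2env(msg$env, parent = globalenv())  # fresh scope per call
  eval(msg$expr, envir = envir)                     # value is returned to caller
}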

If you want to keep your workers running while sending multiple batches of work, you can do the following:

w = workers(...)                      # start the workers once
Q(function(x) x*2, x=1:5, workers=w)  # first batch of work
Q(function(y) y+3, y=1:5, workers=w)  # second batch, reusing the same workers
w$cleanup()                           # shut the workers down when done

ImNotaGit commented 2 years ago

Thanks for the prompt reply. Yes, I also realized that a remotely controlled interactive R session is not what clustermq is mainly intended for -- I was just curious whether it was easily doable within clustermq, since that would be quite convenient: I would not have to deal with IP addresses, ports, etc. explicitly. I have tried both the rmote and the remoter packages before, and while they are very useful, they did not perfectly fit my need. In any case, if no workaround exists within clustermq, I think this issue can be closed.

> Can you explain a bit what the underlying problem is you are trying to solve?

Well, briefly, my institution hosts RStudio Server on a remote machine with shared and very limited memory and CPUs. I wanted a way to use an R session on another HPC node (so that I can request exactly the resources I need) while still retaining an interactive workflow within RStudio (e.g. being able to send code and receive small result objects, such as ggplot2 objects for visualization and R notebook knitting).

Edit: I have since looked at the remoter package again, and luckily, with some tweaking, I got it to work for my use case. Combining it with clustermq completely solved my issue.
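In case it helps others, here is a rough sketch of the kind of combination I mean (not my exact tweaks; the port and password below are placeholders, and this assumes the compute node is reachable from the RStudio machine on that port):

library(clustermq)

# Start one persistent worker on an HPC node, as above
w <- workers(n_jobs=1, template=list(P="short", mem=16))
w$receive_data()  # wait for the WORKER_READY message

# Ask the worker for its hostname, so the client knows where to connect
w$send_call(Sys.info()[["nodename"]], env=list())
host <- w$receive_data()$result

# Run a remoter server on the worker; this occupies the worker until
# the server is shut down from the client side
w$send_call(remoter::server(port=55555, password="placeholder"), env=list())

# Then, from the local RStudio session:
# remoter::client(host, port=55555, password="placeholder")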