mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

Is it confusing to call ssh a clustermq scheduler? #273

Closed: mattwarkentin closed this issue 1 year ago

mattwarkentin commented 3 years ago

Hi @mschubert,

While reading over some of the clustermq documentation again, I am reminded of some confusion I had early on when using clustermq to SSH into my HPC. It seems to me that SSH isn't really a scheduler, but rather a means of connecting to the scheduler/resource you want to use. So I think it can be confusing to list it alongside the schedulers, when you actually need to choose SSH *plus* a scheduler. This is really just semantics, but I wonder if it would make things clearer to separate SSH from the schedulers (see here).

For example, the docs show this as a way to run multiprocess on a remote machine via ssh:

# REMOTE
options(
    clustermq.scheduler = "multiprocess" # or multicore, LSF, SGE, Slurm etc.
)
# LOCAL
options(
    clustermq.scheduler = "ssh",
    clustermq.ssh.host = "user@host", # use your user and host, obviously
    clustermq.ssh.log = "~/cmq_ssh.log" # log for easier debugging
)
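For context, once those two sets of options are in place, a regular `Q()` call on the local machine is tunneled over SSH and executed by the remote scheduler. A minimal sketch (this assumes a working SSH key setup to `user@host`; the exact output shape depends on your clustermq version):

```r
library(clustermq)

# With the LOCAL options above set, this call is sent over SSH to the
# remote host, where the "multiprocess" scheduler actually runs the jobs.
fx = function(x) x * 2
Q(fx, x = 1:3, n_jobs = 1)
```

Note that nothing in the `Q()` call itself mentions SSH, which is part of why treating "ssh" as just another scheduler value feels off.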

It is sort of confusing to specify two "different" schedulers when the actual scheduler is multiprocess and you just happen to be connecting via SSH. Separating the two might help users' mental model and clean up the API a little, something like:

# LOCAL
options(
    clustermq.connection = "ssh",
    clustermq.ssh.host = "user@host"
)

Where the default connection is "local" or something.
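To make the proposal concrete, here is a rough sketch of how the option split could resolve internally. Note that `clustermq.connection`, the `"local"` default, and the `get_connection()` helper are all hypothetical names from this suggestion, not part of the current clustermq API:

```r
# Hypothetical option-resolution logic for the proposed split.
# None of these option names exist in clustermq today.
get_connection = function() {
    conn = getOption("clustermq.connection", default = "local")
    if (conn == "ssh")
        list(type = "ssh", host = getOption("clustermq.ssh.host"))
    else
        list(type = "local")
}
```

Under this scheme, `clustermq.scheduler` would always name a real scheduler (multiprocess, LSF, SGE, Slurm, ...), and the connection option would decide where that scheduler runs.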

What do you think?