While reading over some of the `clustermq` documentation again, I was reminded of some confusion I had early on when using `clustermq` to ssh into my HPC. It seems to me that `ssh` isn't really a scheduler but rather a means of connecting to the scheduler or resource you want to use. So I think it can be confusing to list it alongside the schedulers, when in practice you need to choose SSH plus a scheduler. This is really just semantics, but I wonder if it would make things clearer to separate `ssh` from the schedulers (see here).
For example, the docs show this as the way to run `multiprocess` on a remote machine via `ssh`:
```r
# REMOTE
options(
  clustermq.scheduler = "multiprocess" # or multicore, LSF, SGE, Slurm etc.
)

# LOCAL
options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host = "user@host", # use your user and host, obviously
  clustermq.ssh.log = "~/cmq_ssh.log" # log for easier debugging
)
```
It is somewhat confusing that you specify two "different" schedulers when the actual scheduler is `multiprocess` and you just happen to be connecting via `ssh`. Separating the two concepts might help users' mental model and clean up the API a little, with something like:
```r
# LOCAL
options(
  clustermq.connection = "ssh",
  clustermq.ssh.host = "user@host"
)
```
where the default connection would be `"local"` or similar.
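To make the proposal concrete, here is a minimal sketch in R of how the proposed connection/scheduler split could be translated into the two option sets that the documented `ssh` example uses today. `to_current_options()` and its `connection` argument are hypothetical and not part of `clustermq`; the `"multiprocess"` default is only a placeholder for this sketch, not the package's default.

```r
# Hypothetical helper (not part of clustermq): translate the *proposed*
# connection + scheduler split into the option sets the current docs use.
to_current_options <- function(connection = c("local", "ssh"),
                               scheduler = "multiprocess", # placeholder default
                               host = NULL) {
  connection <- match.arg(connection)

  if (connection == "local") {
    # No hop: the real scheduler is configured directly in the local session
    return(list(local = list(clustermq.scheduler = scheduler), remote = NULL))
  }

  if (is.null(host)) {
    stop("an ssh connection needs a host such as \"user@host\"")
  }

  list(
    # Today, "ssh" itself is set as the local "scheduler"...
    local = list(clustermq.scheduler = "ssh", clustermq.ssh.host = host),
    # ...while the actual scheduler is configured on the remote machine
    remote = list(clustermq.scheduler = scheduler)
  )
}
```

For example, `do.call(options, to_current_options("ssh", host = "user@host")$local)` would set the local options from the docs' example, while the `remote` element lists what still has to be configured on the remote machine. Under the proposal, users would only think in terms of the two inputs, and the mapping would happen inside the package.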
@mschubert, what do you think?