mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

cannot customize ports in ssh options #296

Closed weizhu365 closed 7 months ago

weizhu365 commented 1 year ago

Our HPC cluster exposes only a few ports for connections, so I would like to pass custom port settings via clustermq.defaults:

options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host = "xxx@xxx",
  clustermq.ssh.log = "~/cmq_ssh.log",
  clustermq.defaults = list(ctl_port=34381, local_port=34381, job_port=45583, fwd_port=45583),
  clustermq.template = "~/.ssh/SSH.tmpl"
)

I have tried various approaches (even hard-coding the ports in SSH.tmpl), and all of them failed. I also noticed this in your code: https://github.com/mschubert/clustermq/blob/master/R/qsys_ssh.r

    fill_options = function(ssh_host, ...) {
        values = utils::modifyList(private$defaults,
                                   list(ssh_host=ssh_host, ...))

        #TODO: let user define ports in private$defaults here and respect them
        remote = sample(50000:55000, 2)
        values$ssh_host = ssh_host
        bound = private$zmq$listen(sid="proxy")
        values$local_port = sub(".*:", "", bound)
        values$ctl_port = remote[1]
        values$job_port = remote[2]
        values$fwd_port = private$port
        values
    }

It seems that values set via clustermq.defaults cannot be passed to the final call because they are overwritten in the function fill_options, and that the feature I am looking for is still under development (per the TODO comment).

I like your SSH proxy idea very much, so please let me know if there is a solution available.

Thanks and kind regards

Wei Zhu
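
One way the TODO in the quoted fill_options could be approached is to prefer user-supplied ports and fall back to random ones only when none are given. The sketch below is a standalone illustration, not clustermq code: the helper name resolve_ports and its arguments are hypothetical, and it only covers the two remote ports (ctl_port, job_port) that the quoted function picks at random.

```r
# Hypothetical helper (not part of clustermq): use user-supplied remote ports
# where present, otherwise pick two distinct random ports as the quoted
# fill_options does.
resolve_ports = function(defaults, candidates = 50000:55000) {
    remote = sample(candidates, 2)
    ports = list(ctl_port = remote[1], job_port = remote[2])

    # Only entries named ctl_port or job_port override the random choice;
    # other entries in defaults (e.g. local_port) are ignored here.
    overrides = defaults[intersect(names(defaults), names(ports))]
    utils::modifyList(ports, overrides)
}

# With the ports from this issue, the user-defined values win
str(resolve_ports(list(ctl_port = 34381, job_port = 45583)))

# Without user-defined values, two random ports are returned
str(resolve_ports(list()))
```

Until something like this exists in the package, hard-coding the ports in the SSH template remains the way to pin them.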

mschubert commented 1 year ago

If I'm not mistaken, you are saying that two ports that are open on your HPC are 34381 and 45583 (and you are not restricted on local ports).

In this case, the following ssh.tmpl should work?

ssh -o "ExitOnForwardFailure yes" -f \
    -R 34381:localhost:{{ local_port }} \
    -R 45583:localhost:{{ fwd_port }} \
    {{ ssh_host }} \
    "R --no-save --no-restore -e \
        'clustermq:::ssh_proxy(ctl=34381, job=45583)' \
        > {{ ssh_log | /dev/null }} 2>&1"

weizhu365 commented 1 year ago

@mschubert Many thanks for your prompt reply. I have taken your suggestion and used the hard-coded port numbers:

ssh  -v -o "ExitOnForwardFailure yes" -f \
    -R 34381:localhost:{{ local_port }} \
    -R 45583:localhost:{{ fwd_port }} \
    {{ ssh_host }} \
    "R --no-save --no-restore -e \
        'clustermq:::ssh_proxy(ctl=34381, job=45583)' \
        > {{ ssh_log | /dev/null }} 2>&1; echo '{{ ctl_port }} {{ job_port }} {{ local_port }} {{ fwd_port }}' >> {{ ssh_log }} "

I have to include {{ job_port }} and {{ ctl_port }} somewhere, as they are required by the template.

According to the SSH log, the tunnel forwarding is set up correctly at first, but the connection still fails at the end:

...
debug1: Remote connections from LOCALHOST:34381 forwarded to local address localhost:6279
debug1: Remote connections from LOCALHOST:45583 forwarded to local address localhost:6999
...
debug1: remote forward failure for: listen 34381, connect localhost:6279
Error: remote port forwarding failed for listen port 34381
Error in initialize(...) : 
  Remote R process did not respond after 5000 seconds. Check your SSH server log.

I have already installed clustermq on the remote server.

Do I need to start something on the remote server to establish the connection?

In any case, it would be great if you could provide detailed instructions on how to use ssh_proxy properly.

Thanks, Wei

weizhu365 commented 1 year ago

I found that some old processes may have been blocking the ports. After I killed those processes, I got something different:

...
debug1: client_request_forwarded_tcpip: listen localhost port 45583, originator 127.0.0.1 port 60338
debug1: connect_next: host localhost ([127.0.0.1]:9842) in progress, fd=11
debug1: channel 3: new [127.0.0.1]
debug1: confirm forwarded-tcpip
debug1: channel 3: connected to localhost port 9842
debug1: client_input_channel_open: ctype forwarded-tcpip rchan 6 win 2097152 max 32768
debug1: client_request_forwarded_tcpip: listen localhost port 34381, originator 127.0.0.1 port 58298
debug1: connect_next: host localhost ([127.0.0.1]:8383) in progress, fd=12
debug1: channel 4: new [127.0.0.1]
debug1: confirm forwarded-tcpip
debug1: channel 4: connected to localhost port 8383
debug1: client_input_channel_req: channel 0 rtype keepalive@openssh.com reply 1
debug1: client_input_channel_req: channel 0 rtype keepalive@openssh.com reply 1
debug1: client_input_channel_req: channel 0 rtype keepalive@openssh.com reply 1
debug1: client_input_channel_req: channel 0 rtype keepalive@openssh.com reply 1
...

The connection seems to be established, but I keep receiving "debug1: client_input_channel_req: channel 0 rtype keepalive@openssh.com reply 1", and I cannot get any result from a simple submission such as the one below:

fx = function(x) x * 2
Q(fx, x=1:3, n_jobs=1)

-Wei

weizhu365 commented 1 year ago

@mschubert It is working now after extending the timeout for SSH.

Thank you anyway,

-Wei
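
The thread does not say which timeout was extended. Assuming it was clustermq's own wait for the SSH proxy, which is believed to be controlled by the clustermq.ssh.timeout option (given in seconds), a configuration along these lines may be what resolved it. The value 30 is an arbitrary example, and the host is the placeholder from the original post.

```r
# Assumption: clustermq.ssh.timeout sets how long (in seconds) the local
# session waits for the remote R proxy to respond. 30 is an example value.
options(
    clustermq.scheduler = "ssh",
    clustermq.ssh.host = "xxx@xxx",          # placeholder from the original post
    clustermq.ssh.log = "~/cmq_ssh.log",
    clustermq.ssh.timeout = 30,
    clustermq.template = "~/.ssh/SSH.tmpl"   # template with the hard-coded ports
)
```

A longer wait can help when starting R on the remote host is slow, since the proxy has to be running before the forwarded ports are answered.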