rapier1 / hpn-ssh

HPN-SSH based on OpenSSH
https://psc.edu/hpn-ssh-home

Cannot adjust hpn_buffer_size for non-HPN connections #55

Closed rowlap closed 7 months ago

rowlap commented 9 months ago

Following the docs in HPN-README

If an HPN system connects to a nonHPN system the receive buffer will
be set to the HPNBufferSize value. The default is 2MB but user adjustable.

I tried to set HPNBufferSize to a non-default value.

$ bin/ssh -v $HOSTNAME uname |& grep -i hpn
debug1: Local version string SSH-2.0-OpenSSH_9.3-hpn14v15
debug1: Remote is NON-HPN aware
debug1: HPN to Non-HPN Connection
debug1: Final hpn_buffer_size = 2097152
debug1: HPN Disabled: 0, HPN Buffer Size: 2097152

$ bin/ssh -v -o HPNBufferSize=32 $HOSTNAME uname |& grep -i hpn
debug1: hpn_buffer_size set to 32768
debug1: Local version string SSH-2.0-OpenSSH_9.3-hpn14v15
debug1: Remote is NON-HPN aware
debug1: HPN to Non-HPN Connection
debug1: Final hpn_buffer_size = 2097152
debug1: HPN Disabled: 0, HPN Buffer Size: 2097152

$ bin/ssh -v -o HPNBufferSize=4096 $HOSTNAME uname |& grep -i hpn
debug1: hpn_buffer_size set to 4194304
debug1: Local version string SSH-2.0-OpenSSH_9.3-hpn14v15
debug1: Remote is NON-HPN aware
debug1: HPN to Non-HPN Connection
debug1: Final hpn_buffer_size = 2097152
debug1: HPN Disabled: 0, HPN Buffer Size: 2097152

The remote endpoint here is SSH-2.0-OpenSSH_8.0, i.e. portable openssh 8.0p1 running on an RHEL 8 clone.

In all cases, the configured value is replaced by the default 2MiB.

Is there any way to override HPNBufferSize when connecting to a non-HPN sshd?

rowlap commented 9 months ago

The code at ssh.c#L2162 reads as though "HPN to Non-HPN connection" bypasses all further HPNBufferSize option handling?

rapier1 commented 9 months ago

Let me look into that but it looks like you are right. So, from the way I'm thinking about things, if you specify a receive buffer via that option that should be the maximum receive buffer size regardless of the heterogeneity of the connection. Does that sound about right to you? Anyway, if the logic dictating this is off I'll fix it but it may be a while until it shows up in the master branch of the git repo. In the meantime I can provide a patch here. That work for you?

rapier1 commented 9 months ago

So looking at this more, it seems that some of the settings don't really have much of an impact at all on the ssh receive buffer window growth, which is the original intent of these options.

For example

diff --git a/ssh.c b/ssh.c
index dfb8398a0..4d2092a73 100644
--- a/ssh.c
+++ b/ssh.c
@@ -2157,7 +2157,8 @@ hpn_options_init(struct ssh *ssh)
        if (tty_flag)
                options.hpn_buffer_size = CHAN_SES_WINDOW_DEFAULT;
        else
-               options.hpn_buffer_size = 2 * 1024 * 1024;
+               if (options.hpn_buffer_size <= 0)
+                       options.hpn_buffer_size = 2 * 1024 * 1024;

        if (ssh->compat & SSH_BUG_LARGEWINDOW) {
                debug("HPN to Non-HPN connection");

This fixes the problem of hpn_buffer_size being clobbered, but I'm still getting window growth well above that. For example,

$ ./hpnssh -p22 -o HPNBufferSize=32 -v myhost "dd if=/dev/zero bs=1M count=1000" > /dev/null

reports

debug1: HPN to Non-HPN connection
debug1: Final hpn_buffer_size = 32768
debug1: HPN Disabled: 0, HPN Buffer Size: 32768

but I'm still seeing

debug1: Channel 0: Window growth to 87380 by 54612 bytes
debug1: Channel 0: Window growth to 931216 by 843836 bytes
debug1: Channel 0: Window growth to 6231984 by 5300768 bytes
...

So it's not effectively doing what I designed it to do initially. So this might take a little longer to dig into.
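The intent of the patch above (apply the 2MB default only when the user has not set a value) can be sketched in isolation. This is a minimal illustration, not the hpn-ssh code: the function name `choose_hpn_buffer_size` and the `tty_default` parameter are hypothetical, with `tty_default` standing in for CHAN_SES_WINDOW_DEFAULT.

```c
#include <stdbool.h>

/* Default used for non-interactive sessions, as in the patch above. */
#define SKETCH_HPN_DEFAULT (2L * 1024 * 1024)

/*
 * Hypothetical helper mirroring the patched branch in hpn_options_init().
 * requested:   user-supplied buffer size in bytes, or <= 0 if unset.
 * tty:         true for interactive sessions (tty_flag in ssh.c).
 * tty_default: stand-in for CHAN_SES_WINDOW_DEFAULT (assumed, passed in).
 */
long
choose_hpn_buffer_size(long requested, bool tty, long tty_default)
{
	if (tty)
		return tty_default;        /* interactive: fixed default */
	if (requested <= 0)
		return SKETCH_HPN_DEFAULT; /* unset: fall back to 2MB */
	return requested;                  /* keep the user's value */
}
```

With this logic a request of 32768 bytes on a non-interactive session is preserved instead of being overwritten, which matches the "Final hpn_buffer_size = 32768" line above. As noted, it does not by itself stop the channel window from growing past that value.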

rapier1 commented 9 months ago

By the way, I'm assuming you are doing this to limit throughput. That's the main reason why that option is there.

rowlap commented 9 months ago

@rapier1 thanks for the prompt response. The high-level goal is to increase the channel window above 2MB, to achieve faster WAN performance.

(Why not use hpnssh as both endpoints? It's much simpler to change only the client in my environment.)

I don't know enough about channel window growth to suggest whether HPNBufferSize should be the default or the maximum, but would be happy as long as the window can grow to be that large. In my examples running uname I realise the amount of data is small.

rapier1 commented 9 months ago

Okay, I get that. At this point, though, the channel receive window isn't actually being affected by that command line option; it's doing nothing. That's because some changes to the OpenSSH code base rendered those command line options ineffective. I wasn't aware of that until now.

If you can do a transfer (in the direction of the hpnssh enabled endpoint) and enable debugging you should see some lines about "window growth" as in the above example. If those lines are indicating that the window is getting above 2MB then it means that the receive window buffer is exceeding that limit. Keep in mind, this only comes into play if the hpnssh endpoint is the one receiving the data. If you are sending the data to an OpenSSH endpoint the receive buffer doesn't come into play.

Also, it will only matter if the bandwidth delay product (BDP) of the path is greater than 2MB. The BDP is a measure of how much data can be in flight at any one point on a path, which in turn tells you how large your receive window should be.
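As a rough illustration of the bandwidth delay product, the sketch below multiplies path bandwidth by round-trip time. The function name `bdp_bytes` and its units are choices made for this example only.

```c
#include <stdint.h>

/*
 * Bandwidth delay product in bytes.
 * bits_per_sec: path bandwidth in bits per second.
 * rtt_ms:       round-trip time in milliseconds.
 */
uint64_t
bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_ms)
{
	/* bytes per second times seconds of round trip = bytes in flight */
	return bits_per_sec / 8 * rtt_ms / 1000;
}
```

For example, a 1 Gbit/s path with a 50 ms round trip gives 6,250,000 bytes, well above 2MB, so a 2MB receive window would limit throughput there. A 100 Mbit/s path with a 10 ms round trip gives 125,000 bytes, so the 2MB window would not be the bottleneck.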

Anyway, I'm still going to work on fixing this so things do what I say they should do but take a moment to let me know if you are getting any window growth debug messages.

One way to test this is:

$ hpnssh -v yourremotehostrunningopenssh "dd if=/dev/zero bs=1M count=1000" > /dev/null

That will have the remote endpoint send 1GB of data to your local hpnssh endpoint.
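To check whether the window gets above 2MB without reading the debug output by eye, something like the following could scan it for the largest value. This is a sketch that assumes the message format shown earlier in this thread ("Window growth to N by M bytes"); the function names are made up for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of debug output. Returns 1 and stores the new window
 * size in *window if the line is a "Window growth" message, else 0.
 * The format is assumed from the debug lines quoted in this thread.
 */
int
parse_window_growth(const char *line, long *window)
{
	const char *p = strstr(line, "Window growth to ");
	long to, by;

	if (p == NULL)
		return 0;
	if (sscanf(p, "Window growth to %ld by %ld bytes", &to, &by) != 2)
		return 0;
	*window = to;
	return 1;
}

/* Largest window seen in a stream of debug output, or 0 if none. */
long
max_window(FILE *in)
{
	char line[1024];
	long w, max = 0;

	while (fgets(line, sizeof(line), in) != NULL)
		if (parse_window_growth(line, &w) && w > max)
			max = w;
	return max;
}
```

A small main() that prints max_window(stdin) could then be fed the debug stream. Since the debug messages go to stderr, the redirection would be along the lines of `2>&1 >/dev/null |` in front of the program.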

rowlap commented 9 months ago

After further testing, the situation isn't as bad as I'd first thought.

hpnscp copies (when receiving files) are fast, i.e. limited by the underlying TCP receive window available. Using -v shows the growth of the channel window.

At this point I'm not certain what the effect is of HPNBufferSize being unmanageable, but I'll leave that to your good judgement.

One extra challenge to obtaining fast copies is that from OpenSSH 9.0 onwards, scp (the binary) has switched to using SFTP (the protocol) by default. As SFTP has its own buffer size, on top of the SSH channel window, on top of the TCP receive window, it's even more complicated to obtain fast copies. The scp -O flag helps by reverting to the prior behaviour.
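One simplified way to picture the stacked limits: whichever layer allows the least data in flight sets a rough ceiling of one window per round trip. The sketch below is only that rule of thumb, not a model of how OpenSSH or SFTP actually schedule data. The function names are hypothetical, and `sftp_inflight` is an assumed figure for how much data SFTP keeps outstanding at once.

```c
#include <stdint.h>

/* Smallest of the three in-flight limits, in bytes. */
uint64_t
effective_window(uint64_t tcp_rwin, uint64_t ssh_window,
    uint64_t sftp_inflight)
{
	uint64_t w = tcp_rwin;

	if (ssh_window < w)
		w = ssh_window;
	if (sftp_inflight < w)
		w = sftp_inflight;
	return w;
}

/* Rough throughput ceiling in bytes per second: one window per RTT. */
uint64_t
throughput_ceiling(uint64_t window_bytes, uint64_t rtt_ms)
{
	if (rtt_ms == 0)
		return 0; /* undefined for a zero round trip; avoid dividing by 0 */
	return window_bytes * 1000 / rtt_ms;
}
```

Under this rule of thumb, a 2MB effective window on a 100 ms path caps out at about 20MB/s no matter how much bandwidth is available, which is why raising only one of the three layers may not help.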

rowlap commented 7 months ago

The 18.2.0 release notes mention that HPNBufferSize has been removed. Should this ticket be closed as no longer relevant?

rapier1 commented 7 months ago

Indeed. I completely forgot to close this out. Thanks!