rapier1 / hpn-ssh

HPN-SSH based on OpenSSH
https://psc.edu/hpn-ssh-home
Other
302 stars 41 forks source link

HPNSSH / OpenSSH bad window interaction #58

Closed rowlap closed 3 months ago

rowlap commented 7 months ago

When streaming data over a 167ms RTT 10gps connection, throughput is highly variable. My network is tuned to allow TCP receive window to autoscale up to 100MB, which is achievable and verified with iperf, stable over 1 minute.

I only have one high-bandwidth WAN to play with, so YMMV. If repro is difficult, please ask and I can run different tests.

Steps to reproduce

Baseline openssh client to openssh server

server-asia$ { ssh -o LogLevel=DEBUG2 -E DEBUG2.log server-europe 'cat /dev/zero' | pv -frtb > /dev/null; } |& tr '\r' '\n'
0.00 B 0:00:01 [0.00 B/s]
0.00 B 0:00:02 [0.00 B/s]
 976KiB 0:00:03 [1.01MiB/s]
11.6MiB 0:00:04 [10.6MiB/s]
23.4MiB 0:00:05 [11.8MiB/s]
35.2MiB 0:00:06 [11.8MiB/s]
46.8MiB 0:00:07 [11.8MiB/s]
58.6MiB 0:00:08 [11.8MiB/s]
...
2.50GiB 0:03:40 [11.8MiB/s]
2.51GiB 0:03:41 [11.8MiB/s]
2.52GiB 0:03:42 [11.8MiB/s]
2.53GiB 0:03:43 [11.8MiB/s]

It just sticks at the same 11.8 MiB/s consistently, constrained by openssh's 1982464-byte channel window limit.

hpnssh client to openssh server

server-asia$ hpnssh -V
OpenSSH_9.5p1, OpenSSL 1.1.1k  FIPS 25 Mar 2021

server-asia$ { hpnssh -o LogLevel=DEBUG2 -E DEBUG2.log server-europe 'cat /dev/zero' | pv -frtb > /dev/null; } |& tr '\r' '\n'
0.00 B 0:00:01 [0.00 B/s]
0.00 B 0:00:02 [0.00 B/s]
 976KiB 0:00:03 [1.01MiB/s]
52.1MiB 0:00:04 [51.8MiB/s]
 168MiB 0:00:05 [ 115MiB/s]
 219MiB 0:00:06 [50.9MiB/s]
 289MiB 0:00:07 [70.1MiB/s]
 516MiB 0:00:08 [ 226MiB/s]
 742MiB 0:00:09 [ 226MiB/s]
 969MiB 0:00:10 [ 226MiB/s]
...
10.0GiB 0:00:51 [ 227MiB/s]
10.3GiB 0:00:52 [ 227MiB/s]
10.4GiB 0:00:53 [ 164MiB/s]
10.4GiB 0:00:54 [3.54MiB/s]
10.4GiB 0:00:55 [2.05MiB/s]
10.4GiB 0:00:56 [1.49MiB/s]
10.4GiB 0:00:57 [1.26MiB/s]
10.4GiB 0:00:58 [1.12MiB/s]
10.4GiB 0:00:59 [1.01MiB/s]
10.4GiB 0:01:00 [ 952KiB/s]
10.4GiB 0:01:01 [ 883KiB/s]
10.4GiB 0:01:02 [ 832KiB/s]
10.4GiB 0:01:03 [ 783KiB/s]
10.4GiB 0:01:04 [ 736KiB/s]
10.4GiB 0:01:05 [ 648KiB/s]
10.4GiB 0:01:06 [ 683KiB/s]
10.4GiB 0:01:07 [ 655KiB/s]
10.4GiB 0:01:08 [ 627KiB/s]
10.4GiB 0:01:09 [ 661KiB/s]
10.4GiB 0:01:10 [ 538KiB/s]
10.4GiB 0:01:11 [ 622KiB/s]
10.4GiB 0:01:12 [ 505KiB/s]
10.4GiB 0:01:13 [ 539KiB/s]
10.4GiB 0:01:14 [ 521KiB/s]
10.4GiB 0:01:15 [ 512KiB/s]
10.4GiB 0:01:16 [ 504KiB/s]
10.4GiB 0:01:17 [ 535KiB/s]
10.4GiB 0:01:18 [ 446KiB/s]
10.4GiB 0:01:19 [ 513KiB/s]

The sluggish performance in the hpnssh case is coincident with a reduction in the channel window, e.g.

debug2: channel 0: window 112178659 sent adjust 131072
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 112178659 sent adjust 131072
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 112178659 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 111719907 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 110671331 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 110474723 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 110278115 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 110081507 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 109884899 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 109688291 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 109491683 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 109295075 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 109098467 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 108901859 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600
debug2: channel 0: window 108705251 sent adjust 65536
debug2: tcpwinsz: tcp connection 4, Receive window: 104857600

There doesn't appear to be a specific trigger, but when in the pathological case, hpnssh will gradually reduce the channel window to near-zero.

rapier1 commented 7 months ago

That's... odd. I've seen things like this happen when the IO subsystem (usually disk) is having problems but not with /dev/zero to /dev/null. I don't have access to a path with that sort of RTT but using netem I was able to insert 167ms of delay between two 10Gb connected hosts. I was not able to recreate what you are seeing. Not saying you aren't experiencing it or that it isn't a problem with HPN-SSH. I just can't recreate it.

I did notice a couple of differences though. When I do hpnssh -V I get OpenSSH_9.5p1-hpn18.2.0 Ubuntu-1, OpenSSL 3.0.2 15 Mar 2022

The part that I'm confused about is that you don't have the -hpn18.2.0 suffix in your version string. I don't know how that happened as some form of the -hpn suffix is on every release I've made. How did you get, build, etc HPN-SSH?

Do you have any opportunity to use OpenSSL3? If you can't that's cool but there are a number of performance enhancements that make use of OSSL3. I'm wondering if you are hitting a code path I am normally not seeing. I'll recompile my version with OSSL1.1 and try that out.

Also, 100MB on your receive window is undersized. It should be closer to 200MB. You may want to try this with a 256MB buffer and increasing net.core.rmem_max to that size as well.

Lastly, can you try running a 120 second long iperf3 test between the two hosts? I'd like to ensure that it's not a problem with the intervening routers and switches.

rapier1 commented 7 months ago

I tried it with OSSL1.1 and setting my receive buffer to 100MB (100 1024 1024) and still haven't been able to recreate the problem after pushing 100GB of data. HPN-SSH might be entering some pathological state due to network conditions which the iperf3 test might (or might not) reveal. Let me know what you find out.

rowlap commented 6 months ago

I tried running iperf for 2m30s, but saw no degradation of throughput, which feels about right as SSH channel window isn't involved.

With apologies, I need to rebuild and retest as the current binary is coming from upstream FreeBSD ports, which appears to be forked from hpn14v5 in April 2015.

rapier1 commented 5 months ago

Hey, could you give this a test with the current release? We made some changes. This should only impact openssh sending data to an hpn installation but I'm wondering if it might help with what you've been seeing.

rowlap commented 5 months ago

Hi @rapier1, I'll try to download or build something as close to 18.3.0 as possible

rowlap commented 3 months ago

This appears to be resolved in the latest version.

$ ./hpnssh-18.3.1 -V
OpenSSH_9.6p1-hpn18.3.1, OpenSSL 3.0.13 30 Jan 2024

is now able to maintain the expected high window size and throughput. I now see no degradation of data rate, and the DEBUG2 reported window sizes correspond with the net.ipv4.tcp_rmem maximum value. I appreciate this could be tuned higher but there are traffic shaping controls in place to limit the "damage" individual hosts can inflict on the network.

FWIW the problems compiling this binary are around the conflicts between hpn-ssh and Redhat/Fedora ecosystem patches. In my case the lack of GSSAPI support in openssh-portable is a problem, and it was trial-and-error to drop certain patches until everything built together, but I appreciate it's tough to keep things working with all the various modified codebases out there.

rowlap commented 3 months ago

@rapier1 as the problem can no longer be reproduced with the latest version, please feel free to close this bug unless there's any more data which would be useful?

rapier1 commented 3 months ago

Sorry, I meant to close it but I've been swamped. New release coming out soon.