CLI chunks take about 600ms each while console POSTs seem to be taking about 2 seconds each.
Seeing some strange behavior when I compare doing one request at a time to multiple.
Pretty clearly bifurcated between ~900ms calls and ~2000ms calls with like 1/4 of them being the slow ones.
About the same.
Now nearly all the requests take 2 seconds.
@david-crespo I suspect this is due to the browser using HTTP/2, which is multiplexing a single TCP connection for all six of the requests we're running in parallel. The CLI establishes 8 separate TCP connections, which are individually less efficient than the browser's single connection, but have much greater total throughput.
You can see in Wireshark that each of the CLI connections has about half the throughput of the Console's, 6.5 Mbps vs. 13 Mbps, but combined they can push ~52 Mbps.
(Ignore the packet counts and total amount written in the screenshots, I didn't run these for the same amount of time.)
Yes, thank you! I just discovered this a few days ago as well, but did not have as solid a confirmation of what the CLI was doing. (The image on the right is going through a local proxy server that can't do HTTP/2, so it falls back to HTTP/1.1 and is faster.)
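A rough way to reproduce that comparison from the command line, as a sketch only: $URL and chunk.bin are placeholders, not the actual upload endpoint.

$ # one multiplexed HTTP/2 connection, as the browser uses:
$ curl --http2 --data-binary @chunk.bin "$URL"
$ # eight parallel HTTP/1.1 connections, approximating the CLI's behavior:
$ for i in $(seq 8); do curl --http1.1 --data-binary @chunk.bin "$URL" & done; wait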
I've tried looking at things we can change in the browser, but I wonder if there is something we can change on the server to give more bandwidth to that single TCP connection.
Let me caveat the following by noting that my understanding of TCP and network tunables in illumos is limited.
I think the issue may be the relatively small receive window being set by Nexus. For both the CLI and the Console, the maximum window I observe is ~128 KB, which may be preventing a single connection from keeping as much traffic in flight as the eight separate connections can.
Here we can see the Console sending a burst of traffic, then waiting for the ACKs to come in before proceeding.
The CLI shows a similar pattern, but more extreme; presumably this is because we're dealing with a single logical stream.
According to some very old Solaris docs, tcp_recv_hiwat controls the default receive window size. Poking around in a Nexus zone, I see the following TCP settings:
root@oxz_nexus_65a11c18:~# ipadm show-prop tcp
PROTO PROPERTY              PERM CURRENT    PERSISTENT DEFAULT    POSSIBLE
tcp   congestion_control    rw   sunreno    --         sunreno    sunreno,newreno,
                                                                  cubic
tcp   ecn                   rw   passive    --         passive    never,passive,
                                                                  active
tcp   extra_priv_ports      rw   2049,4045  --         2049,4045  1-65535
tcp   largest_anon_port     rw   65535      --         65535      1024-65535
tcp   max_buf               rw   1048576    --         1048576    8192-1073741824
tcp   recv_buf              rw   128000     --         128000     2048-1048576
tcp   sack                  rw   active     --         active     never,passive,
                                                                  active
tcp   send_buf              rw   49152      --         49152      4096-1048576
tcp   smallest_anon_port    rw   32768      --         32768      1024-65535
tcp   smallest_nonpriv_port rw   1024       --         1024       1024-32768
root@oxz_nexus_65a11c18:~# ndd -get /dev/tcp tcp_recv_hiwat
128000
Interestingly, RTT for the Console is about half that of the CLI. My naive guess is that this is due to the larger number of connections, but I have done nothing to validate this.
Average RTT for Console is 0.07 seconds:
$ tshark -Y "(ip.src[0]==45 && ip.src[1]==154) || (ip.dst[0]==45 && ip.dst[1]==154)" -r ~/console.pcapng.lz4 -Tfields -e "tcp.analysis.ack_rtt" | agrind '* | parse "*" as rtt | avg(rtt), max(rtt), min(rtt), p99(rtt)'
_average    _max    _min    p99
----------------------------------------------------
0.07        0.18    0.00    0.08
Average RTT for CLI is 0.13 seconds:
$ tshark -Y "(ip.src[0]==45 && ip.src[1]==154) || (ip.dst[0]==45 && ip.dst[1]==154)" -r ~/cli-2.pcapng.lz4 -Tfields -e "tcp.analysis.ack_rtt" | agrind '* | parse "*" as rtt | avg(rtt), max(rtt), min(rtt), p99(rtt)'
_average    _max    _min    p99
----------------------------------------------------
0.13        0.88    0.00    0.21
Maximum throughput in Mbps should be roughly $\frac{\text{window size in bits}}{\text{RTT in seconds} \times 10^6}$.
Theoretical max for Console: 14.63 Mbps
Theoretical max for one CLI connection: 7.88 Mbps
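For clarity, here is the arithmetic behind those numbers, using the ~128,000-byte (1,024,000-bit) window observed above:

$\text{Console: } \frac{1{,}024{,}000}{0.07 \times 10^6} \approx 14.63 \text{ Mbps} \qquad \text{CLI: } \frac{1{,}024{,}000}{0.13 \times 10^6} \approx 7.88 \text{ Mbps}$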
These are a bit faster than what we see in practice, but we're not always at the maximum window size. Does this seem plausible to you?
Very interesting. I bet we can test that on dogfood by simply increasing the value of recv_buf.
Check this out: https://www.illumos.org/issues/15019
We temporarily increased the nexus zones' recv_buf on colo by 8x to 1,024,000 bytes with:
$ pilot host exec -c "zoneadm list | grep nexus | xargs -t -I% zlogin % ipadm set-prop -t -p recv_buf=1024000 tcp" 0-31
$ pilot host exec -c "zoneadm list | grep nexus | xargs -t -I% svcadm -z % restart nexus" 0-31
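To confirm the temporary setting took effect across the sleds, the same pattern can be reused with show-prop (a sketch, mirroring the commands above):

$ pilot host exec -c "zoneadm list | grep nexus | xargs -t -I% zlogin % ipadm show-prop -p recv_buf tcp" 0-31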
In my testing with Console, upload speeds went from 13 Mbps to 40 Mbps:
RTT increased a bit:
$ tshark -Y "(ip.src[0]==45 && ip.src[1]==154) || (ip.dst[0]==45 && ip.dst[1]==154)" -r ~/console-large-buf.pcapng.lz4 -Tfields -e "tcp.analysis.ack_rtt" | agrind '* | parse "*" as rtt | avg(rtt), max(rtt), min(rtt), p99(rtt)'
_average    _max    _min    p99
----------------------------------------------------
0.13        0.34    0.00    0.24
Peak window size increased to ~1 MB, as expected:
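Applying the same formula with the new window (1,024,000 bytes = 8,192,000 bits) and the new 0.13 s average RTT gives a theoretical max of

$\frac{8{,}192{,}000}{0.13 \times 10^6} \approx 63 \text{ Mbps}$

so the observed 40 Mbps is comfortably below the window-imposed ceiling.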
For the CLI, throughput went down from 52 Mbps to 45 Mbps, but I suspect this may be due to fluctuations in the maximum upload bandwidth available to my home internet connection.
RTT increased for the CLI as well:
$ tshark -Y "(ip.src[0]==45 && ip.src[1]==154) || (ip.dst[0]==45 && ip.dst[1]==154)" -r ~/cli-large-buf.pcapng.lz4 -Tfields -e "tcp.analysis.ack_rtt" | agrind '* | parse "*" as rtt | avg(rtt), max(rtt), min(rtt), p99(rtt)'
_average    _max    _min    p99
----------------------------------------------------
0.47        3.42    0.00    0.92
@wfchandler you are a magician, great work!
I've opened https://github.com/oxidecomputer/omicron/issues/6601 to track a fix from the control plane side.
I'm going to close this because there's no work to do in the console. Thanks!
My manual testing confirmed @taspelund's report that the CLI is at least 5x faster at uploading images than the web console. The CLI uses 8 threads instead of the 6 we're limited to in the browser, but obviously that alone can't explain this. Needs more investigation.
I was at least able to confirm that skipping empty chunks is working as expected on web, so that is not the problem. The individual requests seem to just take longer.