oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

Enlarge TCP recv_buf to improve throughput #6690

Open wfchandler opened 3 days ago

wfchandler commented 3 days ago

Image uploads performed via the web console are 3-4x slower than uploads performed via the Oxide CLI. We found that the CLI opens 8 separate TCP connections to upload the image chunks, while the console uses HTTP/2 to multiplex six streams over a single TCP connection. The default TCP recv_buf size on Helios is 128 KB, which caps the receive window and therefore the amount of data that can be in flight on a connection at any one time. Increasing this value to 1 MB raises single-connection throughput by ~3x, bringing console performance to rough parity with the CLI.
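To illustrate why the receive buffer bounds throughput: a sender can have at most one receive window of unacknowledged data in flight per round trip, so the ceiling is roughly window / RTT. The sketch below is only a back-of-the-envelope calculation. The 1 ms round-trip time is an assumed value for illustration, not a measurement from this change, and the function name is hypothetical.

```python
def max_throughput_bytes_per_sec(recv_buf_bytes: int, rtt_seconds: float) -> float:
    """Rough upper bound on single-connection throughput.

    At most one receive window of data can be unacknowledged per round
    trip, so throughput cannot exceed window / RTT.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return recv_buf_bytes / rtt_seconds


KIB = 1024
ASSUMED_RTT_SECONDS = 0.001  # hypothetical 1 ms round trip, for illustration only

old_ceiling = max_throughput_bytes_per_sec(128 * KIB, ASSUMED_RTT_SECONDS)
new_ceiling = max_throughput_bytes_per_sec(1024 * KIB, ASSUMED_RTT_SECONDS)

print(f"128 KiB buffer ceiling: {old_ceiling / 1e6:.1f} MB/s")
print(f"1 MiB buffer ceiling:   {new_ceiling / 1e6:.1f} MB/s")
print(f"ceiling ratio:          {new_ceiling / old_ceiling:.0f}x")
```

The ceiling scales linearly with the buffer size, so an 8x larger buffer raises the theoretical limit 8x. The measured gain of ~3x is smaller, which would be consistent with other limits taking over once the window stops being the bottleneck.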

This does increase the amount of memory a potential DoS attack could consume, but 1 MB is still quite small relative to the total resources available on a compute sled.
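As a rough way to reason about that exposure, the worst case is every open connection filling its receive buffer at once. The sketch below computes that bound. The connection count is a hypothetical figure chosen for illustration, not a limit taken from Omicron or Helios. It also assumes the buffer size is a cap rather than a preallocation, so real usage would normally sit below this bound.

```python
def worst_case_recv_memory_bytes(connections: int, recv_buf_bytes: int) -> int:
    """Upper bound on receive-buffer memory if every connection fills its buffer."""
    if connections < 0 or recv_buf_bytes < 0:
        raise ValueError("arguments must be non-negative")
    return connections * recv_buf_bytes


KIB = 1024
MIB = 1024 * KIB
HYPOTHETICAL_CONNECTIONS = 1_000  # assumed for illustration only

before = worst_case_recv_memory_bytes(HYPOTHETICAL_CONNECTIONS, 128 * KIB)
after = worst_case_recv_memory_bytes(HYPOTHETICAL_CONNECTIONS, 1 * MIB)

print(f"128 KiB buffers: {before / MIB:.0f} MiB worst case")
print(f"1 MiB buffers:   {after / MIB:.0f} MiB worst case")
```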

While we're at it, switch the TCP congestion control algorithm from its default of sunreno to cubic, which may further improve throughput.
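For context on why cubic can help, here is a minimal sketch of the window growth function from RFC 8312, using the constants that RFC recommends (C = 0.4, beta = 0.7). It is not taken from the Helios implementation, which may differ in detail, and the function name is hypothetical. After a loss the window drops to beta * W_max, then grows back along a cubic curve that flattens near W_max before probing beyond it.

```python
CUBIC_C = 0.4     # scaling constant recommended by RFC 8312
CUBIC_BETA = 0.7  # multiplicative decrease factor recommended by RFC 8312


def cubic_window(t_seconds: float, w_max: float) -> float:
    """Congestion window (in segments) t seconds after a loss event.

    w_max is the window size just before the loss. K is the time at which
    the curve climbs back to w_max.
    """
    k = (w_max * (1 - CUBIC_BETA) / CUBIC_C) ** (1 / 3)
    return CUBIC_C * (t_seconds - k) ** 3 + w_max


W_MAX = 100.0  # illustrative pre-loss window, in segments
K = (W_MAX * (1 - CUBIC_BETA) / CUBIC_C) ** (1 / 3)

for t in (0.0, K / 2, K, 2 * K):
    print(f"t = {t:5.2f} s  window = {cubic_window(t, W_MAX):6.1f} segments")
```

Because growth depends on time since the last loss rather than on the number of round trips, recovery toward the previous window does not slow down as RTT grows.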

Closes https://github.com/oxidecomputer/omicron/issues/6601