quinn-rs / quinn

Async-friendly QUIC implementation in Rust

Ensure quinn works well with jumbo frames #1457

Closed · rklaehn closed this 1 year ago

rklaehn commented 1 year ago

I did some benchmarking with a quinn-based RPC framework, https://github.com/n0-computer/quic-rpc.

I found that, unsurprisingly, the packet size has a huge influence on throughput. Here are benchmarks on a Linux box with different values for initial_max_udp_payload_size:

The default:

$ ./target/release/bulk --initial-mtu 1200

Client 0 stats:
Overall download stats:

Transferred 1073741824 bytes on 1 streams in 4.48s (228.59 MiB/s)

Stream download metrics:

      │  Throughput   │ Duration 
──────┼───────────────┼──────────
 AVG  │  228.69 MiB/s │     4.48s
 P0   │  228.62 MiB/s │     4.48s
 P10  │  228.75 MiB/s │     4.48s
 P50  │  228.75 MiB/s │     4.48s
 P90  │  228.75 MiB/s │     4.48s
 P100 │  228.75 MiB/s │     4.48s

The largest value that worked for me:

$ ./target/release/bulk --initial-mtu 6000

Client 0 stats:
Overall download stats:

Transferred 1073741824 bytes on 1 streams in 2.18s (468.73 MiB/s)

Stream download metrics:

      │  Throughput   │ Duration 
──────┼───────────────┼──────────
 AVG  │  468.88 MiB/s │     2.18s
 P0   │  468.75 MiB/s │     2.18s
 P10  │  469.00 MiB/s │     2.18s
 P50  │  469.00 MiB/s │     2.18s
 P90  │  469.00 MiB/s │     2.18s
 P100 │  469.00 MiB/s │     2.18s

I found that with a sufficiently large packet size the quinn bulk benchmark can outperform TCP with default settings, but with the default quinn settings it is slower than TCP (roughly 2/3 of TCP throughput).

So it would be nice to ensure that quinn works well in an environment that allows large frames, e.g. a LAN with jumbo frames enabled, or a loopback device with a large MTU.

I think a good way to do this would be to implement a dummy in-memory AbstractUdpSocket transport with essentially zero overhead (see the sketch below), then make sure quinn works well with large values of initial-mtu up to 65536, or at least up to the 9000 bytes of jumbo frames.
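
As a sketch of the in-memory transport idea (this is not quinn's socket trait, just the zero-overhead datagram pair such a transport could be built on; all names here are hypothetical):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical in-memory "datagram socket": no syscalls and no MTU limit,
/// so a benchmark over it measures quinn itself rather than the kernel's
/// UDP path. A real test harness would wrap something like this in quinn's
/// async socket trait, whose exact shape differs between versions.
struct MemSocket {
    tx: Sender<Vec<u8>>,
    rx: Receiver<Vec<u8>>,
}

impl MemSocket {
    /// A connected pair, like socketpair but for datagrams.
    fn pair() -> (MemSocket, MemSocket) {
        let (tx_a, rx_b) = channel();
        let (tx_b, rx_a) = channel();
        (MemSocket { tx: tx_a, rx: rx_a }, MemSocket { tx: tx_b, rx: rx_b })
    }

    fn send(&self, datagram: &[u8]) {
        // Errors (peer dropped) are ignored for brevity.
        let _ = self.tx.send(datagram.to_vec());
    }

    fn recv(&self) -> Option<Vec<u8>> {
        self.rx.recv().ok()
    }
}

fn main() {
    let (a, b) = MemSocket::pair();
    // A 64 KiB "jumbo" datagram passes through unchanged.
    a.send(&vec![0u8; 65536]);
    assert_eq!(b.recv().unwrap().len(), 65536);
}
```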

Ralith commented 1 year ago

In theory this should be doable over kernel loopback, but there's an unexplained effect leading to increased packet loss once we get past 6000 or so. Tangentially related: #69

rklaehn commented 1 year ago

I did some more experiments. I found that setting max_udp_payload_size on the EndpointConfig as well makes a huge difference on macOS, while having almost no effect on Linux. I assume that is because of the GSO/GRO features on Linux.
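
For reference, a rough sketch of setting both knobs together. It assumes the setters TransportConfig::initial_mtu and EndpointConfig::max_udp_payload_size; exact names and signatures may differ between quinn versions.

```rust
use quinn::{EndpointConfig, TransportConfig};

// Sketch only: setter names and argument types are assumptions and may
// differ between quinn versions.
fn large_frame_configs(mtu: u16) -> (TransportConfig, EndpointConfig) {
    let mut transport = TransportConfig::default();
    // Start sending datagrams of `mtu` bytes instead of the conservative
    // 1200-byte default.
    transport.initial_mtu(mtu);

    let mut endpoint = EndpointConfig::default();
    // Also raise the endpoint-level cap, otherwise datagrams larger than
    // the default maximum are never used.
    endpoint.max_udp_payload_size(mtu);

    (transport, endpoint)
}
```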

In any case, here are some benchmarks on M1 Mac with both values adjusted:

9200, the best value on macOS:

Stream download metrics:

      │  Throughput   │ Duration 
──────┼───────────────┼──────────
 AVG  │ 1258.50 MiB/s │  813.00ms
 P0   │ 1258.00 MiB/s │  813.00ms
 P10  │ 1259.00 MiB/s │  813.00ms
 P50  │ 1259.00 MiB/s │  813.00ms
 P90  │ 1259.00 MiB/s │  813.00ms
 P100 │ 1259.00 MiB/s │  813.00ms

1200, default:

      │  Throughput   │ Duration 
──────┼───────────────┼──────────
 AVG  │  389.12 MiB/s │     2.63s
 P0   │  389.00 MiB/s │     2.63s
 P10  │  389.25 MiB/s │     2.63s
 P50  │  389.25 MiB/s │     2.63s
 P90  │  389.25 MiB/s │     2.63s
 P100 │  389.25 MiB/s │     2.63s

This is getting to the point where the encryption itself makes a noticeable difference. Here is the speed with encryption disabled (using quinn-noise, with the key encryption and decryption commented out):

Stream download metrics:

      │  Throughput   │ Duration 
──────┼───────────────┼──────────
 AVG  │ 2287.00 MiB/s │  447.00ms
 P0   │ 2286.00 MiB/s │  447.00ms
 P10  │ 2288.00 MiB/s │  447.00ms
 P50  │ 2288.00 MiB/s │  447.00ms
 P90  │ 2288.00 MiB/s │  447.00ms
 P100 │ 2288.00 MiB/s │  447.00ms

Ralith commented 1 year ago

Since it seems clear that Quinn itself handles large UDP datagrams well, I'm going to close this as not actionable. Note that customizing the default max_udp_payload_size is required if your network supports larger values than the typical Ethernet MTU.