mozilla / neqo

Neqo, the Mozilla Firefox implementation of QUIC in Rust
https://firefox-source-docs.mozilla.org/networking/http/http3.html
Apache License 2.0

perf(transport): don't pre-allocate mtu on max_datagram_size #2086

Closed mxinden closed 1 month ago

mxinden commented 2 months ago

neqo_transport::Connection::max_datagram_size creates an Encoder, writes a packet header and a packet number and determines how many bytes of the mtu are left.

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/connection/mod.rs#L3408-L3427
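As a rough illustration of that computation, here is a minimal sketch. It is not the neqo code: `remaining_after_header` and its parameters are hypothetical names, a plain `Vec<u8>` stands in for the `Encoder`, and the real function may subtract further overhead that is not modelled here.

```rust
/// Hypothetical helper (not part of neqo): writes a header and a packet
/// number into a scratch buffer and reports how many bytes of `mtu` remain.
pub fn remaining_after_header(mtu: usize, header: &[u8], packet_number: &[u8]) -> usize {
    // The scratch buffer only ever holds the header and the packet number,
    // so it starts out empty and grows on demand.
    let mut scratch: Vec<u8> = Vec::new();
    scratch.extend_from_slice(header);
    scratch.extend_from_slice(packet_number);

    // Whatever the header and packet number did not consume is left for
    // payload; saturate at zero if they do not fit at all.
    mtu.saturating_sub(scratch.len())
}
```

With a 20-byte header and a 4-byte packet number, `remaining_after_header(1500, &[0; 20], &[0; 4])` leaves 1476 bytes.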

The Encoder only has to hold the packet header and the packet number. Yet it is initialized with Encoder::with_capacity(mtu), where mtu can be up to 65535 bytes.

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/connection/mod.rs#L3408

Note that PacketBuilder::short and PacketBuilder::long called by Self::build_packet_header read the Encoder::capacity through PacketBuilder::infer_limit.

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/packet/mod.rs#L152-L180

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/packet/mod.rs#L188-L225

But PacketBuilder::infer_limit falls back to 2048 if the capacity is below 64, which is the case when using Encoder::default() instead of Encoder::with_capacity(mtu). A limit of 2048 bytes is more than enough for the packet header and the packet number.

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/packet/mod.rs#L135-L141
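The fallback described above can be sketched as follows. This is a simplified stand-in, not the linked neqo source: the `Encoder` here is a hypothetical wrapper around a `Vec<u8>`, the constant names are made up for illustration, and the behaviour at a capacity of exactly 64 is a guess based on the wording "below 64".

```rust
/// Hypothetical stand-in for neqo's `Encoder`: a thin wrapper around `Vec<u8>`.
#[derive(Default)]
pub struct Encoder {
    buf: Vec<u8>,
}

impl Encoder {
    /// Pre-allocates room for at least `capacity` bytes.
    pub fn with_capacity(capacity: usize) -> Self {
        Self { buf: Vec::with_capacity(capacity) }
    }

    /// Bytes the backing buffer can hold without reallocating.
    pub fn capacity(&self) -> usize {
        self.buf.capacity()
    }
}

// Assumed values, taken from the description above.
const MIN_USEFUL_CAPACITY: usize = 64;
const FALLBACK_LIMIT: usize = 2048;

/// Uses the encoder's capacity as the packet size limit, unless the capacity
/// is too small to be meaningful, in which case a fixed fallback applies.
pub fn infer_limit(encoder: &Encoder) -> usize {
    if encoder.capacity() < MIN_USEFUL_CAPACITY {
        FALLBACK_LIMIT
    } else {
        encoder.capacity()
    }
}
```

Under these assumptions, `infer_limit(&Encoder::default())` yields 2048, while an encoder created with `Encoder::with_capacity(65535)` reports a limit of at least 65535.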

This commit prevents the wasted allocation by using Encoder::default() instead of Encoder::with_capacity(mtu). The former is backed by an empty Vec.
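To make the allocation difference concrete, the sketch below compares the two starting points using only `std::vec::Vec`. The helper name `capacity_after_header` is hypothetical, and the 20-byte header in the usage note is an arbitrary example size, not a value taken from neqo.

```rust
/// Hypothetical helper: appends `header` to `buf` and returns the resulting
/// capacity of the backing allocation.
pub fn capacity_after_header(mut buf: Vec<u8>, header: &[u8]) -> usize {
    // An empty Vec allocates only once data is written, and then only
    // enough to hold it (plus whatever growth slack the allocator adds).
    buf.extend_from_slice(header);
    buf.capacity()
}
```

Starting from `Vec::new()`, which does not allocate, a 20-byte header leaves the buffer with a capacity of at least 20 bytes, far below an MTU-sized reservation. Starting from `Vec::with_capacity(65535)` reserves at least 65535 bytes regardless of how little is written.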


Feel free to ignore if you don't think the reduction in memory allocation is worth the complexity in reasoning described above.

github-actions[bot] commented 2 months ago

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.35%. Comparing base (d513712) to head (c3e5999).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #2086   +/-   ##
=======================================
  Coverage   95.35%   95.35%
=======================================
  Files         112      112
  Lines       36335    36335
=======================================
  Hits        34648    34648
  Misses       1687     1687
```


github-actions[bot] commented 2 months ago

Benchmark results

Performance differences relative to d513712bc9d06e297af5cf317a9f6dd70e69a88b.

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [98.912 ns 99.221 ns 99.533 ns]
       change: [-0.6624% -0.2427% +0.2050%] (p = 0.28 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe
coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [116.71 ns 117.06 ns 117.44 ns]
       change: [-7.1034% -2.5302% +0.1590%] (p = 0.28 > 0.05)

Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild
  13 (13.00%) high severe
coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [116.60 ns 117.17 ns 117.81 ns]
       change: [-0.5054% -0.0210% +0.4710%] (p = 0.94 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  9 (9.00%) high severe
coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [97.471 ns 97.578 ns 97.698 ns]
       change: [-1.7988% -0.1792% +1.1365%] (p = 0.84 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.49 ms 111.61 ms 111.77 ms]
       change: [+0.2977% +0.4140% +0.5750%] (p = 0.00 < 0.05)

Found 13 outliers among 100 measurements (13.00%)
  11 (11.00%) low mild
  2 (2.00%) high severe
transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [26.516 ms 27.702 ms 28.914 ms]
       change: [-5.3154% +0.3008% +6.1247%] (p = 0.92 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
transfer/pacing-true/varying-seeds: No change in performance detected.
       time:   [33.422 ms 35.001 ms 36.607 ms]
       change: [-12.155% -5.9755% +0.5737%] (p = 0.08 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
transfer/pacing-false/same-seed: No change in performance detected.
       time:   [26.032 ms 26.871 ms 27.705 ms]
       change: [-4.2351% -0.0324% +4.5431%] (p = 0.99 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
transfer/pacing-true/same-seed: No change in performance detected.
       time:   [41.561 ms 43.521 ms 45.509 ms]
       change: [-2.4773% +4.1611% +11.082%] (p = 0.22 > 0.05)
1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.
       time:   [113.00 ms 113.54 ms 114.16 ms]
       thrpt:  [875.97 MiB/s 880.77 MiB/s 884.94 MiB/s]
change:
       time:   [-1.0371% -0.3431% +0.3115%] (p = 0.34 > 0.05)
       thrpt:  [-0.3106% +0.3442% +1.0480%]

Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low mild
  1 (1.00%) high severe
1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
       time:   [313.38 ms 317.07 ms 320.68 ms]
       thrpt:  [31.183 Kelem/s 31.538 Kelem/s 31.910 Kelem/s]
change:
       time:   [-2.0956% -0.4070% +1.2455%] (p = 0.63 > 0.05)
       thrpt:  [-1.2302% +0.4087% +2.1404%]
1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
       time:   [33.693 ms 33.906 ms 34.133 ms]
       thrpt:  [29.297  elem/s 29.493  elem/s 29.680  elem/s]
change:
       time:   [-1.2479% -0.3761% +0.5256%] (p = 0.42 > 0.05)
       thrpt:  [-0.5228% +0.3775% +1.2637%]

Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|--------|--------|-------|--------|---------------|----------|----------|----------|
| msquic | msquic | | | 122.5 ± 54.3 | 90.2 | 338.7 | 1.00 |
| neqo | msquic | reno | on | 223.4 ± 14.8 | 203.4 | 248.4 | 1.00 |
| neqo | msquic | reno | | 218.2 ± 15.4 | 205.3 | 258.9 | 1.00 |
| neqo | msquic | cubic | on | 220.8 ± 13.5 | 206.8 | 244.5 | 1.00 |
| neqo | msquic | cubic | | 231.4 ± 39.8 | 201.0 | 321.5 | 1.00 |
| msquic | neqo | reno | on | 133.3 ± 65.2 | 83.3 | 338.4 | 1.00 |
| msquic | neqo | reno | | 95.9 ± 17.5 | 83.3 | 152.9 | 1.00 |
| msquic | neqo | cubic | on | 147.3 ± 75.6 | 84.4 | 344.2 | 1.00 |
| msquic | neqo | cubic | | 101.9 ± 15.4 | 83.3 | 138.1 | 1.00 |
| neqo | neqo | reno | on | 220.5 ± 126.8 | 135.3 | 536.0 | 1.00 |
| neqo | neqo | reno | | 161.8 ± 63.5 | 122.9 | 400.1 | 1.00 |
| neqo | neqo | cubic | on | 195.0 ± 77.9 | 125.9 | 416.3 | 1.00 |
| neqo | neqo | cubic | | 170.8 ± 32.6 | 126.4 | 236.4 | 1.00 |


github-actions[bot] commented 2 months ago

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.