mozilla / neqo

Neqo, the Mozilla Firefox implementation of QUIC in Rust
https://firefox-source-docs.mozilla.org/networking/http/http3.html
Apache License 2.0

perf: don't allocate in UDP recv path #2076

Closed. mxinden closed this pull request 2 months ago.

mxinden commented 2 months ago

Previously, neqo-udp had one long-lived receive buffer, but after reading from the socket into that buffer, it allocated a new Vec for each UDP segment.
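To make the old behavior concrete, here is a minimal sketch. It assumes a hypothetical `Datagram` type that owns a `Vec<u8>` and a fixed segment size; neither is neqo's actual code. The point is the `to_vec()` call: even though the receive buffer is reused across reads, every segment still pays for a fresh heap allocation and a copy.

```rust
/// Hypothetical owned datagram, standing in for what the old path produced.
#[derive(Debug)]
struct Datagram {
    payload: Vec<u8>,
}

/// Old-style receive path (sketch): the buffer is reused across reads, but
/// each UDP segment is copied into its own freshly allocated `Vec`.
/// `segment_size` must be non-zero; the last segment may be shorter.
fn split_into_owned(recv_buf: &[u8], len: usize, segment_size: usize) -> Vec<Datagram> {
    recv_buf[..len]
        .chunks(segment_size)
        .map(|segment| Datagram {
            payload: segment.to_vec(), // one heap allocation + copy per segment
        })
        .collect()
}

fn main() {
    // Long-lived receive buffer, reused for every socket read.
    let mut recv_buf = vec![0u8; 65_536];

    // Pretend one read returned 3 segments of 1200 bytes each.
    let len = 3 * 1200;
    recv_buf[..len].fill(0xab);

    let datagrams = split_into_owned(&recv_buf, len, 1200);
    let total: usize = datagrams.iter().map(|d| d.payload.len()).sum();
    println!("{} segments, {} bytes copied", datagrams.len(), total);
}
```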

This commit no longer copies each UDP segment into a new Vec. Instead, it passes the single reused receive buffer directly to neqo_transport::Connection::process_input.
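A minimal sketch of the allocation-free path, with the same caveats: `Connection` and its `process_input(&[u8])` signature below are hypothetical stand-ins, not the real `neqo_transport::Connection::process_input` API. Each segment is a sub-slice of the long-lived buffer, so nothing is copied or allocated per segment.

```rust
/// Hypothetical connection stand-in; it only counts what it receives.
struct Connection {
    packets_seen: usize,
    bytes_seen: usize,
}

impl Connection {
    /// Hypothetical signature: borrows the datagram bytes for the duration
    /// of the call instead of taking ownership of a `Vec<u8>`.
    fn process_input(&mut self, datagram: &[u8]) {
        self.packets_seen += 1;
        self.bytes_seen += datagram.len();
    }
}

/// Allocation-free receive path (sketch): each segment is a sub-slice of
/// the long-lived receive buffer and is handed directly to the connection.
/// `segment_size` must be non-zero; the last segment may be shorter.
fn deliver_segments(conn: &mut Connection, recv_buf: &[u8], len: usize, segment_size: usize) {
    for segment in recv_buf[..len].chunks(segment_size) {
        conn.process_input(segment); // no per-segment allocation or copy
    }
}

fn main() {
    let mut conn = Connection { packets_seen: 0, bytes_seen: 0 };
    let mut recv_buf = vec![0u8; 65_536];

    // Two simulated socket reads into the same buffer.
    for _ in 0..2 {
        let len = 3 * 1200;
        recv_buf[..len].fill(0xab);
        deliver_segments(&mut conn, &recv_buf, len, 1200);
    }
    println!("{} packets, {} bytes", conn.packets_seen, conn.bytes_seen);
}
```

In this sketch the borrow checker enforces the key constraint: the connection only sees each slice during the call, so the buffer can safely be overwritten by the next read.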

Part of https://github.com/mozilla/neqo/issues/1693.


Draft for now. I want to see benchmark results before investing further.

github-actions[bot] commented 2 months ago

Benchmark results

Performance differences relative to 910a7cd8a87f4c5d052d15f0e10a7e8f1ad21446.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [98.743 ns 99.096 ns 99.452 ns]
       change: [-1.4056% -0.8996% -0.3504%] (p = 0.00 < 0.05)

Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe
coalesce_acked_from_zero 3+1 entries: 💚 Performance has improved.
       time:   [116.86 ns 117.16 ns 117.50 ns]
       change: [-2.6326% -1.6901% -1.0195%] (p = 0.00 < 0.05)

Found 19 outliers among 100 measurements (19.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  11 (11.00%) high severe
coalesce_acked_from_zero 10+1 entries: Change within noise threshold.
       time:   [116.40 ns 116.80 ns 117.30 ns]
       change: [-2.1473% -1.4459% -0.7655%] (p = 0.00 < 0.05)

Found 20 outliers among 100 measurements (20.00%)
  8 (8.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe
coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [97.168 ns 101.41 ns 110.93 ns]
       change: [-2.7203% +0.8236% +6.4501%] (p = 0.82 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe
RxStreamOrderer::inbound_frame(): No change in performance detected.
       time:   [111.52 ms 111.66 ms 111.88 ms]
       change: [-0.2530% -0.1178% +0.0977%] (p = 0.20 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [25.985 ms 26.869 ms 27.751 ms]
       change: [-7.4673% -2.9356% +2.1097%] (p = 0.24 > 0.05)
transfer/pacing-true/varying-seeds: No change in performance detected.
       time:   [34.891 ms 36.585 ms 38.286 ms]
       change: [-4.9325% +1.5534% +8.5890%] (p = 0.64 > 0.05)
transfer/pacing-false/same-seed: No change in performance detected.
       time:   [31.107 ms 31.822 ms 32.519 ms]
       change: [-4.9157% -1.8751% +1.1009%] (p = 0.24 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
transfer/pacing-true/same-seed: No change in performance detected.
       time:   [39.949 ms 42.962 ms 45.965 ms]
       change: [-12.157% -4.0154% +5.2027%] (p = 0.39 > 0.05)
1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.
       time:   [111.05 ms 111.36 ms 111.65 ms]
       thrpt:  [895.64 MiB/s 898.02 MiB/s 900.48 MiB/s]
change:
       time:   [-2.9276% -2.4889% -2.0409%] (p = 0.00 < 0.05)
       thrpt:  [+2.0835% +2.5525% +3.0159%]

Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
       time:   [310.54 ms 314.53 ms 318.58 ms]
       thrpt:  [31.390 Kelem/s 31.793 Kelem/s 32.202 Kelem/s]
change:
       time:   [-1.5604% +0.2356% +1.9649%] (p = 0.79 > 0.05)
       thrpt:  [-1.9270% -0.2350% +1.5852%]
1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
       time:   [40.418 ms 41.136 ms 41.853 ms]
       thrpt:  [23.893  elem/s 24.310  elem/s 24.741  elem/s]
change:
       time:   [-2.2850% +0.0149% +2.5328%] (p = 1.00 > 0.05)
       thrpt:  [-2.4703% -0.0149% +2.3385%]
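Criterion reports each figure as [lower bound, point estimate, upper bound], and the thrpt lines are derived from the time lines: work per iteration divided by time per iteration. The sketch below checks this for the point estimates. It assumes, from the benchmark names and units, that Download transfers 100 MiB per iteration, RPS completes 10,000 responses, and HPS completes one; the computed values match the reported ones up to rounding of the displayed times.

```rust
/// Throughput = units of work per iteration / seconds per iteration.
fn throughput(units_per_iter: f64, time_ms: f64) -> f64 {
    units_per_iter / (time_ms / 1_000.0)
}

fn main() {
    // Point estimates of the mean time, copied from the results above.
    // Assumed work per iteration: 100 MiB (Download), 10,000 responses (RPS),
    // 1 response (HPS).
    let download_mib_s = throughput(100.0, 111.36);
    let rps_kelem_s = throughput(10_000.0, 314.53) / 1_000.0;
    let hps_elem_s = throughput(1.0, 41.136);
    println!("Download: {download_mib_s:.2} MiB/s");
    println!("RPS:      {rps_kelem_s:.3} Kelem/s");
    println!("HPS:      {hps_elem_s:.3} elem/s");
}
```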

Client/server transfer results

Transfer of 33554432 bytes (32 MiB) over loopback.

Client  Server  CC     Pacing  Mean [ms]      Min [ms]  Max [ms]  Relative
msquic  msquic                 157.0 ± 109.5      81.6     484.1      1.00
neqo    msquic  reno   on      207.1 ±  11.3     194.7     234.7      1.00
neqo    msquic  reno           204.4 ±  11.2     193.3     223.2      1.00
neqo    msquic  cubic  on      209.7 ±  11.4     199.0     230.8      1.00
neqo    msquic  cubic          222.4 ±  14.2     201.2     245.9      1.00
msquic  neqo    reno   on       87.7 ±  21.0      74.0     174.7      1.00
msquic  neqo    reno            85.8 ±  21.8      73.7     166.7      1.00
msquic  neqo    cubic  on       83.8 ±  13.7      73.6     127.1      1.00
msquic  neqo    cubic           84.9 ±  22.0      74.3     182.1      1.00
neqo    neqo    reno   on      169.0 ±  83.0     122.3     403.1      1.00
neqo    neqo    reno           147.4 ±  59.5     118.6     387.2      1.00
neqo    neqo    cubic  on      148.6 ±  25.0     120.9     220.7      1.00
neqo    neqo    cubic          176.2 ±  75.2     116.5     394.2      1.00


github-actions[bot] commented 2 months ago

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

github-actions[bot] commented 2 months ago

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

mxinden commented 2 months ago

Thank you for taking a look @martinthomson.

> Is it possible to change the code so that input datagrams are taken with a reference to the underlying buffer instead?

Yes. I created https://github.com/mozilla/neqo/pull/2093, which implements your suggestion above. It is a draft for now, but I would argue it goes beyond a proof of concept.

Closing here in favor of https://github.com/mozilla/neqo/pull/2093.
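One way to take input datagrams by reference, as suggested above, is to make the datagram type generic over its payload storage, so the same type can either own a `Vec<u8>` or borrow a `&[u8]` from the receive buffer. This is a hypothetical sketch: `Datagram<D>`, its fields, and `into_owned` are illustrative names, and it is not necessarily how #2093 implements the idea.

```rust
use std::net::SocketAddr;

/// Hypothetical datagram generic over its payload storage.
/// `Datagram` (i.e. `Datagram<Vec<u8>>`) owns its bytes;
/// `Datagram<&[u8]>` borrows them from a receive buffer.
#[derive(Debug, Clone)]
struct Datagram<D = Vec<u8>> {
    src: SocketAddr,
    dst: SocketAddr,
    d: D,
}

impl<D: AsRef<[u8]>> Datagram<D> {
    fn new(src: SocketAddr, dst: SocketAddr, d: D) -> Self {
        Self { src, dst, d }
    }

    /// Payload bytes, regardless of whether they are owned or borrowed.
    fn payload(&self) -> &[u8] {
        self.d.as_ref()
    }
}

impl<'a> Datagram<&'a [u8]> {
    /// Copy into an owned datagram; only needed if the datagram must
    /// outlive the receive buffer it borrows from.
    fn into_owned(self) -> Datagram<Vec<u8>> {
        Datagram { src: self.src, dst: self.dst, d: self.d.to_vec() }
    }
}

fn main() {
    let src: SocketAddr = "127.0.0.1:4433".parse().unwrap();
    let dst: SocketAddr = "127.0.0.1:5555".parse().unwrap();
    let recv_buf = vec![0xabu8; 2 * 1200];

    // Borrowed datagrams: no allocation per segment.
    for segment in recv_buf.chunks(1200) {
        let dgram = Datagram::new(src, dst, segment);
        assert_eq!(dgram.payload().len(), 1200);
    }

    // Owned datagram: allocates once, but can outlive `recv_buf`.
    let owned: Datagram = Datagram::new(src, dst, &recv_buf[..1200]).into_owned();
    drop(recv_buf);
    println!("owned payload: {} bytes from {}", owned.payload().len(), owned.src);
}
```

The default type parameter keeps existing code that names plain `Datagram` working with owned payloads, while the hot receive path can use the borrowed form.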