quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0

Implement acknowledgement frequency #1543

Closed · aochagavia closed this 1 year ago

aochagavia commented 1 year ago

Note: this PR is no longer a draft, but I've left the text below untouched so people can trace how the PR was developed. You will probably want to jump to this comment for an overview of what this PR actually achieves.


This is a draft PR for now, because it is missing the following stuff:

As to the things that are properly implemented:

That being said, here are a few questions:

  1. What do you think of the general direction of the PR? Do you see anything that seems out of place?
  2. Is there a way to access unencrypted packet frames in tests that simulate a connection (quinn-proto/src/test/mod.rs)? It would be useful to ensure the ACK delay value in ACK frames is being properly set. Unit tests don’t give the same degree of confidence and have a higher maintenance cost. If this doesn't exist yet, would you welcome a commit adding it?
  3. I'm confused about the way out-of-order packets are detected in the ACK frequency draft, which seems to differ from RFC 9000. The draft gives an example that, to my understanding, is wrong (which probably means my understanding is wrong). I posted a detailed question in the quicdev Slack to get more clarity, so please have a look if you think you can help.
Ralith commented 1 year ago

Is there a way to access unencrypted packet frames in tests that simulate a connection (quinn-proto/src/test/mod.rs)?

Not presently. We've generally relied on making assertions about resulting connection state. You could do that here by looking at the RTT estimate, but that is a bit indirect. I don't see anything fundamentally wrong with asserting about the transmitted plaintext (we do something morally similar in the version negotiation tests, after all), but it's not immediately obvious to me how it might be done gracefully.
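For reference, the state-based pattern looks roughly like this (a sketch against quinn-proto's in-crate test harness; Pair, connect, drive, client_conn_mut and the 100 ms bound reflect my reading of the harness, not code from this PR):

#[test]
fn rtt_estimate_stays_sane() {
    use std::time::Duration;

    // Drive a simulated connection, then assert on the resulting state
    // rather than on the encrypted bytes that crossed the wire.
    let mut pair = Pair::default();
    let (client_ch, _server_ch) = pair.connect();
    pair.drive();

    // A mis-reported ACK delay would eventually show up as a skewed RTT
    // estimate; 100 ms is an arbitrary sanity bound for a loopback link.
    assert!(pair.client_conn_mut(client_ch).rtt() < Duration::from_millis(100));
}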

aochagavia commented 1 year ago

I don't see anything fundamentally wrong with asserting about the transmitted plaintext (we do something morally similar in the version negotiation tests, after all), but it's not immediately obvious to me how it might be done gracefully.

I just pushed a commit prototyping how this could work. It introduces a Connection::decode_packet function that returns the decoded bytes of the packet's payload, meant to be used from tests. It duplicates about 70 lines of code, though, and it requires deriving Clone for two structs (we could make the latter conditional for testing, if desired). It feels a bit awkward at the moment because of the code duplication, but with a bit of effort I might be able to extract that code into a dedicated module that can be shared. Do you think this is a good direction?
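To show the intended shape, here is a minimal self-contained sketch; Frame, the stand-in Connection, and the decoded values are illustrative placeholders, not quinn-proto's real types:

// Stand-in frame type; the real code would reuse quinn-proto's frame parser.
#[derive(Debug)]
enum Frame {
    Ack { largest: u64, delay_us: u64 },
    Padding,
}

struct Connection;

impl Connection {
    // Test-only helper: decrypt a captured datagram and parse its payload
    // into frames. The body here fakes the result; the real version would
    // clone the peer's read keys and run the shared decode path.
    #[cfg(test)]
    fn decode_packet(&self, _datagram: &[u8]) -> Vec<Frame> {
        vec![Frame::Ack { largest: 7, delay_us: 25_000 }]
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn ack_delay_is_encoded() {
        let frames = Connection.decode_packet(&[]);
        assert!(frames
            .iter()
            .any(|f| matches!(f, Frame::Ack { delay_us, .. } if *delay_us > 0)));
    }
}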

By the way, I have had no feedback on the implementer's Slack about detection of out-of-order packets in the ACK frequency draft, so my current plan is to go through the draft again next week and implement my own interpretation of it. I also plan to look around for open implementations of the current version of the draft that I can look at.

Ralith commented 1 year ago

Do you think this is a good direction?

Seems reasonable in general (especially with #[cfg(test)]), but yeah, factoring out the common logic would be nice. I have no problem with adding Clone to internal types; its absence generally isn't principled; we just haven't needed it before.
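For the test-only Clone, the usual idiom is a cfg_attr gate; the struct name below is just a placeholder:

// Derive Clone in test builds only; the gate is harmless to drop later if
// a non-test caller turns up. (Placeholder struct name.)
#[cfg_attr(test, derive(Clone))]
struct InternalKeys {
    // existing fields unchanged
}

This works here because quinn-proto's simulation tests live in the same crate, so cfg(test) is active when they are compiled.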

I have had no feedback on the implementer's Slack about detection of out-of-order packets in the ACK frequency draft

I saw; that was disappointing. Maybe open an issue (or a PR with a correction) on https://github.com/quicwg/ack-frequency? The WG has historically been good about engaging on GitHub. I definitely wouldn't be shocked by an error in an example.

aochagavia commented 1 year ago

I just pushed a first non-draft version of this PR, and modified the PR's base to the plpmtud-2 branch to get us a better diff.

Here is an overview of the feature:

I have a bunch of questions / comments:

  1. I introduced two timers: one that fires when max_ack_delay has elapsed, and one that fires when an RTT has elapsed (see the sketch after this list). I tried to come up with an approach that wouldn't require additional timers, but I'm not sure it is possible. It would be nice if you could double-check that, given your knowledge of the codebase.
  2. I introduced a reset_rtt_timer function that arms the Timer::Rtt for the first time. I wanted to arm it right after we have reached the Data space and have an RTT estimate; I ended up calling it inside process_payload. Please let me know if there is a more suitable place.
  3. I rewrote a long comment inside PendingAcks::acks_sent, which I found difficult to understand. Hopefully the new version says the same thing in clearer terms (if not, let me know and I'll update it).
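To make the timer mechanics in point 1 concrete, here is a minimal self-contained sketch. Timer::Rtt and reset_rtt_timer are named in this PR; AckTimers, the max_ack_delay slot, and the arming logic are illustrative assumptions:

use std::time::{Duration, Instant};

// Two new timer deadlines; field names other than rtt are placeholders.
struct AckTimers {
    max_ack_delay: Option<Instant>, // fires when max_ack_delay elapses with ACKs pending
    rtt: Option<Instant>,           // fires once per RTT estimate
}

impl AckTimers {
    // Arm Timer::Rtt for the first time, once the Data space is reached and
    // an RTT estimate exists (this PR calls it from process_payload).
    fn reset_rtt_timer(&mut self, now: Instant, rtt_estimate: Duration) {
        self.rtt = Some(now + rtt_estimate);
    }

    // Arm the max_ack_delay deadline when an ack-eliciting packet arrives
    // and no deadline is already pending (illustrative, not the PR's code).
    fn arm_max_ack_delay(&mut self, now: Instant, max_ack_delay: Duration) {
        self.max_ack_delay.get_or_insert(now + max_ack_delay);
    }
}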

Tomorrow I'll do some benchmarking to ensure everything works properly.

aochagavia commented 1 year ago

And... here are the benchmarks!

Practical details

I'm running the server using cargo run --release --bin perf_server -- --no-protection and the client using cargo run --release --bin perf_client -- --no-protection --duration 15 (note that disabling encryption makes the benchmarks misleading for real-world use cases). As you can deduce from the cargo commands, I'm running the client and the server on the same machine. In the tables below, FBL stands for first byte latency.

For the scenarios with ACK frequency enabled, I configure it through the following code:

// Request that the peer delay its ACKs until it has received up to 10
// ack-eliciting packets (transport is the connection's TransportConfig).
let mut ack_freq = AckFrequencyConfig::default();
ack_freq.ack_eliciting_threshold(10);
transport.ack_frequency_config(Some(ack_freq));

Conclusion

There seems to be no significant performance difference in the benchmarks, though we probably need a more statistically rigorous approach to benchmarking if we want to be sure.

Baseline (before this PR)

      │ Upload Duration │ Download Duration │ FBL        │ Upload Throughput │ Download Throughput
──────┼─────────────────┼───────────────────┼────────────┼───────────────────┼────────────────────
 AVG  │        862.00µs │          811.00µs │    50.00µs │     1199.97 MiB/s │       1293.80 MiB/s
 P0   │        661.00µs │          534.00µs │     1.00µs │       32.34 MiB/s │         34.34 MiB/s
 P10  │        741.00µs │          678.00µs │     2.00µs │      986.50 MiB/s │       1089.00 MiB/s
 P50  │        810.00µs │          762.00µs │    47.00µs │     1235.00 MiB/s │       1313.00 MiB/s
 P90  │          1.01ms │          919.00µs │    79.00µs │     1350.00 MiB/s │       1475.00 MiB/s
 P100 │         30.91ms │           29.12ms │   470.00µs │     1513.00 MiB/s │       1873.00 MiB/s

ACK frequency disabled (threshold = 1)

      │ Upload Duration │ Download Duration │ FBL        │ Upload Throughput │ Download Throughput
──────┼─────────────────┼───────────────────┼────────────┼───────────────────┼────────────────────
 AVG  │        920.00µs │          805.00µs │    53.00µs │     1190.95 MiB/s │       1280.93 MiB/s
 P0   │        670.00µs │          134.00µs │     1.00µs │       33.44 MiB/s │         34.12 MiB/s
 P10  │        746.00µs │          699.00µs │     4.00µs │      996.50 MiB/s │       1113.00 MiB/s
 P50  │        819.00µs │          774.00µs │    49.00µs │     1222.00 MiB/s │       1292.00 MiB/s
 P90  │          1.00ms │          899.00µs │    82.00µs │     1341.00 MiB/s │       1431.00 MiB/s
 P100 │         29.92ms │           29.30ms │   847.00µs │     1493.00 MiB/s │       7464.00 MiB/s

ACK frequency enabled on the client (threshold = 10)

      │ Upload Duration │ Download Duration │ FBL        │ Upload Throughput │ Download Throughput
──────┼─────────────────┼───────────────────┼────────────┼───────────────────┼────────────────────
 AVG  │        866.00µs │          767.00µs │    48.00µs │     1187.26 MiB/s │       1479.18 MiB/s
 P0   │        686.00µs │           70.00µs │     1.00µs │       33.25 MiB/s │         32.25 MiB/s
 P10  │        753.00µs │          680.00µs │     3.00µs │     1020.50 MiB/s │       1198.00 MiB/s
 P50  │        821.00µs │          749.00µs │    46.00µs │     1219.00 MiB/s │       1336.00 MiB/s
 P90  │        980.00µs │          835.00µs │    77.00µs │     1329.00 MiB/s │       1471.00 MiB/s
 P100 │         30.08ms │           30.99ms │   421.00µs │     1458.00 MiB/s │      14288.00 MiB/s

ACK frequency enabled on the server (threshold = 10)

      │ Upload Duration │ Download Duration │ FBL        │ Upload Throughput │ Download Throughput
──────┼─────────────────┼───────────────────┼────────────┼───────────────────┼────────────────────
 AVG  │        890.00µs │          844.00µs │    51.00µs │     1182.28 MiB/s │       1202.34 MiB/s
 P0   │        684.00µs │          122.00µs │     1.00µs │       34.59 MiB/s │        100.25 MiB/s
 P10  │        766.00µs │          753.00µs │     2.00µs │     1016.50 MiB/s │       1064.00 MiB/s
 P50  │        826.00µs │          828.00µs │    49.00µs │     1211.00 MiB/s │       1208.00 MiB/s
 P90  │        984.00µs │          940.00µs │    86.00µs │     1306.00 MiB/s │       1329.00 MiB/s
 P100 │         28.90ms │            9.98ms │   427.00µs │     1462.00 MiB/s │       8200.00 MiB/s

aochagavia commented 1 year ago

@djc it looks like you closed this PR by accident

aochagavia commented 1 year ago

For anyone following along, I just rewrote the above comment with the benchmark results, because it looks like my baseline was non-representative.

djc commented 1 year ago

Ahh, sorry -- I deleted the branch for #1552, which apparently makes it impossible to reopen this PR. Can you try repushing this branch so we can reopen it? We could open a new PR, but it would be better to keep this one going.

aochagavia commented 1 year ago

Ah yes, because the PR was targeting that branch... I can no longer edit the target branch of the PR, even after force-pushing a rebased version, so it looks like I'll have to open a new one.

aochagavia commented 1 year ago

And here is the new PR: #1553