Monomorphize public API

quinn-rs / quinn

Async-friendly QUIC implementation in Rust

Apache License 2.0

3.76k stars 380 forks source link

Monomorphize public API #1201

Closed Ralith closed 2 years ago

Ralith commented 2 years ago

Makes the assorted quinn_proto::crypto traits object-safe and replaces their uses with trait objects.

Fixes #911.

Ralith commented 2 years ago

Approximate build time impact for the bulk benchmark:

Clean	Incremental
Generic	83s	8.62s
Monomorphic	116s	4.12s

Incremental builds were triggered by touching build.rs with no actual changes. Doubling the speed there is pretty great. Interesting that clean build time got worse; I'm guessing that's because all the quinn and quinn-proto code is actually being compiled--and codegenned--rather than lazily instantiated on-demand by downstream, so we might expect the clean build slowdown to diminish for more applications that use more of the API than the bulk benchmark does.

Runtime performance looks unaffected either way; as predicted, the extra virtual calls are dwarfed by the real cost centers.

djc commented 2 years ago

Have you run the benchmarks to assess performance impact of the changes?

Ralith commented 2 years ago

There was no visible impact for the bulk benchmark, though someone with a better benchmarking setup (@Matthias247?) might want to have a look. We should probably have separate benchmarks for handshaking.

Ralith commented 2 years ago

This also saved about 3 out of 79 MiB in binary size for the bulk benchmark before stripping, or 0.3 out of 5.3 after.

Matthias247 commented 2 years ago

There was no visible impact for the bulk benchmark, though someone with a better benchmarking setup (@Matthias247?) might want to have a look. We should probably have separate benchmarks for handshaking.

Where would you expect to see an impact? In general data transmission benchmarks, due to different inlining properties? Or in handshakes, due to the use of trait objects there.

I think you might be able to test the latter by configuring lots of clients with 0 requests in some of the benchmarks. (not sure at the moment which one supports more than 1 client).

I wouldn't expect anything measureable however. Handshakes are already super expensive due to crypto, and the amount of dynamic calls is tiny.

Ralith commented 2 years ago

bulk --clients 200 --download-size 1:

Generic: 2.7s mean, 3.269s max
Monomorphic: 2.53s mean, 3.192s max

Variance was quite high and n=3, so I'm inclined to interpret this as "approximately the same". I agree that no significant impact is expected; we have increased the number of indirect calls, and the amount of boxing, but only modestly, and in contexts where more expensive work is already happening. In the event that any of this does show up in profiling we might be able to push some logic down through the erasure boundary, but I think we've got a lot of much lower hanging fruit.

Ralith commented 2 years ago

Incidentally, I'm happy to merge the remaining bits of https://github.com/quinn-rs/quinn/pull/1150 in first and rebase this myself, once they're ready.

Ralith commented 2 years ago

Rebased.