quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0

Intra-endpoint horizontal scaling #1576

Open Ralith opened 1 year ago

Ralith commented 1 year ago

Quinn, like any flexible QUIC implementation, supports horizontal scaling across multiple endpoints through the use of custom connection ID generators and QUIC-aware load-balancing front-ends. However, the requirement for a third-party load balancer and custom logic makes this difficult to leverage. For the overwhelmingly common case of applications that fit comfortably on a single host, Quinn should allow a single endpoint to scale the number of concurrent connections it can serve near-linearly with respect to available CPU cores.

In the current architecture, separate connection drivers already allow significant cryptographic work, and all application-layer work, for independent connections to happen in parallel. A bottleneck remains at the endpoint driver, an async task responsible for driving all network I/O and timers. We can do better.

The Linux SO_REUSEPORT option will distribute incoming packets on a single port across multiple sockets. Packets are routed to sockets based on a 4-tuple hash, so we can rely on connections moving between drivers only in the event of migrations, minimizing contention outside of connection setup/teardown. Windows "Receive Side Scaling" may be similar. On at least these platforms, we can therefore run up to one endpoint driver per core. Some architectural changes are required to prevent catastrophic contention.
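To make the mechanism concrete, here is a minimal sketch (not existing Quinn API) of binding several UDP sockets to the same port with SO_REUSEPORT, one per driver, using the socket2 crate; set_reuse_port is Unix-only and may require socket2's `all` feature:

```rust
use std::net::{SocketAddr, UdpSocket};

use socket2::{Domain, Protocol, Socket, Type};

/// Create `n` sockets bound to the same address; the kernel spreads incoming
/// datagrams across them by 4-tuple hash, so each can back its own driver.
fn reuseport_sockets(addr: SocketAddr, n: usize) -> std::io::Result<Vec<UdpSocket>> {
    (0..n)
        .map(|_| -> std::io::Result<UdpSocket> {
            let socket = Socket::new(Domain::for_address(addr), Type::DGRAM, Some(Protocol::UDP))?;
            socket.set_reuse_port(true)?; // SO_REUSEPORT: per-flow load balancing in the kernel
            socket.set_nonblocking(true)?;
            socket.bind(&addr.into())?;
            Ok(socket.into())
        })
        .collect()
}
```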

Key quinn_proto::Endpoint methods take &mut self, preventing meaningful parallelization of datagram processing. We should embrace interior mutability to convert these to &self methods that will not contend on the hot path of handling datagrams for established connections.
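As a rough illustration of that direction (the Router type and method names below are invented for the example, not quinn_proto API), routing state can live behind interior mutability such as a read-write lock, so that datagrams for established connections only take a shared read lock and the exclusive lock is confined to connection setup and teardown:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type ConnectionId = Vec<u8>;

struct ConnectionState; // stand-in for per-connection machinery

#[derive(Default)]
struct Router {
    by_cid: RwLock<HashMap<ConnectionId, Arc<ConnectionState>>>,
}

impl Router {
    /// Hot path: `&self`, contends only briefly on the shared read lock.
    fn route_datagram(&self, cid: &ConnectionId) -> Option<Arc<ConnectionState>> {
        self.by_cid.read().unwrap().get(cid).cloned()
    }

    /// Cold path: connection setup takes the exclusive write lock.
    fn insert(&self, cid: ConnectionId, conn: Arc<ConnectionState>) {
        self.by_cid.write().unwrap().insert(cid, conn);
    }
}
```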

Tokio presently lacks a mechanism to ensure certain tasks run on separate threads, which may complicate improving parallelism for high-level users. We could address this by spawning our own threads, or by working with upstream to develop new APIs. It's also possible that simply spawning N drivers and letting work stealing do its thing might work out well enough in practice.
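For example, a sketch of the spawn-our-own-threads option (run_driver is a placeholder for whatever the per-socket endpoint driver becomes): one OS thread per driver, each with its own single-threaded Tokio runtime, so drivers are guaranteed to stay on separate threads:

```rust
use std::net::UdpSocket;

/// Placeholder: poll one socket, feed packets to the endpoint, drive timers.
async fn run_driver(_socket: UdpSocket) {}

/// Pin each driver to its own OS thread by giving it a dedicated
/// single-threaded runtime, rather than relying on the work-stealing scheduler.
fn spawn_drivers(sockets: Vec<UdpSocket>) -> Vec<std::thread::JoinHandle<()>> {
    sockets
        .into_iter()
        .map(|socket| {
            std::thread::spawn(move || {
                tokio::runtime::Builder::new_current_thread()
                    .enable_all()
                    .build()
                    .expect("failed to build driver runtime")
                    .block_on(run_driver(socket));
            })
        })
        .collect()
}
```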

Unified Drivers

This refactoring has previously been associated with the unification of endpoint and connection drivers (e.g. #1219). I believe they can be separated, though moving quinn_proto::Connection ownership into quinn_proto::Endpoint may still be desirable for API simplicity and to reduce potentially costly inter-thread communication. To avoid undermining our current parallelism, this should only be pursued after endpoints become horizontally scalable.

Flattening connection tasks complicates timer handling, since we don't want to poll each connection's timer on every endpoint wakeup. My timer-queue crate provides a solution. Each endpoint driver could maintain timers for the connections whose traffic passes through that driver. By discarding timeouts for a connection that has seen activity more recently on another driver, we can ensure that timers do not reintroduce contention.
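A rough sketch of that discard rule using plain std types (this is not timer-queue's API; the activity-epoch counter and names are invented for illustration):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Instant;

struct ConnShared {
    /// Bumped by whichever driver most recently processed traffic for the connection.
    activity_epoch: AtomicU64,
}

struct PendingTimeout {
    deadline: Instant,
    conn: Arc<ConnShared>,
    /// Value of `activity_epoch` observed when this timeout was armed.
    armed_epoch: u64,
}

/// Expire due timeouts for one driver, skipping any that went stale because the
/// connection saw activity (possibly on another driver) after the timeout was armed.
fn expire(timers: &mut Vec<PendingTimeout>, now: Instant) -> Vec<Arc<ConnShared>> {
    let mut fired = Vec::new();
    timers.retain(|t| {
        if t.deadline > now {
            return true; // not yet due; keep it
        }
        if t.conn.activity_epoch.load(Ordering::Relaxed) == t.armed_epoch {
            fired.push(t.conn.clone()); // genuinely idle: this timeout fires
        }
        false // due timeouts are removed whether they fired or were stale
    });
    fired
}
```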

Future Work

Opportunity also exists for fine-grained parallelism between streams on a single connection, similar in spirit to the parallelism between connections within a single endpoint.

Ralith commented 1 year ago

On review, handling multiple connections on a single task might actually be undesirable as it prevents tokio's work-stealing from redistributing work when e.g. a single connection involves disproportionately large amounts of work. Splitting up the endpoint task is still probably valuable, but maybe we don't want to inline connection tasks after all.

Also note tokio's multithreaded runtime relies on a single global epoll loop to drive I/O across all threads, which reportedly allows for more efficient work stealing than epoll-per-thread would. Unclear if this will be a significant bottleneck for Quinn.

PureWhiteWu commented 8 months ago

Hi, I've encountered a related issue: when handling thousands of connections, the quinn server cannot scale horizontally, and the CPU cannot be fully utilized.