Open schmichael opened 3 weeks ago
Specific concerns: regarding UDP mangling on the internet, I'm pretty sure that fly.io isn't the only company that was or is operating a single Nomad control plane for a global cluster over the public internet (proof: https://fly.io/blog/carving-the-scheduler-out-of-our-orchestrator/).
https://dl.acm.org/doi/10.1145/3589334.3645323
Another nail in QUIC's coffin
Background
Nomad, like Consul, uses yamux for its RPC layer's underlying network transport. Yamux is based on SPDY. SPDY has been obsolete since 2015, although its ideas form the basis of HTTP/2's transport layer.
Yamux has proven powerful and reliable, needing and receiving very little maintenance over its 10-year lifespan. However, this means that there's very little expertise in the codebase when issues do arise, and the code does not adhere to modern Go idioms.
Proposal
Replace Nomad's use of Yamux with QUIC. QUIC is the basis for HTTP/3, but unlike SPDY+HTTP/2, QUIC is being intentionally standardized independently (RFC 9000), and is being proposed for more widespread use such as DNS-over-QUIC (RFC 9250).
UDP
QUIC is based on UDP instead of TCP, which poses both an opportunity and a risk for Nomad:
This does allow Nomad to add QUIC support at any time and implement an IPv6 Happy Eyeballs style algorithm for determining whether to use the TCP/Yamux or UDP/QUIC transport.
TLS
QUIC mandates TLS. This would require Nomad to mandate TLS and pose a significant upgrade hurdle. Implementing something like Consul's auto config would be necessary to ease the transition, although there's likely no way to upgrade to TLS without forcing some user intervention.
Go Implementations
QUIC is not officially supported by the Go standard library as of Go 1.23. The crypto/tls package exposes some QUIC internals but is not intended for direct use. golang/go#58547 tracks QUIC's inclusion in Go's stdlib. golang.org/x/net contains the WIP implementation that is intended to be the basis of Go's future HTTP/3 support.
Multiple third party QUIC implementations exist as well, although quic-go seems like the dominant implementation.
The proposed choice for Nomad would be to use a stdlib implementation to ensure the widest compatibility and most support.
Alternative: libp2p
libp2p forked yamux and has done quite a bit more maintenance. Switching to or merging their fork is a far less significant change than switching protocols.
Alternative: HTTP/3
Instead of switching yamux->quic, Nomad could switch from rpc->http/3. This could entail dropping the entire RPC subsystem (which itself is quite antiquated and lacks basic features such as context cancellation). All RPCs without a corresponding HTTP API would need to have an HTTP API implemented. Raft currently uses its own TCP connection and would need special consideration when moving to HTTP.
This would be a huge undertaking, and there's no reason to do it at the same time as moving from yamux->quic. Upgrading our RPC implementation can be done independently of choosing an underlying transport.
Roadmap
There is no roadmap for implementing QUIC in Nomad.
Please leave feedback in the form of: