multiformats / multiaddr

Composable and future-proof network addresses
https://multiformats.io/multiaddr
MIT License

Proposal: add a new codepoint for QUIC v1 (RFC 9000), and for future incompatible versions #145

Closed marten-seemann closed 1 year ago

marten-seemann commented 1 year ago

Current Status

libp2p started deploying QUIC a few years ago, when RFC 9000 was not deployed yet. Back then, we deployed QUIC draft-29. Since node operators are slow to update their go-ipfs nodes, draft-29 is now the most deployed version.

| | QUIC draft-29 | QUIC draft-29 + v1 | QUIC v1 |
|---|---|---|---|
| QUIC draft-29 | ✔️ | ✔️ | |
| QUIC draft-29 + v1 | ✔️* | ✔️ | ✔️* |
| QUIC v1 | | ✔️ | ✔️ |

*: There's a 1 RTT penalty if the version offered on the initial connection attempt doesn't match the node's version, as Version Negotiation is performed.
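
The table and its footnote can be read as a small function. The following is only a sketch: it assumes the rows are dialers and the columns are listeners, and the names `extra_round_trips`, `DRAFT_29` and `V1` are made up for illustration, not taken from any libp2p implementation.

```python
from typing import FrozenSet, Optional

DRAFT_29 = "draft-29"
V1 = "v1"


def extra_round_trips(dialer: FrozenSet[str], first_offer: str,
                      listener: FrozenSet[str]) -> Optional[int]:
    """Return the extra round trips for a dial, or None if it cannot succeed."""
    if first_offer not in dialer:
        raise ValueError("dialer cannot offer a version it does not support")
    if first_offer in listener:
        return 0  # the first attempt already uses a version both sides speak
    if dialer & listener:
        return 1  # Version Negotiation, then a retry with a common version
    return None  # no common version: the connection fails


both = frozenset({DRAFT_29, V1})
# A multi-version dialer that prefers draft-29 pays 1 RTT against a v1-only node.
print(extra_round_trips(both, DRAFT_29, frozenset({V1})))
```

The empty cells of the table correspond to the `None` case, and the starred cells to the case where the result depends on which version is offered first.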

Currently, go-libp2p has support for QUIC draft-29 and QUIC v1. Since draft-29 is the most commonly deployed version (and supported by all nodes on the network), we use that version for dialing new connections.

What happens when rust-libp2p gains QUIC support

rust-libp2p is adding QUIC support, and quinn only supports RFC 9000. If we don't do anything, this means:

  1. Legacy nodes (those supporting only draft-29) won't be able to connect to rust-libp2p nodes. This is only fair. If you run seriously outdated software, you'll have a bad time. You asked for it.
  2. rust-libp2p nodes won't be able to tell which QUIC versions a node supports. It will try to connect to legacy nodes, and the QUIC connection will fail after 1 RTT.
  3. go-libp2p nodes now have two options:
    1. continue preferring draft-29. This will incur a 1 RTT penalty when connecting to rust-libp2p nodes, as we'll need to perform version negotiation.
    2. prefer QUIC v1. This will remove the 1 RTT penalty, but will incur a 1 RTT penalty when connecting to legacy nodes. Given the large number of nodes, this might affect our TTFB metrics.

Proposal: add a new QUIC v1 code point

We could add a new code point for QUIC v1 (string representation: quicv1). The existing code point would be reinterpreted to mean QUIC draft-29. Nodes that support multiple versions can (and should!) offer them on the same port. There's no need to worry about demultiplexing, since QUIC packets contain the version number and any QUIC stack will be able to handle packets from different QUIC versions (if it's a multi-version stack).
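
To illustrate why demultiplexing on a shared port is unproblematic, here is a minimal sketch of reading the version field. It relies on the version-independent packet layout (long-header packets set the most significant bit of the first byte and carry a 32-bit version right after it); the function name `long_header_version` is hypothetical, and a real stack does considerably more than this.

```python
import struct
from typing import Optional

QUIC_DRAFT_29 = 0xFF00001D  # draft versions are 0xff000000 + the draft number
QUIC_V1 = 0x00000001        # RFC 9000


def long_header_version(packet: bytes) -> Optional[int]:
    """Return the version field of a long-header packet, else None."""
    if len(packet) < 5 or not packet[0] & 0x80:
        return None  # too short, or a short-header packet (no version field)
    (version,) = struct.unpack_from("!I", packet, 1)
    return version


initial = bytes([0xC0]) + QUIC_DRAFT_29.to_bytes(4, "big") + b"..."
print(hex(long_header_version(initial)))
```

Short-header packets carry no version field; they are matched to an already established connection by connection ID, so the version only needs to be read from the handshake packets.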

But what about QUIC v2

Does this mean that we need to add a new code point for every new QUIC version? Wouldn't that be wasteful? Yes and no. The IETF is currently working on specifying QUIC v2, and quic-go already has support for that QUIC version. We don't need a new code point though, because QUIC v2 is a compatible version (to QUIC v1). The exact definition of what constitutes compatibility between QUIC versions is subtle, but as a rule of thumb, if there's a transformation of the ClientHello from one version to a ClientHello of the other version, chances are that the versions are compatible. QUIC versions that use TLS 1.3 (or successors) are likely to be compatible. Using Compatible Version Negotiation (shipping as an RFC very soon), it is possible to do a version upgrade between two compatible versions without incurring any round trip penalty. Thus, it's fine to continue advertising QUIC v1, as the connection can seamlessly be upgraded to v2 during the handshake.

Only when / if QUIC v2 becomes a dominant version on the internet, AND there are good reasons to not use QUIC v1 any more, would it make sense to introduce a v2 code point, so that compatible version negotiation can be skipped.

Hypothetical: an incompatible QUIC version is defined

A QUIC version that uses a different handshake protocol than TLS 1.3 would almost certainly not be compatible with QUIC v1. Assuming that libp2p would want to support both versions, it would make sense to introduce a new codepoint for that QUIC version, as we'd incur an additional roundtrip for version negotiation when nodes offer an unsupported version.

marten-seemann commented 1 year ago

cc @Stebalien @elenaf9 @MarcoPolo @kpp @thomaseizinger @mxinden

thomaseizinger commented 1 year ago

2. prefer QUIC v1. This will remove the 1 RTT penalty, but will incur a 1 RTT penalty when connecting to legacy nodes. Given the large number of nodes, this might affect our TTFB metrics.

I guess ideally, we go with this option but somehow minimize the 1 RTT penalty. What do you think of this idea:

This should give node operators some time to notice the issue and upgrade their software accordingly. Making an informed decision on <sunset date> should allow you to mitigate most of the risk on affecting the TTFB metric.

marten-seemann commented 1 year ago
  • Emit a warning every time you hit the 1 RTT penalty by having to upgrade to QUIC v1.
  • Emit a warning every time you connect to a draft-29 node that this connection will incur a 1 RTT penalty from <sunset date>.

You can't do that: You can't update old nodes, so the node operators who need to see the warning will never see it. Instead, you'd just be spamming the logs of node operators of updated nodes, potentially with multiple messages per second. Even worse, these messages wouldn't be actionable.

thomaseizinger commented 1 year ago
  • Emit a warning every time you hit the 1 RTT penalty by having to upgrade to QUIC v1.
  • Emit a warning every time you connect to a draft-29 node that this connection will incur a 1 RTT penalty from <sunset date>.

You can't do that: You can't update old nodes, so the node operators who need to see the warning will never see it.

Hmm okay, I assumed that people will update at some point but yeah, if there are many operators who don't update at all, then this is definitely an issue.

Instead, you'd just be spamming the logs of node operators of updated nodes, potentially with multiple messages per second. Even worse, these messages wouldn't be actionable.

I mean, that is fixable. You can just emit the log only once per peer. Also, you'd remove the log again in the version that prefers QUICv1 so updated nodes would not emit that.
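
A rough sketch of what "only once per peer" could look like. The class name `OncePerPeerWarning`, the logger name and the message text are all invented for illustration; this is not code from any libp2p implementation.

```python
import logging
from typing import Set

logger = logging.getLogger("quic-upgrade")  # hypothetical logger name


class OncePerPeerWarning:
    """Remember which peers were already warned about, to avoid log spam."""

    def __init__(self) -> None:
        self._warned: Set[str] = set()

    def warn(self, peer_id: str) -> bool:
        """Log a warning for peer_id unless one was logged before."""
        if peer_id in self._warned:
            return False
        self._warned.add(peer_id)
        logger.warning("dial to %s needed version negotiation (1 RTT penalty)", peer_id)
        return True


warnings = OncePerPeerWarning()
warnings.warn("peer-a")  # logs
warnings.warn("peer-a")  # silent: this peer was already reported
```

Note that the set grows with the number of distinct peers, so a long-running node would probably want to bound it.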


I am a bit on the fence here. On the one hand, adding a new protocol is a legitimate upgrade path. On the other hand, it feels like a bit of a waste to use when the situation is:

a) temporary
b) penalizes mostly nodes which run outdated software

Can we make an estimate of how long the grace-period of a time-based trigger that switches to v1 would have to be to have most of the network prefer v1? Are we talking weeks, months or years?

Another, more sophisticated mechanism for this kind of upgrade is to have nodes keep track[^1] of how many of their peers support v1 and switch to preferring that once it hits a configured threshold.

Assuming that nodes connect to a representative subset of the network, this would make the switch gracefully as soon as a configured share of the network is capable of v1. This is similar to Bitcoin's softfork activation techniques.

[^1]: This will need to be persisted between restarts which slightly complicates the implementation.
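
A minimal sketch of such a threshold-based switch, including the persistence the footnote mentions. Everything here is an assumption made for illustration: the class name `V1AdoptionTracker`, the default threshold of 80%, the minimum sample of 100 peers, and JSON as the storage format.

```python
import json
from typing import Dict


class V1AdoptionTracker:
    """Track which known peers support QUIC v1 and decide what to prefer."""

    def __init__(self, threshold: float = 0.8, min_peers: int = 100) -> None:
        self.threshold = threshold  # share of v1-capable peers needed to switch
        self.min_peers = min_peers  # do not decide on a tiny sample
        self.peers: Dict[str, bool] = {}  # peer ID -> supports v1

    def observe(self, peer_id: str, supports_v1: bool) -> None:
        self.peers[peer_id] = supports_v1

    def prefer_v1(self) -> bool:
        if len(self.peers) < self.min_peers:
            return False  # too few observations to be representative
        share = sum(self.peers.values()) / len(self.peers)
        return share >= self.threshold

    def dumps(self) -> str:
        """Serialize the observations so they survive a restart."""
        return json.dumps(self.peers)

    @classmethod
    def loads(cls, data: str, **kwargs) -> "V1AdoptionTracker":
        tracker = cls(**kwargs)
        tracker.peers = json.loads(data)
        return tracker


tracker = V1AdoptionTracker(threshold=0.75, min_peers=4)
for i in range(3):
    tracker.observe(f"peer-{i}", True)
tracker.observe("peer-3", False)
print(tracker.prefer_v1())
```

The decision is only as good as the sample: a node that mostly talks to a non-representative set of peers would switch too early or too late.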

marten-seemann commented 1 year ago

b) penalizes mostly nodes which run outdated software

That’s totally fine. We don’t want to penalize nodes that run up-to-date software though, which is what we’d do if we switched to dialing v1 by default.

it feels like a bit of a waste to use when the situation is:

a) temporary

It’s not any more temporary than any other update of QUIC from one incompatible version to the other.

I find it quite instructive to think about what we’d want to do once an incompatible QUIC version is specified, e.g. one that’s using a different handshake protocol: As I’ve described in my original post, we’d want to specify a new code point in that case. Thus, minting a new code point for QUIC v1 seems consistent.

thomaseizinger commented 1 year ago

it feels like a bit of a waste to use when the situation is: a) temporary

It’s not any more temporary than any other update of QUIC from one incompatible version to the other.

I find it quite instructive to think about what we’d want to do once an incompatible QUIC version is specified, e.g. one that’s using a different handshake protocol: As I’ve described in my original post, we’d want to specify a new code point in that case. Thus, minting a new code point for QUIC v1 seems consistent.

Should we then be more elaborate in the naming of the new code point? Like quic-rfc9000? I guess ideally, the current quic one would be called quic-draft-29. Might be a good mental note to take for future protocols.

I am okay with a new code point, it is a valid upgrade / negotiation technique and really the only annoying thing is the aesthetics and documentation effort.

b) penalizes mostly nodes which run outdated software

That’s totally fine. We don’t want to penalize nodes that run up-to-date software though, which is what we’d do if we switched to dialing v1 by default.

I would still find it interesting to back this with some data. Does go-libp2p support RFC9000 today? Does it use RFC9000 if both nodes support it? Are there any numbers on what % of connections that is?

marten-seemann commented 1 year ago

@elenaf9 What would your implementation strategy be on the Rust side? Can you give an estimate how much work this would be?

I wrote up what's needed in go-libp2p in https://github.com/libp2p/go-libp2p/issues/1841. Not sure if you want to support draft-29 in rust-libp2p as well, it might be fine to just support QUIC v1. Since rust-libp2p never had QUIC support to begin with, you'd not be causing a regression.

marten-seemann commented 1 year ago

@mxinden @MarcoPolo I'm in favor of moving forward with this proposal, but it would be helpful to get your input here.

mxinden commented 1 year ago

First off, thanks for the detailed write-up @marten-seemann!


rust-libp2p is adding QUIC support, and quinn only supports RFC 9000. If we don't do anything, this means:

Not sure if you want to support draft-29 in rust-libp2p as well, it might be fine to just support QUIC v1. Since rust-libp2p never had QUIC support to begin with, you'd not be causing a regression.

For the record, does quinn support draft-29 or does it not @elenaf9?


2. prefer QUIC v1. This will remove the 1 RTT penalty, but will incur a 1 RTT penalty when connecting to legacy nodes. Given the large number of nodes, this might affect our TTFB metrics.

To be able to make an informed decision, can someone add numbers to "large number of nodes [running draft-29 in the IPFS network]". Unfortunately I don't have this data with kademlia-exporter.max-inden.de/. @dennis-tra maybe?


We could add a new code point for QUIC v1 (string representation: quicv1). The existing code point would be reinterpreted to mean QUIC draft-29.

I think renaming the text representation of the /quic code point to /quic-draft29 adds complexity which is unfortunate. That said, with the suggestion below, in case we are consistent across the many multiaddr implementations, it would be worth it.

Implementations MAY choose to continue accepting /quic multiaddresses (and interpret them as quic-draft29 or quic-v1 at their discretion).

https://github.com/multiformats/multicodec/pull/298

I think we should be consistent, i.e. either interpret /quic as /quic-draft29 or /quic-v1 across all our multiaddr implementations.


Overall I don't have a strong opinion here. My intuition tells me to:

marten-seemann commented 1 year ago

To be able to make an informed decision, can someone add numbers to "large number of nodes [running draft-29 in the IPFS network]". Unfortunately I don't have this data with kademlia-exporter.max-inden.de/. @dennis-tra maybe?

These are the go-ipfs v0.7.0 nodes, so it's a very large fraction of the IPFS network (30-40%).

See this as another argument for investing into upgrade-your-IPFS-node advocacy.

The cost is only paid by nodes that upgraded, so the incentives are severely misaligned here.

MarcoPolo commented 1 year ago

Thanks Marten! I think this is worth doing sooner rather than later. I think I’m in agreement with the original post. Here’s a bit of clarification on the exact semantics since I think we’ve been discussing a lot of different things in this thread (that’s a good thing!).

This issue is only about adding a new code point for quic-v1, not about whether to rename the existing code point to quic-draft-29. Let’s talk about the rename in another issue (I’m for it, but it needs some thought around implementation). Letting nodes be explicit in their support of quic-v1 seems like a great thing, and (besides an extra multiaddr or two) I don’t see any downsides here. rust-libp2p may only want to advertise quic-v1, although according to the release notes on quinn they support draft-29.

I think we should add the new codepoint as soon as possible so that rust-libp2p can start using it and we can start advertising it in go-libp2p. The new codepoint allows dialers to know ahead of time what version to use.

Then there’s the very closely related issue of:

go-libp2p nodes now have two options:
i. continue preferring draft-29. This will incur a 1 RTT penalty when connecting to rust-libp2p nodes, as we'll need to perform version negotiation.
ii. prefer QUIC v1. This will remove the 1 RTT penalty, but will incur a 1 RTT penalty when connecting to legacy nodes. Given the large number of nodes, this might affect our TTFB metrics.

(I’m defining new nodes as nodes post this upgrade; old nodes as nodes prior to this upgrade.) If new nodes advertise their explicit support of quic-v1, then we don’t pay the 1 RTT penalty when new nodes dial other new nodes (they know to use quic-v1). If a new node sees that a node only supports the quic codepoint (draft-29), then it should dial with draft-29. We don’t incur a 1 RTT penalty if rust-libp2p nodes advertise quic-v1, since the new node would have seen that multiaddr.

Option i. is strictly better than ii. because we can avoid the 1 RTT penalty by only dialing quic-v1 if the other node explicitly advertises it, which will be the case if we agree on this new codepoint soon and have rust-libp2p use the quic-v1 codepoint. Option ii. forces a 1 RTT penalty when connecting to older nodes (a significant portion of the network currently). Am I missing something here?
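
The dialing rule described here can be sketched as follows. This is an illustration only: it uses plain string splitting rather than a real multiaddr library, it assumes the string forms /quic (read as draft-29) and /quic-v1, and the names `quic_version_of` and `pick_dial` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

DRAFT_29 = "draft-29"
V1 = "v1"


def quic_version_of(addr: str) -> Optional[str]:
    """Map a multiaddr string to the QUIC version it advertises, if any."""
    parts = addr.strip("/").split("/")
    if "quic-v1" in parts:
        return V1
    if "quic" in parts:
        return DRAFT_29  # the legacy code point is read as draft-29
    return None


def pick_dial(addrs: List[str]) -> Optional[Tuple[str, str]]:
    """Return (address, version) to dial, preferring explicit QUIC v1."""
    by_version: Dict[str, str] = {}
    for addr in addrs:
        version = quic_version_of(addr)
        if version is not None:
            by_version.setdefault(version, addr)  # keep the first address per version
    for version in (V1, DRAFT_29):
        if version in by_version:
            return by_version[version], version
    return None


old_node = ["/ip4/192.0.2.1/udp/4001/quic"]
new_node = ["/ip4/192.0.2.1/udp/4001/quic", "/ip4/192.0.2.1/udp/4001/quic-v1"]
print(pick_dial(old_node))
print(pick_dial(new_node))
```

Because the version is known before the first packet is sent, neither dial in the example needs Version Negotiation.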

I think codepoints are cheap. The way we represent a list of multiaddrs is a bit inefficient, but we can optimize this in the future if it becomes a problem.


I think we should be consistent, i.e. either interpret /quic as /quic-draft29 or /quic-v1 across all our multiaddr implementations.

Agreed. Let’s keep the current codepoint and string /quic to mean quic-draft29. We may even want to alias this to have new nodes represent this as the string /quic-draft29 to be clearer, but this is a bit of a tangent.


tl;dr I agree the new codepoint is a useful addition. I don’t see any major downsides to introducing it. Let’s do it.

marten-seemann commented 1 year ago

Thank you for your comprehensive response, @MarcoPolo!

I think we should be consistent, i.e. either interpret /quic as /quic-draft29 or /quic-v1 across all our multiaddr implementations.

Agreed. Let’s keep the current codepoint and string /quic to mean quic-draft29. We may even want to alias this to have new nodes represent this as the string /quic-draft29 to be clearer, but this is a bit of a tangent.

That makes sense. The hope is that in the long term (as QUIC draft-29 is phased out), the /quic string representation will also slowly die. For the transition period, we should parse /quic as /quic-draft29. Once we disable draft-29 support (https://github.com/libp2p/go-libp2p/issues/1841 suggests a transition period of 6 months), we might also stop parsing /quic.

elenaf9 commented 1 year ago

Apologies for the late reply here.

rust-libp2p is adding QUIC support (https://github.com/libp2p/rust-libp2p/pull/2289), and quinn only supports RFC 9000. If we don't do anything, this means:

Not sure if you want to support draft-29 in rust-libp2p as well, it might be fine to just support QUIC v1. Since rust-libp2p never had QUIC support to begin with, you'd not be causing a regression.

For the record, does quinn support draft-29 or does it not @elenaf9?

Quinn supports draft-29; however, it does not support version negotiation on the client side. Instead, when initiating a new outbound connection, we have to set the QUIC version in the client config. If the server sends back a Version Negotiation packet because it does not support that version, the connection attempt will error with VersionMismatch. See quinn-rs/quinn#1249.

@elenaf9 What would your implementation strategy be on the Rust side? Can you give an estimate how much work this would be?

Given the above, it wouldn't be much work to support both the new and the old codepoint. When initiating a new dial, we'd check the codepoint: in case of /quic or /quic-draft-29, set our client config version to draft-29, else use the default, QUIC v1. As a server we support all (>= draft-29) versions, but our listening addresses would have the /quic-v1 codepoint.

Once we disable draft-29 support (https://github.com/libp2p/go-libp2p/issues/1841 suggests a transition period of 6 months)

What's the difference between dropping draft-29 support in 6 months compared to dropping it right now? I understand that go-ipfs v0.7 makes up a significant portion of the IPFS network, however go-ipfs v0.7 is two years old. If those nodes did not upgrade for all that time do we expect anything to change in the next 6 months? If I understand it correctly dropping support for draft-29 would avoid the need for a new codepoint (at least until a new incompatible version is published)?

MarcoPolo commented 1 year ago

What's the difference between dropping draft-29 support in 6 months compared to dropping it right now? If I understand it correctly dropping support for draft-29 would avoid the need for a new codepoint?

No we still would want a new codepoint so that new nodes can know before dialing if the node speaks v1 or draft 29. If we don’t adopt a new codepoint and assume everyone uses quicv1 then you incur a penalty when communicating with (the many) old nodes.

elenaf9 commented 1 year ago

What's the difference between dropping draft-29 support in 6 months compared to dropping it right now? If I understand it correctly dropping support for draft-29 would avoid the need for a new codepoint?

No we still would want a new codepoint so that new nodes can know before dialing if the node speaks v1 or draft 29. If we don’t adopt a new codepoint and assume everyone uses quicv1 then you incur a penalty when communicating with (the many) old nodes.

Okay makes sense.


Don't have a strong opinion on this; adding a new codepoint sounds reasonable 👍. Will do a PR for it on rust-multiaddr.

lidel commented 1 year ago

iiuc we executed on /quic-v1 and /quic-v1/webtransport, implemented in Rust/Go, and recently shipped in Kubo 0.18 and the rollout has begun:

is there anything else to be done here, or can we close this?

marten-seemann commented 1 year ago

This has been fully resolved and released.