Proposal: 1-RTT Handshakes (incl. Identify)

marten-seemann commented 1 year ago

A (libp2p) QUIC handshake takes 1 RTT; a libp2p TCP handshake takes 4 RTTs (3 RTTs when using the inlined muxer negotiation). That's not the whole story, however. After the handshake, implementations might wait for Identify to finish before actually making the connection available to the application. Since Identify is a request-response protocol (peer A opens a /ipfs/id/1.0.0 stream, and then waits for the response), this consumes another round trip.
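To make the round-trip accounting above concrete, here is a minimal Go sketch. The function name `rttsUntilUsable` is hypothetical and only encodes the numbers quoted in this comment; it assumes the implementation blocks on a request-response Identify exchange.

```go
package main

import "fmt"

// rttsUntilUsable returns the number of round trips before the application
// can use a connection: the handshake itself, plus one more round trip if the
// implementation waits for a request-response Identify exchange.
// (Hypothetical helper, for illustration only.)
func rttsUntilUsable(handshakeRTTs int, waitForIdentify bool) int {
	if waitForIdentify {
		return handshakeRTTs + 1
	}
	return handshakeRTTs
}

func main() {
	fmt.Println("QUIC + Identify:", rttsUntilUsable(1, true))
	fmt.Println("TCP + Identify:", rttsUntilUsable(4, true))
	fmt.Println("TCP (inlined muxer) + Identify:", rttsUntilUsable(3, true))
}
```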

The 0.5-RTT optimization

TLS 1.3 allows the server to send application data right after receiving the client's first flight (which is 0.5 RTTs after the client started the handshake, hence the name). At this point, the server possesses the keys to encrypt application data packets. However, as it hasn't received the client's certificate yet, it doesn't know who it's sending the data to. For the client, data sent in 0.5 RTT is indistinguishable from data sent after handshake completion (unless looking at packet timing), so there are no changes needed on the client side to accept 0.5-RTT data.

The Go standard library doesn't expose an API to use 0.5-RTT data, but quic-go will, starting with the next release: https://github.com/lucas-clemente/quic-go/issues/3634.

Identify: Request vs Push

Unfortunately, this doesn't allow us to get the handshake down to 1 RTT, since Identify itself is request-response. There is a push variant, Identify Push (/ipfs/id/push/1.0.0), which allows peers to push their Identify message instead of waiting for the incoming identify stream. If the server used that variant in 0.5-RTT data, the client would finish the handshake (incl. Identify) within just a single round trip.

Legacy clients will continue using the regular Identify protocol, and we don't necessarily want to send them our Identify data twice. Legacy servers won't send the Identify push message, and clients need to decide if they want to start the regular Identify request or not.

We can solve this problem by distinguishing between the current Identify mode and the push mode suggested here. QUIC currently uses the "libp2p" ALPN. By minting a new ALPN identifier, e.g. "libp2p+idpush", peers could negotiate the new behavior (or fall back to the old behavior) in an unambiguous way.
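A rough Go sketch of how the ALPN-based fallback could behave. The helpers `selectALPN` and `shouldPushIdentify` are hypothetical, and "libp2p+idpush" is only the example identifier from this proposal. The sketch assumes the server picks its most preferred protocol among those the client offered; in a real Go TLS setup the offered values would be listed in `tls.Config.NextProtos`.

```go
package main

import "fmt"

const (
	alpnLegacy = "libp2p"        // ALPN currently used by libp2p over QUIC
	alpnIDPush = "libp2p+idpush" // example identifier from the proposal, not final
)

// selectALPN returns the first entry of serverPrefs that the client also
// offered, or "" if there is no overlap. (Hypothetical helper.)
func selectALPN(serverPrefs, clientOffers []string) string {
	offered := make(map[string]bool, len(clientOffers))
	for _, p := range clientOffers {
		offered[p] = true
	}
	for _, p := range serverPrefs {
		if offered[p] {
			return p
		}
	}
	return ""
}

// shouldPushIdentify reports whether the server should send Identify Push
// in 0.5-RTT data for the negotiated ALPN.
func shouldPushIdentify(negotiated string) bool {
	return negotiated == alpnIDPush
}

func main() {
	server := []string{alpnIDPush, alpnLegacy}

	newClient := selectALPN(server, []string{alpnIDPush, alpnLegacy})
	fmt.Println(newClient, shouldPushIdentify(newClient))

	legacyClient := selectALPN(server, []string{alpnLegacy})
	fmt.Println(legacyClient, shouldPushIdentify(legacyClient))
}
```

A legacy server that only lists "libp2p" would negotiate the old behavior with either kind of client, so the client knows it has to issue the regular Identify request.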

Required Spec Changes

[ ] specify the new ALPN for libp2p+QUIC, link the Identify and the QUIC (or TLS) document

cc @MarcoPolo @mxinden @thomaseizinger @elenaf9 @p-shahi @achingbrain @Stebalien @Menduist

Menduist commented 1 year ago

Do we really need the ALPN optimization? Seems really ad-hoc, won't work on other transports, and will only save a tiny bit of bandwidth

thomaseizinger commented 1 year ago

Do we really need the ALPN optimization? Seems really ad-hoc, won't work on other transports, and will only save a tiny bit of bandwidth

I think the real optimization @marten-seemann is after is latency, not bandwidth.

This sounds conceptually a lot like HTTP2 Server Push where the idea was that a server can send data to a client before it even requests it. It is being (or was already?) removed from Chromium and Chrome because it is apparently not used very often and the "success rate" is not very good.

Is there a way we can architect this such that we don't need a new ALPN identifier and this instead happens transparently? For example, is there much harm in an application requesting identify information from a peer despite getting it pushed? It will still optimize for latency at the cost of potential double transmission.

Such an implementation would allow for measuring how well this works without having to adapt the spec.

Alternatively, it might be worth considering whether we can generalize this concept. For example, does it make sense for implementations to register a protocol as "early data" so that it will be executed as part of the handshake?

elenaf9 commented 1 year ago

Wouldn't we still need another round trip to run multistream-select on the stream itself? Per the spec (Connection - Opening New Streams Over a Connection), we always run multistream-select on new streams. In your proposal, would the client after receiving the server-hello just assume that the next inbound stream is the server's identify-push? Or would the protocol on the stream be communicated with something like the Offer protobuf described in the multistream-select 2.0 draft?

Alternatively, it might be worth considering whether we can generalize this concept. For example, does it make sense for implementations to register a protocol as "early data" so that it will be executed as part of the handshake?

I agree with @thomaseizinger. In rust-libp2p we treat identify as any other protocol - we don't "wait for Identify to finish before actually making the connection available to the application". I can see why it makes sense to do it but afaik it's not part of any spec so I am not sure if it makes sense to treat identify here differently than any other protocol.

We already support inlining the muxer selection into the security handshake. Couldn't we extend this logic to support inlining any "early protocol" negotiation in the security handshake? E.g. (not thought through, just a first idea) together with the supported muxers, the client inlines the IDs of supported "early protocols" in the security handshake. The server echoes back those that it supports, and either side can then directly open a stream for any of them. Edit: Never mind, I wrongly thought that the security handshake would give us all overlapping ALPNs, when in fact it only gives us a single one. Thus my idea would not work.

mxinden commented 1 year ago

Wouldn't we still need another round trip to run multistream-select on the stream itself?

Both Go and Rust support optimistically sending application data along with the first multistream-select message, thus "saving" one roundtrip. See Rust side and Go side.
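As a rough illustration of the optimistic approach, the Go sketch below builds a single first flight containing the multistream-select header, the proposed protocol ID, and the application payload. It assumes the multistream-select 1.0.0 framing (unsigned-varint length prefix, newline-terminated message); the helper names `msMessage` and `optimisticFirstFlight` are hypothetical and are not the go-libp2p or rust-libp2p API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// msMessage frames one multistream-select message: an unsigned varint
// length followed by the message and a trailing newline.
func msMessage(s string) []byte {
	body := s + "\n"
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, uint64(len(body)))
	return append(buf[:n], body...)
}

// optimisticFirstFlight concatenates the negotiation messages and the
// application payload, so everything is written without waiting for the
// remote's multistream-select reply.
func optimisticFirstFlight(protocolID string, payload []byte) []byte {
	out := msMessage("/multistream/1.0.0")
	out = append(out, msMessage(protocolID)...)
	return append(out, payload...)
}

func main() {
	// The payload is a placeholder; a real Identify message is a protobuf.
	flight := optimisticFirstFlight("/ipfs/id/push/1.0.0", []byte("payload"))
	fmt.Printf("%d bytes in one write: %q\n", len(flight), flight)
}
```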

MarcoPolo commented 1 year ago

We already support inlining the muxer selection into the security handshake. Couldn't we extend this logic to support inlining any "early protocol" negotiation in the security handshake?

The cool thing about this proposal is that it doesn't need a special way of formatting early data. It looks like regular data from a stream to the client.

Is there a way we can architect this such that we don't need a new ALPN identifier and this instead happens transparently?

I think the issue is that clients won't know if the server will push the identify (proposal) or if they have to request the identify (current strategy).

This sounds conceptually a lot like HTTP2 Server Push

The conceptual difference is that the client is still asking for the identify. It's just encoded in the ALPN. When the client says "let's communicate with (libp2p+idpush OR libp2p)", the server knows the client wants an identify via push. If the client doesn't want this, they can use only the libp2p ALPN.

Alternatively, it might be worth considering whether we can generalize this concept

What other cases can you think of that would benefit from this early data? I think identify is the most useful one since it gives me the protocols the peer speaks as part of establishing the connection. Unless we have another compelling use case I would rather focus on supporting this use case very well. And if another use case comes up in the future, we can always keep extending via the ALPN.

Overall, this is great and in practice would halve the latency of getting a usable connection in the majority of cases! Thanks for this.

thomaseizinger commented 1 year ago

Is there a way we can architect this such that we don't need a new ALPN identifier and this instead happens transparently?

I think the issue is that clients won't know if the server will push the identify (proposal) or if they have to request the identify (current strategy).

Do they have to know? It would be consistent with what you said above ("doesn't need a special way of formatting early data") if a node would simply finish the connection setup with an inbound identify stream / payload sitting in its buffer. The logic within the node would have to be changed from "always fetch identify" to "request identify if I don't have it".
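A minimal Go sketch of that "request identify if I don't have it" logic. The type `identifyCache` and its methods are hypothetical and do not correspond to any existing libp2p implementation; peers are identified by plain strings for brevity.

```go
package main

import "fmt"

// identifyCache records which peers we already hold Identify data for,
// regardless of whether it arrived via push or via a regular request.
// (Hypothetical type, for illustration only.)
type identifyCache struct {
	protocols map[string][]string
}

func newIdentifyCache() *identifyCache {
	return &identifyCache{protocols: make(map[string][]string)}
}

// onIdentify stores the protocols a peer announced.
func (c *identifyCache) onIdentify(peer string, protos []string) {
	c.protocols[peer] = protos
}

// needsRequest reports whether a regular Identify request is still needed
// once connection setup has finished.
func (c *identifyCache) needsRequest(peer string) bool {
	_, ok := c.protocols[peer]
	return !ok
}

func main() {
	cache := newIdentifyCache()

	// A server that pushed Identify during connection setup.
	cache.onIdentify("peerA", []string{"/ipfs/ping/1.0.0"})

	fmt.Println("request from peerA:", cache.needsRequest("peerA"))
	fmt.Println("request from peerB:", cache.needsRequest("peerB"))
}
```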

This sounds conceptually a lot like HTTP2 Server Push

The conceptual difference is that the client is still asking for the identify. It's just encoded in the ALPN. When the client says "let's communicate with (libp2p+idpush OR libp2p)", the server knows the client wants an identify via push. If the client doesn't want this, they can use only the libp2p ALPN.

Requesting identify with a new ALPN requires a rollout to both nodes. I don't know the internals of the go-libp2p implementation, but at least for rust-libp2p, receiving an inbound stream with identify push will trigger an event that we received identify information, regardless of where it came from. Following from that, any application code that is conditional on having identify will benefit from a decreased latency if the node we are connecting to runs identify-push as part of the connection setup.

If we do it this way, any protocol that follows an "open stream, write, close" design will be compatible with this.

Alternatively, it might be worth considering whether we can generalize this concept

What other cases can you think of that would benefit from this early data? I think identify is the most useful one since it gives me the protocols the peer speaks as part of establishing the connection. Unless we have another compelling use case I would rather focus on supporting this use case very well. And if another use case comes up in the future, we can always keep extending via the ALPN.

I think the overall thinking is good: don't generalize unnecessarily and instead build for proven use cases. I do however also think that we shouldn't build too many special cases. The current proposal re-uses some abstractions (existing protocols) but ties them together with special triggers (ALPN identifier). I'd argue that we should:

Either design a system where any protocol can be sent optimistically as part of connection early data. Together with the multistream-select optimization, I think this should just work.
Or abandon the idea that we are embedding the "identify" protocol here and simply design an ALPN identifier that sends over supported protocols as part of the early data. If we make it clear that we aren't running identify here but just embedding the supported protocols, then it is less tempting to generalize this to other protocols at some point.

Menduist commented 1 year ago

I think the real optimization @marten-seemann is after is latency, not bandwidth.

My understanding is that the ALPN is just used to avoid sending identify twice, which we'll have to do on other transports. So it's a BW optimization, while the rest of this proposal is a latency optimization.

Hence my comment: the ALPN thingy seems to add complexity for a small BW optimization that we don't necessarily need.

thomaseizinger commented 1 year ago

I think the real optimization @marten-seemann is after is latency, not bandwidth.

My understanding is that the ALPN is just used to avoid sending identify twice, which we'll have to do on other transports. So it's a BW optimization, while the rest of this proposal is a latency optimization.

Hence my comment: the ALPN thingy seems to add complexity for a small BW optimization that we don't necessarily need.

Ah yes, you are right! You said in one sentence what I needed a paragraph for 🙈

marten-seemann commented 1 year ago

Lots of good discussion here, thank you guys!

My primary concern is (as always :)) latency. Saving 1 RTT during the handshake is what I’m after. I wouldn’t be too concerned about sending Identify twice for a transition period, it’s about 1 kB, so 1 additional packet.

I also want to avoid baking Identify too deeply into libp2p: it should be possible to run libp2p without even supporting the Identify protocol. This is important because 1. there are nodes out there that apparently already do exactly that and 2. Identify is really not a single protocol, but a collection of multiple protocols, and it would be nice to split them up at some point.

Somewhat counterintuitively, putting it into the ALPN would help with that, since we could avoid “spamming” Identify Pushes to nodes that don’t speak Identify at all: You only get your Identify Push if you explicitly ask for it.

Now the usefulness of this proposal depends on how crucial the information contained in the Identify message actually is. This probably depends on the application protocol: If the client wants to open a stream with a certain application protocol and send a lot of data, it might not want to do that optimistically, but first ascertain that the server actually speaks that protocol. On the other hand, if booting up the protocol is cheap and the initial flight of data is small, there's probably less reason not to do that optimistically.
