libp2p / specs

Technical specifications for the libp2p networking stack
https://libp2p.io

Stream Migration Protocol #328

Open Stebalien opened 3 years ago

Stebalien commented 3 years ago

Libp2p should support protocol-agnostic stream migration. This will make upgrading to better transports "seamless".

Requirement

  1. Transport agnostic. Really, this means migrating at the stream level.
  2. Minimal overhead. Overhead should be at most a small per-stream cost (no additional framing, etc.)
  3. No interruption. Reading/writing should be continuous.
  4. Transparent. Applications using migratable streams shouldn't notice anything.
  5. Correct. There can't be any ambiguity (one side believing the migration happened, the other side disagreeing, etc.).

Protocol

Here we describe a protocol for migrating a "source" stream (stream A) to a "target" stream (stream B).

When opening a stream, if the remote peer supports the stream migration protocol (discovered through identify) and the stream is "long lived" (opt in or opt out?):

  1. First, open a stream (stream A).
  2. Negotiate the stream migration protocol.
  3. Send a message indicating that we're opening a new stream, along with a unique ID for the stream.
  4. Negotiate the actual protocol.

Because we "know" that the peer supports this stream migration protocol, we can pipeline these steps so as not to take any additional round-trips.
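A rough sketch of what "pipelining" could look like on the wire: everything the opener sends before application data is built up front and written in one flush, without waiting for replies. The length-prefixed, newline-terminated lines follow multistream-select 1.0 framing. The protocol ID, the `OPEN` tag, and the message layout are assumptions made for illustration only; the proposal does not define a wire format.

```python
# Hypothetical names: the proposal does not fix a protocol ID or message layout.
MIGRATION_PROTO = "/libp2p/stream-migration/sketch"
OPEN = 0  # hypothetical tag meaning "this is a new migratable stream"


def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def ms_line(proto: str) -> bytes:
    """multistream-select style line: varint length, then text ending in a newline."""
    line = proto.encode() + b"\n"
    return uvarint(len(line)) + line


def open_header(stream_id: bytes, app_proto: str) -> bytes:
    """All bytes the opener writes before application data, in a single flush."""
    return (
        ms_line("/multistream/1.0.0")
        + ms_line(MIGRATION_PROTO)                            # step 2
        + bytes([OPEN]) + uvarint(len(stream_id)) + stream_id  # step 3
        + ms_line(app_proto)                                  # step 4
    )
```

The cost is a one-time header per stream, which is consistent with the "no additional framing" requirement: nothing is wrapped around application data afterwards.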

To migrate a stream:

  1. The initiator of the stream migration will open a new stream (stream B), on the "target" connection (the connection to which we're migrating the stream), to the remote peer. This is the "target" stream for the migration.
  2. On stream B, the initiator will negotiate the stream migration protocol.
  3. On stream B, the initiator will send a message indicating that we're migrating a stream, along with the ID of the stream we're migrating (the source stream, stream A).
  4. The receiver will acknowledge the migration (on stream B) and, from this point onward, treat any "EOF" (close) on stream A as a migration to stream B.
  5. When the initiator receives the acknowledgement on stream B, it will close stream A for writing, and start writing on stream B.
  6. When the receiver receives the EOF on stream A, it will send an EOF on stream A, switching over to stream B.
  7. When the initiator receives an EOF on stream A, it will start reading on stream B.

At this point, the stream is fully migrated.
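The seven steps above can be walked through with a small in-memory model. This is only a sketch under simplifying assumptions: each stream is modelled as two one-way queues, both peers run in a single thread, and `Pipe`, `Stream`, `migrate`, and the `MIGRATE`/`ACK` messages are made-up names rather than anything from a libp2p implementation.

```python
from collections import deque

EOF = object()  # sentinel returned once a direction is closed and drained


class Pipe:
    """One direction of a stream: ordered items plus a closed flag."""

    def __init__(self):
        self.buf = deque()
        self.closed = False

    def write(self, item):
        if self.closed:
            raise RuntimeError("write after close")
        self.buf.append(item)

    def close(self):
        self.closed = True

    def read(self):
        if self.buf:
            return self.buf.popleft()
        return EOF if self.closed else None  # None: nothing available yet


class Stream:
    """Bidirectional stream modelled as two independent pipes."""

    def __init__(self):
        self.to_receiver = Pipe()
        self.to_initiator = Pipe()


def migrate(a: Stream, b: Stream, stream_id: str) -> list:
    """Run the migration from A to B; return what the receiver's app reads."""
    received = []
    a.to_receiver.write("app-1")  # data still in flight on stream A
    # Steps 1-3: the initiator opens B and names the source stream.
    b.to_receiver.write(("MIGRATE", stream_id))
    # Step 4: the receiver ACKs on B; EOF on A now means "switch to B".
    assert b.to_receiver.read() == ("MIGRATE", stream_id)
    b.to_initiator.write("ACK")
    # Step 5: the initiator sees the ACK, closes A for writing, writes on B.
    assert b.to_initiator.read() == "ACK"
    a.to_receiver.close()
    b.to_receiver.write("app-2")
    # Step 6: the receiver drains A up to EOF, sends EOF on A, then reads B.
    while (item := a.to_receiver.read()) is not EOF:
        received.append(item)
    a.to_initiator.close()
    received.append(b.to_receiver.read())
    # Step 7: the initiator sees EOF on A and starts reading on B.
    assert a.to_initiator.read() is EOF
    return received
```

Calling `migrate(Stream(), Stream(), "id-1")` returns the application data in its original order, with the chunk written on A before the switch delivered ahead of the chunk written on B.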

Resets

If either stream is "reset" before both ends are closed, both streams must be reset and the stream as a whole should be considered "aborted" (reset).
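One way to read the reset rule, as a sketch: while the source stream is not yet closed in both directions, a reset on either stream is propagated to the other and the logical stream is aborted. `RawStream` and `Migration` are hypothetical stand-ins, not types from any libp2p implementation.

```python
class RawStream:
    """Stand-in for a transport stream; only tracks whether it was reset."""

    def __init__(self, name: str):
        self.name = name
        self.was_reset = False

    def reset(self) -> None:
        self.was_reset = True


class Migration:
    """Tracks one in-progress migration from `source` to `target`."""

    def __init__(self, source: RawStream, target: RawStream):
        self.source = source
        self.target = target
        self.sent_eof_on_source = False
        self.got_eof_on_source = False
        self.aborted = False

    @property
    def complete(self) -> bool:
        """True once the source stream is closed in both directions."""
        return self.sent_eof_on_source and self.got_eof_on_source

    def on_reset(self, stream: RawStream) -> None:
        """Handle a reset observed on either stream."""
        stream.reset()
        if not self.complete:
            # Mid-migration: both streams go down together.
            other = self.target if stream is self.source else self.source
            other.reset()
        self.aborted = True
```

After the migration completes, only the target stream is live, so a reset there is an ordinary reset of the logical stream.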

Half-Closed

If stream A was half-closed (either for reading or writing), that state must be replicated on the new stream after the initial handshake. Importantly, there's an edge-case:

  1. The receiver tries to close stream A for writing.
  2. The receiver receives a migration request on stream B, for stream A.
  3. The receiver ACKs the migration request.
  4. The initiator sees the ACK on stream B.
  5. The initiator sees the EOF on stream A, and treats it as the migration EOF.

This is fine. The stream will be migrated and the EOF will be re-played on stream B, leaving stream B in the intended state.
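A small sketch of this edge case from the initiator's point of view. It assumes the initiator sees the ACK before the EOF, as in the steps above, and models only the receiver-to-initiator direction of each stream as a queue. `run_edge_case` and the message strings are illustrative names only.

```python
from collections import deque

EOF = "EOF"


def run_edge_case(receiver_closed_early: bool) -> list:
    """Return what the initiator's application observes, in order."""
    a_to_initiator = deque(["reply-1"])  # stream A, receiver -> initiator
    b_to_initiator = deque()             # stream B, receiver -> initiator

    if receiver_closed_early:
        a_to_initiator.append(EOF)        # step 1: application half-close on A
        b_to_initiator.append("ACK")      # steps 2-3: request arrives, is ACKed
        b_to_initiator.append(EOF)        # half-close replayed on B
    else:
        b_to_initiator.append("ACK")      # normal migration: ACK on B
        a_to_initiator.append(EOF)        # migration EOF on A
        b_to_initiator.append("reply-2")  # receiver keeps writing on B

    seen = []
    assert b_to_initiator.popleft() == "ACK"  # step 4
    # Step 5: after the ACK, an EOF on A only means "switch to B".
    while (item := a_to_initiator.popleft()) != EOF:
        seen.append(item)
    # From here on, the application reads from stream B only.
    seen.extend(b_to_initiator)
    return seen
```

In both cases the initiator handles the EOF on A the same way. The difference shows up only on stream B, where the replayed EOF closes the read side in the early-close case.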

Analysis

Stebalien commented 3 years ago

(cc @marten-seemann who helped design this)

Stebalien commented 3 years ago

Note: This protocol will not allow recovering a session if it is "lost" (i.e., the connection was cut). Doing so would require keeping large write buffers and tracking acknowledgement states in userspace. This protocol will primarily help the connection manager combine duplicate connections into a single connection, or migrate streams from a worse connection (e.g., TCP) to a better connection (e.g., QUIC).

bertrandfalguiere commented 3 years ago

Will this be used to upgrade from relayed connections to direct connections?

yusefnapora commented 3 years ago

  1. When the receiver receives the EOF on stream A, it will send an EOF on stream B, switching over to stream A.

Is this reversed? It seems like the receiver should send EOF on stream A and switch to stream B.

Looks like a great proposal to me 👍

Stebalien commented 3 years ago

Will this be used to upgrade from relayed connections to direct connections?

Yes.

Is this reversed? It seems like the receiver should send EOF on stream A and switch to stream B.

Yes...

SgtPooki commented 1 year ago

Overhead should be at most a small per-stream cost (no additional framing, etc.)

I'm not sure if this is accurate. Based on step 1, "open a new stream on a new connection" there is a new connection made.

Maybe "at most, a small per-stream cost + new connection overhead," but I may lack understanding here.


The initiator will open a new stream (stream B), on a new connection, to the receiver. This is the target stream for the migration.

This and other lines are quite confusing. By "target" stream for the migration, do we mean the resulting stream? or target stream to be migrated? Technically, there are two streams targeted by a stream migration.

It would be nice to clarify the terminology for the two streams in the migration. It seems like Stream B is the "final" stream, and Stream A is the to-be-migrated stream. It would be nice to clarify and make language consistent in the spec.

Potential legend:

| Term | Definition |
| --- | --- |
| Leader | The Peer who begins/initiates the connection with the Participant peer |
| Participant | The Peer who receives/acknowledges the connection and streams with the Leader peer |
| Negotiation-stream | An initial stream created in an existing connection between Leader and Participant peers. |
| Goal-stream | A new stream, using an "upgraded" transport when compared to the Negotiation stream, created on a new connection between leader and participant peers. |

Using this legend, the spec would change as follows:

## Requirement

1. Transport agnostic. Really, this means migrating at the stream level.
2. Minimal overhead. Overhead should be at most a small per-stream cost (no additional framing, etc.)
3. No interruption. Reading/writing should be continuous.
4. Transparent. Applications using migratable streams shouldn't notice anything.
5. Correct. There can't be any ambiguity (one side believing the migration happened, the other side disagreeing, etc.).

## Protocol

When opening a stream, if the target peer supports the stream migration protocol (discovered through identify) and the stream is "long lived" (opt in or opt out?):

1. First, open a stream (Negotiation-stream).
2. Negotiate the stream migration protocol.
3. Send a message indicating that we're opening a new stream (Goal-stream), along with a unique ID for the stream.
4. Negotiate the actual protocol.

Because we "know" that the peer supports this stream migration protocol, we can pipeline these steps so as not to take any additional round-trips.

To migrate a stream:

1. The Leader will open a new stream (Goal-stream), on a new connection, to the Participant.
2. On Goal-stream, the Leader will negotiate the stream migration protocol.
3. On Goal-stream, the Leader will send a message indicating that we're migrating a stream, along with the ID of the stream we're migrating.
4. The Participant will acknowledge the migration (on Goal-stream) and, from this point onward, treat any "EOF" (close) on Negotiation-stream as a migration to Goal-stream.
5. When the Leader receives the acknowledgement on Goal-stream, it will close Negotiation-stream for writing, and start writing on Goal-stream.
6. When the Participant receives the EOF on Negotiation-stream, it will send an EOF on Negotiation-stream, switching over to Goal-stream.
7. When the Leader receives an EOF on Negotiation-stream, it will start reading on Goal-stream.

At this point, the stream is fully migrated.

**Resets**

If either stream is "reset" before both ends are closed, both streams must be reset and the stream as a whole should be considered "aborted" (reset).

**Half-Closed**

If Negotiation-stream was half-closed (either for reading or writing), that state must be replicated on the new stream after the initial handshake. Importantly, there's an edge-case:

1. The Participant tries to use the Negotiation-stream for writing.
2. The Participant receives a migration request on Goal-stream, for Negotiation-stream.
3. The Participant ACKs the migration request.
4. The Leader sees the ACK on Goal-stream.
5. The Leader sees the EOF on Negotiation-stream, and treats it as the migration EOF.

This is fine. The stream will be migrated and the EOF will be re-played on Goal-stream, leaving Goal-stream in the intended state.

## Analysis

* Transport agnostic: This protocol can migrate any stream from any transport to any other stream-based transport. It can even migrate unidirectional and half-closed streams, as long as the new transport supports opening bidirectional streams, and can subsequently half-close them.
* Overhead: This protocol will have a small overhead due to the multistream header, and stream ID, but that shouldn't be much in the grand scheme of things (especially if multistream 2 lands at some point). Importantly, this protocol requires no message framing.
* Interruption: Writing switches instantly to an already prepared stream with no delay.
* Transparent: This protocol supports all the normal stream features (half-close, reset, etc.).
* Correct: There are no "undecidable cases" (to be confirmed in a PoC implementation).

The receiver tries to stream A for writing.

"The receiver tries to [use?] stream A for writing."

Stebalien commented 1 year ago

I'm not sure if this is accurate. Based on step 1, "open a new stream on a new connection" there is a new connection made.

Well, this is a stream migration protocol. The goal is to migrate a stream from connection A to connection B. In this case, that "new connection" is connection B and the "new stream" is the stream we're migrating from connection A.

I think the confusion is "new connection". I'll rename them to "target" and "source".

This and other lines are quite confusing. By "target" stream for the migration, do we mean the resulting stream? or target stream to be migrated? Technically, there are two streams targeted by a stream migration.

It's the stream to which we're migrating. I'll try to clarify it a bit.

Stebalien commented 1 year ago

I've tried to make it a bit more explicit.

MarcoPolo commented 1 year ago

fyi, we have this as a spec proposal: https://github.com/libp2p/specs/pull/406

We haven't merged because there hasn't been a real implementation nor the demand for it.

Longer term, I'd prefer more effort focused on connection migration in QUIC rather than this effort because:

  1. Connection migration is well defined.
  2. It would be better to use connection migration for the QUIC transport rather than this.
  3. QUIC is the majority of the network.

Stebalien commented 1 year ago

In this migration protocol, I'm primarily targeting migrating streams off a relay and/or "combining" connections when we happen to establish multiple.

MarcoPolo commented 1 year ago

I think focusing on the "migrating off relay" use case is good. However I'm not sure in practice what you would do that starts on a public relay and continues on a direct connection. Because public relays are so limited (128KB/2min on Kubo), they aren't useful for much besides trying to get a direct connection. You wouldn't start fetching a file on a relayed connection and then continue on a direct one. Maybe there's a use case I'm missing?

MarcoPolo commented 1 year ago

"combining" connections when we happen to establish multiple.

Hopefully this is less prevalent now with the smart dialing work: https://github.com/libp2p/go-libp2p/releases/tag/v0.29.0

Stebalien commented 1 year ago

You wouldn't start fetching a file on a relayed connection and then continue on a direct one.

I could see sending a wantlist (bitswap) over a relay. Technically we could just kill the stream and re-create it.

But yeah, QUIC stream migration is higher priority and likely better in most cases.

marten-seemann commented 1 year ago

If / when https://datatracker.ietf.org/doc/draft-seemann-quic-nat-traversal/ ever becomes a reality, you'll be able to migrate your relayed QUIC connection to a hole-punched connection. Just to set expectations, this is very likely not going to happen within the next 12 months.

SgtPooki commented 1 year ago

I think focusing on the "migrating off relay" use case is good. However I'm not sure in practice what you would do that starts on a public relay and continues on a direct connection.

One example is a browser js-libp2p node that ends up having only p2p-circuit dialable multiaddrs.

Couldn't any node that has limited transport capabilities, and relies on relays to talk to the network, benefit from this? Or is DCUtR supposed to handle most of those use-cases?

DCUtR attempted to solve this for us in js-libp2p and Helia land. To my untrained eyes, it seems very similar, but instead of an up-front connection migration (transient -> direct), it would be a mid-flight migration.

If we did implement a stream-migration protocol, would that allow us to stop limiting relay throughput, and instead depend upon DCUtR + stream-migration (SM) in order to transition the relay-started transfer to a stream on the direct connection? If the DCUtR+SM process failed, we could drop the connection, but in that case, isn't it better to just attempt DCUtR and never start the transfer if it doesn't succeed?

(apologies for the dumb questions, just trying to get on all of your libp2p-experts'-brainwaves)