multiformats / go-multistream

an implementation of the multistream protocol in go
MIT License
40 stars 27 forks source link

Ignore error if can't write back echoed protocol in negotiate #87

Closed MarcoPolo closed 2 years ago

MarcoPolo commented 2 years ago

We would fail to negotiate if we couldn't echo back the protocol id. It should be valid to fail to write back the protocol ID and return the negotiate stream so that the caller may read data from the stream.

This pattern is common when a dialer opens a new stream, sends some data, then closes the stream immediately. (bitswap, graphsync). Without this there's really nothing a caller can do besides sleep enough to let the other side send it's multistream protocol response.

Fixes:

This is a regression from https://github.com/multiformats/go-multistream/pull/85. NegotiateLazy would not fail to negotiate for the listener.

r? @marten-seemann cc @Stebalien

marten-seemann commented 2 years ago

We would fail to negotiate if we couldn't echo back the protocol id. It should be valid to fail to write back the protocol ID and return the negotiate stream so that the caller may read data from the stream.

Agreed.

This pattern is common when a dialer opens a new stream, sends some data, then closes the stream immediately. (bitswap, graphsync). Without this there's really nothing a caller do besides sleep.

I'm trying to understand what this means in the QUIC stream state machine, where every stream is basically composed of two unidirectional streams that can separately be closed / reset. If we fail to write our multistream response, that must mean that the peer reset the write side of our stream. That also means that while we'll be able to read what the peer sent on the stream, we will never be able to write anything. Is my interpretation correct? In what cases would the peer's behavior make sense? It sounds like a "fire and forget" kind of message.

MarcoPolo commented 2 years ago

I'm trying to understand what this means in the QUIC stream state machine, where every stream is basically composed of two unidirectional streams that can separately be closed / reset. If we fail to write our multistream response, that must mean that the peer reset the write side of our stream.

Yep

That also means that while we'll be able to read what the peer sent on the stream, we will never be able to write anything.

Yep.

Is my interpretation correct?

Yep.

In what cases would the peer's behavior make sense? It sounds like a "fire and forget" kind of message.

There's nothing that says a peer can't do this. As far as I've seen we allow a peer to:

  1. write data to a stream
  2. close that stream and have the writes flushed and delivered to the peer.

Nothing says that a peer has to wait for an ack response before closing the stream. That wouldn't make sense when you already have the reliability gaurantees of tcp or quic.

While I'm not sure the graphsync/bitswap use case is an efficient use of resources (opening a transient stream to send a single message doesn't seem great), I don't think they are doing anything wrong as clients.

It is like a fire and forget message, but they want to make sure the message gets to the destination (as opposed to a unreliable datagram).


The issues we've seen around this only happen when there is really low latency between the nodes. What actually causes the bug is:

  1. Peer A writes MULTISTREAM PROTOCOL
  2. Peer B echos MULTISTREAM PROTOCOL
  3. Peer A writes
  4. Peer A writes
  5. Peer A attempts to close their stream (for their reading and writing).
    1. At the same time, Peer B tries to send back
    2. Peer B's negotiate fails to send this back since the write stream to peer A is closed.
    3. Peer B's negotiate bubbles up this error. The caller resets the stream.
  6. Peer A gets the stream reset error and returns it to the caller.
  7. Both sides end up with an error and no one is happy :)