hashicorp / yamux

Golang connection multiplexing library
Mozilla Public License 2.0

writes to half-closed streams stall when sendWindow is exhausted #133

Open slingamn opened 2 months ago

slingamn commented 2 months ago

I'm working on an application that uses yamux for proxying. I ran into a case where the application leaks goroutines. Here's a simplified test case: https://gist.github.com/slingamn/1dafab6141e03d27a3f51bcbbcdb9972

This test case sets up a client and a server. The client sends 10 messages to the server over a *Stream. The server reads 5 messages, then closes its side of the stream. After this happens, the client side continues writing messages until sendWindow is exhausted. Then it gets stuck here:

https://github.com/hashicorp/yamux/blob/8bd691fb7605e7c7e8db7b552768b8f328cebe3e/stream.go#L227-L232

Since the other side has stopped reading, it will not send control messages that could unblock sendNotifyCh. So unless a write deadline has been set, Write() blocks until StreamCloseTimeout (5 minutes by default).

I naively tried to fix this by adding streamRemoteClose to the existing list of states (streamLocalClose, streamClosed) that cause writes to preemptively fail:

https://github.com/hashicorp/yamux/blob/8bd691fb7605e7c7e8db7b552768b8f328cebe3e/stream.go#L180-L184

This broke TestHalfClose, from which I understand that the current behavior is expected, at least in part. But if this is expected, then I'm not sure what the in-band mechanism is for signaling that the other side of the Stream has gone away. Is there an existing API in yamux to tell the other side to stop sending? (For comparison, (*net.TCPConn).Close() from the read side will cause writes to fail with EPIPE.)

Thanks very much for your time.

schmichael commented 1 month ago

Thanks for filing an issue! This is a tricky one!

I believe the existing code is correct, but I absolutely concede this is difficult to understand behavior. I think the key lies in 2.3.6 Stream Half-Close of the SPDY spec:

When one side of the stream sends a frame with the FLAG_FIN flag set, the stream is half-closed from that endpoint. The sender of the FLAG_FIN MUST NOT send further frames on that stream. When both sides have half-closed, the stream is closed.

In other words, Stream.Close is semantically similar not to TCPConn.Close but to TCPConn.CloseWrite.

To put it in protocol terms: when the Server closes its stream (sending FIN), that does not imply the Client should consider the Stream closed. The Server closing its stream only implies the Server will no longer send data.

This raises the question:

Should yamux differentiate Stream.Close from Stream.CloseWrite? SPDY has a RST_STREAM control frame to abort streams, but I don't see it implemented in yamux.

Yamux only implements the RST flag on window-update frames, but perhaps we should treat that like SPDY's RST_STREAM? As far as I can tell, the only place yamux RSTs an individual Stream is when a timeout is hit.

Workaround: Read Closers

There is a workaround you should use while we consider the above:

```go
go func() {
	// Read with an empty buffer: we only care about the error
	// that signals the peer has closed its side of the stream.
	_, err := cStream.Read([]byte{})
	if err != nil {
		log.Printf("Client.Stream.Read failed: %v", err)
		// Close our side so any blocked Write is released.
		cStream.Close()
	}
}()
```

I added the above to a fork of your original gist, and it fixes the deadlock-until-timeout.

Since yamux only exposes a way for streams to indicate they will no longer send data, you can start a goroutine whose express purpose is to read from the server and detect that state. Calling Stream.{Close,Read,Write} concurrently this way is safe. Stream.Read returns an error (io.EOF in the case of a clean Stream.Close on the other end), and then the Stream.Close notifies the blocked Stream.Write that the stream has been locally closed (meaning closed for sending).

You can see Nomad implementing this pattern in places like the log streaming RPC.