Open slingamn opened 2 months ago
Thanks for filing an issue! This is a tricky one!
I believe the existing code is correct, but I absolutely concede this is difficult to understand behavior. I think the key lies in 2.3.6 Stream Half-Close of the SPDY spec:
When one side of the stream sends a frame with the FLAG_FIN flag set, the stream is half-closed from that endpoint. The sender of the FLAG_FIN MUST NOT send further frames on that stream. When both sides have half-closed, the stream is closed.
In other words, Stream.Close is not semantically similar to TCPConn.Close but rather to TCPConn.CloseWrite.

To put it in protocol terms:

- yamux Stream.Close (like net/TCPConn.CloseWrite) sends FLAG_FIN and half-closes the stream: that side will send no more data, but it can still receive.
- net/TCPConn.Close shuts down both directions at once; yamux has no direct per-stream equivalent.
So in your example, when the Server closes its stream, that does not imply the Client should consider the Stream closed. The Server closing its stream only implies that the Server will send no more data.
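As a minimal sketch of those semantics (assuming cStream and sStream are the client and server ends of the same yamux stream, as in your gist):

```go
// Server half-closes: this sends FLAG_FIN ("I will send no more data").
sStream.Close()

// Client: reads observe the half-close as a clean end-of-stream,
// since the server sent no data before closing.
if _, err := cStream.Read(make([]byte, 16)); err == io.EOF {
	log.Println("server has finished sending")
}

// ...but the client's write direction is still open. Writes keep working
// until the flow-control window runs out (because the server has stopped
// reading), which is the blocking you hit in your test case.
cStream.Write([]byte("still permitted after the server's Close"))
```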
This raises the question: should yamux differentiate Stream.Close and Stream.CloseWrite? SPDY has a RST_STREAM control frame to abort streams, but I don't see that implemented in yamux. Yamux only implements the RST flag on window update frames; perhaps we should treat that like SPDY's RST_STREAM? As far as I can tell, the only place yamux RSTs an individual Stream is when a timeout is hit.
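To make the distinction concrete, here is a purely hypothetical sketch of what such a split could look like; nothing below exists in yamux today:

```go
// Hypothetical API sketch only, mirroring the split net.TCPConn already has.
type hypotheticalStream interface {
	// CloseWrite would send FLAG_FIN: "I will send no more data."
	// This is what yamux's Stream.Close effectively does today.
	CloseWrite() error
	// Close would additionally abort the stream (e.g. via a RST, like
	// SPDY's RST_STREAM), so the peer's pending Writes fail fast instead
	// of blocking on flow control.
	Close() error
}
```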
There is a workaround you should use while we consider the above:
go func() {
	// With an empty buffer this Read returns no data; it only observes the
	// stream state. Since the server sends nothing, it returns io.EOF once
	// the server's FLAG_FIN arrives (or another error if the stream breaks).
	_, err := cStream.Read([]byte{})
	if err != nil {
		log.Printf("Client.Stream.Read failed: %v", err)
		// Closing our side wakes up any Write blocked on flow control.
		cStream.Close()
	}
}()
I added the above to a fork of your original gist, and it fixes the deadlock-until-timeout.
Since yamux only exposes a way for streams to indicate they will no longer send data, you can start a goroutine whose express purpose is to read from the server and detect that state. Calling Stream.{Close,Read,Write} concurrently this way is safe. Stream.Read returns an error (io.EOF in the case of a clean Stream.Close on the other end), and then the Stream.Close notifies the blocked Stream.Write that the stream has been locally closed (meaning closed for sending).
You can see Nomad implementing this pattern in places like the log streaming RPC.
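Combined with the write loop from your test case, the effect looks roughly like this (a sketch; cStream as above, and the message contents are illustrative):

```go
// With the EOF-watcher goroutine above in place, the write loop no longer
// hangs: once cStream.Close() runs, the blocked Write is woken up and
// returns an error (ErrStreamClosed) that the loop can act on.
msg := []byte("hello")
for i := 0; i < 10; i++ {
	if _, err := cStream.Write(msg); err != nil {
		log.Printf("Client.Stream.Write failed: %v", err)
		break // the server is gone; stop sending
	}
}
```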
I'm working on an application that uses yamux for proxying. I ran into a case where the application leaks goroutines. Here's a simplified test case: https://gist.github.com/slingamn/1dafab6141e03d27a3f51bcbbcdb9972
This test case sets up a client and a server. The client sends 10 messages to the server over a *Stream. The server reads 5 messages, then closes its side of the stream. After this happens, the client side continues writing messages until sendWindow is exhausted. Then it gets stuck here: https://github.com/hashicorp/yamux/blob/8bd691fb7605e7c7e8db7b552768b8f328cebe3e/stream.go#L227-L232
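In outline, the test case looks roughly like this (a condensed sketch rather than the exact gist: message sizes are illustrative and net.Pipe stands in for the real transport):

```go
package main

import (
	"io"
	"log"
	"net"

	"github.com/hashicorp/yamux"
)

func main() {
	// Error handling elided for brevity.
	cConn, sConn := net.Pipe()
	serverSession, _ := yamux.Server(sConn, nil)
	clientSession, _ := yamux.Client(cConn, nil)

	go func() {
		sStream, _ := serverSession.AcceptStream()
		buf := make([]byte, 64*1024)
		for i := 0; i < 5; i++ {
			io.ReadFull(sStream, buf) // server reads 5 messages...
		}
		sStream.Close() // ...then closes its side of the stream
	}()

	cStream, _ := clientSession.OpenStream()
	msg := make([]byte, 64*1024)
	for i := 0; i < 10; i++ {
		// Once the server stops reading, the send window is no longer
		// replenished, so Write eventually parks on sendNotifyCh until
		// StreamCloseTimeout (5 minutes by default) fires.
		if _, err := cStream.Write(msg); err != nil {
			log.Printf("Client.Stream.Write failed: %v", err)
			break
		}
	}
}
```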
Since the other side has stopped reading, it will not send control messages that could unblock sendNotifyCh. So unless a write deadline has been set, Write() blocks until StreamCloseTimeout (5 minutes by default).

I naively tried to fix this by adding streamRemoteClose to the existing list of states (streamLocalClose, streamClosed) that cause writes to preemptively fail: https://github.com/hashicorp/yamux/blob/8bd691fb7605e7c7e8db7b552768b8f328cebe3e/stream.go#L180-L184
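Concretely, the change I tried is roughly the following (a paraphrase; the linked lines above are the authoritative version):

```go
// stream.go, inside the write path: sketch of the naive change
switch s.state {
case streamLocalClose:
	fallthrough
case streamRemoteClose: // added: make writes fail once the peer has half-closed
	fallthrough
case streamClosed:
	s.stateLock.Unlock()
	return 0, ErrStreamClosed
}
```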
This broke TestHalfClose, from which I understand that the current behavior is expected, at least in part. But if this is expected, then I'm not sure what the in-band mechanism is for signaling that the other side of the Stream has gone away. Is there an existing API in yamux to tell the other side to stop sending? (For comparison, (*net.TCPConn).Close() from the read side will cause writes to fail with EPIPE.)

Thanks very much for your time.