hashicorp / yamux

Golang connection multiplexing library
Mozilla Public License 2.0
2.19k stars 232 forks source link

Fix failing partial read on last frame of closed session #95

Closed jacobvosmaer closed 2 years ago

jacobvosmaer commented 3 years ago

If the sending side of a stream closes both their stream and their session, a partial read of the final frame on the receiving side may fail because the receiving side tries to send a window update on a closed session.

This commit changes Stream.Read() so that it ignores "session shutdown" errors return by Stream.sendWindowUpdate(). This will let Read() calls consume all of the final frame.

hashicorp-cla commented 3 years ago

CLA assistant check
All committers have signed the CLA.

jacobvosmaer commented 2 years ago

If you look at the SPDY spec, on which yamux is based if I understand correctly, then you see in 2.6.8:

After the recipient reads in a data frame with FLAG_FIN that marks the end of the data stream, it should not send WINDOW_UPDATE frames as it consumes the last data frame.

If yamux did this too, we wouldn't have this bug. But yamux does keep sending window updates after FLAG_FIN. If the session is still alive, that is fine. But if the session has been closed on the sending side, then sending window updates from the receiving side will fail.

I can think of two reasons why the current implementation gets away with this bug:

  1. Window updates are elided if the receive buffer is less than half full
  2. Perhaps yamux is mostly used for long-lived sessions, in which case session shutdowns are rare, and session shutdown related bugs are rare too

Either way, I think this is a bug. It should be possible to cleanly have one side close the last stream and the last session, and have the other side read from the last stream to the end.

I have tried to fix the bug by fiddling with the stream state machine, but that got complicated. In part because it's not easy to predict what state we're in when the bug happens. So in the end I thought the most robust solution was to ignore ErrSessionShutdown when sending window updates.

jacobvosmaer commented 2 years ago

@banks I see you reviewed a recent PR on this project. Would you be able to look at this one, or direct me to someone else who can? Thanks!

kisunji commented 2 years ago

Not super familiar with yamux but my 2 cents:

It looks like the issue is more of an API one where (*Stream).Read can return n > 0 AND err != nil at the same time, which trips up conventional error handling like:

n, err := stream.Read(b)
if err != nil {
    // handle err
}
// do something with n bytes

This is actually somewhat common with the io.Reader interface where it recommends:

Callers should always process the n > 0 bytes returned before considering the error err.

The root cause of this error is due to sendWindowUpdate during a shutdown which as @jacobvosmaer mentioned is not conforming to SPDY but this PR is more of a band-aid solution to suppress ErrSessionShutdown and doesn't prevent the WindowUpdate from being sent.

I think there needs to be a bit more work to either:

  1. Fix the behaviour to be in-line with SPDY spec (even though we don't guarantee interoperability) which will prevent returning ErrSessionShutdown as a side-effect
  2. Get consensus on the "correctness" of returning n > 0 AND err != nil (i.e. is it important for the caller to know and handle ErrSessionShutdown?)
jacobvosmaer commented 2 years ago

Fix the behaviour to be in-line with SPDY spec (even though we don't guarantee interoperability) which will prevent returning ErrSessionShutdown as a side-effect

Yamux is not SPDY so I personally don't care if Yamux is SPDY compliant. I only mentioned that part of the SPDY spec to explain that "window updates after FLAG_FIN" are a known corner case.

It looks like the issue is more of an API one where (*Stream).Read can return n > 0 AND err != nil at the same time

I don't think n>0 is the problem. The problem is that we get an error value that is not EOF.

Expected behavior:

  1. Receive data into receive buffer
  2. Receive FLAG_FIN
  3. Read() returns (n, nil) ... (however many calls are necessary to drain the receive buffer)
  4. Read() returns (n, io.EOF)

Actual behavior (pre this MR):

  1. Receive data into receive buffer
  2. Receive FLAG_FIN
  3. Read() returns (n, nil) ... (however many calls are necessary to trigger a window update)
  4. Sending the window update fails
  5. Read() returns (n, ErrSessionShutdown)
  6. Caller aborts reading from stream, error bubbles up stack, unread data in receive buffer is lost
jefferai commented 2 years ago

Rather than setting the error to nil, why not set the error to io.EOF, possibly including additional handling of this a bit up the stack? That seems like the right thing to do given that there will indeed be no more data coming in.

jefferai commented 2 years ago

It looks like one consequence of using io.EOF is that it then causes the GoAway test to fail; instead of the client getting an ErrRemoteGoAway it gets nil.

jacobvosmaer commented 2 years ago

Rather than setting the error to nil, why not set the error to io.EOF

That would only be correct if Read() drains the receive buffer.

evanphx commented 2 years ago

The inability to send a windowUpdate doesn't imply that a future Read will fail due to buffering, ergo I think this PR is correct.

jefferai commented 2 years ago

Sounds good. Merging! And I'll tag after.