Open MadMockers opened 2 years ago
@MadMockers thanks for sending this fix! I haven't seen any bug reports on this before, did you find any errors or incompatibilities with specific clients, or did you follow the RFC word by word? I'm curious to see how this change affects the existing tests - and if we need any new ones to cover the new behaviour?
@MadMockers thanks for sending this fix! I haven't seen any bug reports on this before, did you find any errors or incompatibilities with specific clients
Hey mate - Yes, I had issues with dotnet gRPC, which uses the Kestrel HTTP/2 implementation. There's an additional bug in the Kestrel implementation that causes a connection error when a second RST_STREAM is received (the correct action for dotnet is to ignore the frame). The bug in Kestrel was resolved here (https://github.com/dotnet/aspnetcore/commit/1db20afa9e2dd47cfc1a935de080f74cebb9ba0b), however it remains in dotnet 3.1 (an LTS release).
I've seen connection errors when using the python grpclib implementation, which uses h2, to talk with the dotnet gRPC server. This issue (coupled with the additional dotnet issue) causes all active gRPC requests to be aborted due to dotnet tearing down the underlying connection. Unfortunately I've found it hard to reproduce this issue with logging as it's racy, however I've manually just updated the file in my venv and am no longer observing it.
I'm curious to see how this change affects the existing tests - and if we need any new ones to cover the new behaviour?
I haven't run any of the tests yet, sorry! An additional test would be to have a peer always respond to a RST_STREAM with a RST_STREAM. The correct action is that the response is ignored. The RFC allows for this, as 2 RST_STREAM frames in different directions may be in flight at the same time.
I haven't looked at the tests yet, but intend to add one.
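As a rough illustration of the scenario such a test would cover, here is a minimal, self-contained sketch. It does not use h2 itself: the `Endpoint` class, the `echo_resets` flag and the `exchange` helper are all hypothetical names made up for this example. It models a peer that answers every RST_STREAM with its own RST_STREAM and checks that a well-behaved endpoint stays silent instead of replying again.

```python
class Endpoint:
    """Toy endpoint tracking a single stream; every name here is hypothetical."""

    def __init__(self, echo_resets=False):
        self.echo_resets = echo_resets  # True models the misbehaving peer
        self.frames_sent = 0

    def reset(self):
        """Send a RST_STREAM for the stream."""
        self.frames_sent += 1
        return "RST_STREAM"

    def receive(self, frame):
        """Return the frame sent in response, or None if nothing is sent."""
        if frame == "RST_STREAM" and self.echo_resets:
            return self.reset()
        # Well-behaved endpoint: a reset on an already reset stream is ignored.
        return None


def exchange(local, peer, max_hops=10):
    """Bounce frames between the endpoints until one of them stays silent."""
    frame, receiver, other = local.reset(), peer, local
    hops = 0
    while frame is not None and hops < max_hops:
        frame = receiver.receive(frame)
        receiver, other = other, receiver
        hops += 1
    return hops


local, peer = Endpoint(), Endpoint(echo_resets=True)
hops = exchange(local, peer)
```

With the ignoring behaviour in place the exchange stops after the peer's echoed reset, and the local endpoint has sent only its original RST_STREAM. If both sides echoed, the loop would only end at `max_hops`.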
A test for this scenario does already exist, I've added an additional commit which asserts correct behaviour.
Updated with tests and additional coverage (please see CI results here https://github.com/MadMockers/h2/pull/5)
I should note that in the process of doing this I did come across a contradiction in the RFC that may be worth discussing.
From Section 5.1 (closed state):
An endpoint MUST NOT send frames other than PRIORITY on a closed stream. An endpoint that receives any frame other than PRIORITY after receiving a RST_STREAM MUST treat that as a stream error (Section 5.4.2) of type STREAM_CLOSED.
The referenced Section 5.4.2:
An endpoint that detects a stream error sends a RST_STREAM frame (Section 6.4) that contains the stream identifier of the stream where the error occurred. The RST_STREAM frame includes an error code that indicates the type of error.
Essentially we're being told we MUST NOT send any frames, but we MUST treat this as a stream error (which involves sending a frame).
I think the common sense interpretation would be that perhaps the RFC should have stated that an endpoint can send both PRIORITY AND RST_STREAM in the closed state. Or maybe the RFC does not consider RST_STREAM frames to be "on" a stream. Either way, I bring this up as I relied on it for the logic for not sending more than one RST_STREAM frame (first one transitions to closed state, making additional frames illegal).
There are other parts, however, which reinforce that a RST_STREAM frame should only be sent once.
Again from Section 5.1 (closed state):
An endpoint MUST ignore frames that it receives on closed streams after it has sent a RST_STREAM frame.
Once a RST_STREAM has been sent, it would only be generated again from receiving an additional frame. If additional frames are being ignored, then there should only be the initial RST_STREAM frame.
However there is a contradiction, again in Section 5.4.2:
Normally, an endpoint SHOULD NOT send more than one RST_STREAM frame for any stream. However, an endpoint MAY send additional RST_STREAM frames if it receives frames on a closed stream after more than a round-trip time. This behavior is permitted to deal with misbehaving implementations.
Taken in isolation, the SHOULD NOT portion would imply that the old operation prior to this fix is not invalid (SHOULD NOT != MUST NOT). At a minimum this patch removes behaviour defined as SHOULD NOT. Additionally, as h2 doesn't do round-trip time tracking (that I'm aware of - may have missed this!!!), I think this clause should be taken as a MUST NOT when not doing the time tracking.
I think the common sense interpretation would be that perhaps the RFC should have stated that an endpoint can send both PRIORITY AND RST_STREAM in the closed state. Or maybe the RFC does not consider RST_STREAM frames to be "on" a stream. Either way, I bring this up as I relied on it for the logic for not sending more than one RST_STREAM frame (first one transitions to closed state, making additional frames illegal).
This contradiction has been resolved in the new version of the document.
The spec deliberately allows sending multiple RST_STREAM frames to account for the possibility that the peer implementation is buggy: if it is still sending frames on a stream after one RTT from receiving an RST_STREAM frame then the peer implementation is clearly confused and has mishandled the frame. However, the new guidance (that this is a connection error, not a stream error) is probably the best mode of handling this.
This contradiction has been resolved in the new version of the document.
Nice! Just looking through the updated version, it seems the main contradiction is resolved. I'm not sure now when a second RST_STREAM would ever be sent when strictly following the 5.1 state section. Previously, it was allowed to be sent due to:
An endpoint that receives any frame other than PRIORITY after receiving a RST_STREAM MUST treat that as a stream error (Section 5.4.2) of type STREAM_CLOSED.
As this portion has now been removed, there doesn't seem to be any state in which a second RST_STREAM is called for.
My reading of the updated closed state is that an endpoint can either ignore the frames or treat them as a connection error. Notably, the updated RFC still contains the following:
An endpoint MUST NOT send frames other than PRIORITY on a closed stream.
The spec deliberately allows sending multiple RST_STREAM frames to account for the possibility that the peer implementation is buggy: if it is still sending frames on a stream after one RTT from receiving an RST_STREAM frame then the peer implementation is clearly confused and has mishandled the frame.
Section 5.4.2 (Stream Error Handling) still seems to be in contradiction with the updated closed state. Is it possible the following clause was unintentionally left in the updated version?
Normally, an endpoint SHOULD NOT send more than one RST_STREAM frame for any stream. However, an endpoint MAY send additional RST_STREAM frames if it receives frames on a closed stream after more than a round-trip time. This behavior is permitted to deal with misbehaving implementations.
However, the new guidance (that this is a connection error, not a stream error) is probably the best mode of handling this.
This would resolve the issue I'm seeing with the Kestrel HTTP/2 implementation. That being said, there may be unintended compatibility consequences with other buggy peers. Previously, h2 would treat this as a stream error (sending an additional RST_STREAM), which should be ignored by the peer. Treating it as a connection error instead of ignoring the frame could noticeably change interoperability.
It may be worth looking at implementing the following suggestion from the updated RFC:
Endpoints can use frames that indicate that the peer has received the closing signal to drive this. Endpoints SHOULD NOT use timers for this purpose. For example, an endpoint that sends a SETTINGS frame after closing a stream can safely treat receipt of a DATA frame on that stream as an error after receiving an acknowledgment of the settings. Other things that might be used are PING frames, receiving data on streams that were created after closing the stream, or responses to requests created after closing the stream.
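One way the SETTINGS-based idea from that passage could be tracked is sketched below. This is only an illustration under stated assumptions, not something h2 does today: `ClosedStreamTracker` and all of its method names are hypothetical. It relies on the peer processing frames in order, so an acknowledgment of a SETTINGS frame sent after a reset shows the peer has already seen that reset.

```python
class ClosedStreamTracker:
    """Toy model: decide whether a late frame on a reset stream is tolerable."""

    def __init__(self):
        self._pending = set()    # reset sent, no SETTINGS sent after it yet
        self._awaiting_ack = []  # one batch of stream ids per SETTINGS sent
        self._confirmed = set()  # peer has provably seen the reset

    def on_reset_sent(self, stream_id):
        self._pending.add(stream_id)

    def on_settings_sent(self):
        # Every stream reset before this SETTINGS frame is covered by its ACK.
        self._awaiting_ack.append(self._pending)
        self._pending = set()

    def on_settings_ack(self):
        # ACKs arrive in order, so the oldest batch is the one acknowledged.
        if self._awaiting_ack:
            self._confirmed |= self._awaiting_ack.pop(0)

    def on_frame_for_closed_stream(self, stream_id):
        """Return 'ignore' while the frame may have crossed our reset in flight."""
        return "error" if stream_id in self._confirmed else "ignore"


tracker = ClosedStreamTracker()
tracker.on_reset_sent(1)
tracker.on_settings_sent()
before_ack = tracker.on_frame_for_closed_stream(1)  # may still be in flight
tracker.on_settings_ack()
after_ack = tracker.on_frame_for_closed_stream(1)   # peer has seen the reset
```

No timers are involved: the decision flips from ignoring to treating the frame as an error only once the acknowledgment arrives, which matches the "SHOULD NOT use timers" guidance quoted above.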
An additional thought which may be deemed out-of-scope is potentially a "strict" mode or a "quirks" mode (understanding that there is legitimate argument against additional code paths to account for these modes).
Previously, RST_STREAM would be sent even when the stream's closed by state was already SEND_RST_STREAM. This fix instead checks which peer closed the stream initially, and then updates the closed by value from RECV_RST_STREAM to SEND_RST_STREAM after a RST_STREAM frame has been sent on a previously reset stream.
Streams that have a closed by value of SEND_RST_STREAM now ignore all frames.
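The behaviour described above can be sketched roughly as follows. This is a simplified stand-alone model, not the code from the patch: the `ClosedBy` enum and `ClosedStream` class are hypothetical, only the two closed by values named in the description are modelled, and special cases such as PRIORITY frames are left out.

```python
from enum import Enum


class ClosedBy(Enum):
    # Only the two values mentioned in the description are modelled here.
    SEND_RST_STREAM = "send_rst_stream"
    RECV_RST_STREAM = "recv_rst_stream"


class ClosedStream:
    """Toy closed stream; a stand-in, not h2's actual stream class."""

    def __init__(self, closed_by):
        self.closed_by = closed_by
        self.outbound = []  # names of frames this endpoint would emit

    def receive_frame(self, frame_type):
        if self.closed_by is ClosedBy.SEND_RST_STREAM:
            return  # we have already reset this stream: ignore everything
        # The peer reset the stream but keeps sending: answer once, then go quiet.
        self.outbound.append("RST_STREAM")
        self.closed_by = ClosedBy.SEND_RST_STREAM


stream = ClosedStream(ClosedBy.RECV_RST_STREAM)
for frame_type in ("DATA", "DATA", "RST_STREAM"):
    stream.receive_frame(frame_type)
```

After the loop only a single RST_STREAM has been queued, because the first reply flips the closed by value and every later frame is dropped.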