python-hyper / h2

HTTP/2 State-Machine based protocol implementation
https://h2.readthedocs.io/en/stable
MIT License
963 stars 151 forks source link

streams: Prevent RST_STREAM from being sent multiple times #1267

Open MadMockers opened 2 years ago

MadMockers commented 2 years ago

Previously, RST_STREAM would be sent even when the stream closed by state was in SEND_RST_STREAM. This fix instead checks which peer closed the stream initially, and then updates the closed by value from RECV_RST_STREAM to SEND_RST_STREAM after a RST_STREAM frame has been sent on a previously reset stream.

Streams that have a closed by value of SEND_RST_STREAM now ignore all frames.

Kriechi commented 2 years ago

@MadMockers thanks for sending this fix! I haven't seen any bug reports on this before, did you find any errors or incompatibilities with specific clients, or did you follow the RFC word by word? I'm curious to see how this change affects the existing tests - and if we need any new ones to cover the new behaviour?

MadMockers commented 2 years ago

@MadMockers thanks for sending this fix! I haven't seen any bug reports on this before, did you find any errors or incompatibilities with specific clients

Hey mate - Yes I had issues with dotnet gRPC, which uses the Kestral HTTP/2 implementation. There's an additional bug in the Kestral implementation that causes a connection error when a second RST_STREAM is received (The correct action for dotnet is to ignore the frame). The bug in Kestral was resolved here (https://github.com/dotnet/aspnetcore/commit/1db20afa9e2dd47cfc1a935de080f74cebb9ba0b), however it remains in dotnet 3.1 (an LTS release).

I've seen connection errors when using the python grpclib implementation, which uses h2, to talk with the dotnet gRPC server. This issue (coupled with the additional dotnet issue) causes all active gRPC requests to be aborted due to dotnet tearing down the underlying connection. Unfortunately I've found it hard to reproduce this issue with logging as it's racy, however I've manually just updated the file in my venv and am no longer observing it.

I'm curious to see how this change affects the existing tests - and if we need any new ones to cover the new behaviour?

I haven't run any of the tests yet, sorry! An additional test would be to have a peer always respond to a RST_STREAM with a RST_STREAM. The correct action is that the response is ignored. The RFC allows for this, as 2 RST_STREAM frames in different directions may be in flight at the same time.

I haven't looked at the tests yet, but intend to add one.

MadMockers commented 2 years ago

A test for this scenario does already exist, I've added an additional commit which asserts correct behaviour.

MadMockers commented 2 years ago

Updated with tests and additional coverage (please see CI results here https://github.com/MadMockers/h2/pull/5)

I should note that in the process of doing this I did come across a contradiction in the RFC that may be worth discussing.

From Section 5.1 (closed state):

An endpoint MUST NOT send frames other than PRIORITY on a closed stream. An endpoint that receives any frame other than PRIORITY after receiving a RST_STREAM MUST treat that as a stream error (Section 5.4.2) of type STREAM_CLOSED.

The referenced Section 5.4.2:

An endpoint that detects a stream error sends a RST_STREAM frame (Section 6.4) that contains the stream identifier of the stream where the error occurred. The RST_STREAM frame includes an error code that indicates the type of error.

Essentially we're being told we MUST NOT send any frames, but we MUST treat this as a stream error (which involves sending a frame).

I think the common sense interpretation would be that perhaps the RFC should have stated that an endpoint can send both PRIORITY AND RST_STREAM in the closed state. Or maybe the RFC does not consider RST_STREAM frames to be "on" a stream. Either way, I bring this up as I relied on it for the logic for not sending more than one RST_STREAM frame (first one transitions to closed state, making additional frames illegal).

There's other parts however which re-enforce that a RST_STREAM frame should only be sent once.

Again from Section 5.1 (closed state):

An endpoint MUST ignore frames that it receives on closed streams after it has sent a RST_STREAM frame.

Once a RST_STREAM has been sent, it would only be generated again from receiving an additional frame. If additional frames are being ignored, then there should only be the initial RST_STREAM frame.

However there is a contradiction, again in Section 5.4.2:

Normally, an endpoint SHOULD NOT send more than one RST_STREAM frame for any stream. However, an endpoint MAY send additional RST_STREAM frames if it receives frames on a closed stream after more than a round-trip time. This behavior is permitted to deal with misbehaving implementations.

Taken in isolation, the SHOULD NOT portion would imply that the old operation prior to this fix is not invalid (SHOULD NOT != MUST NOT). At a minimum this patch removes behaviour defined as SHOULD NOT. Additionally, as h2 doesn't do round-trip time tracking (that I'm aware of - may have missed this!!!), I think this clause should be taken as a MUST NOT when not doing the time tracking.

Lukasa commented 2 years ago

I think the common sense interpretation would be that perhaps the RFC should have stated that an endpoint can send both PRIORITY AND RST_STREAM in the closed state. Or maybe the RFC does not consider RST_STREAM frames to be "on" a stream. Either way, I bring this up as I relied on it for the logic for not sending more than one RST_STREAM frame (first one transitions to closed state, making additional frames illegal).

This contradiction has been resolved in the new version of the document.

The spec deliberately allows sending multiple RST_STREAM frames to account for the possibility that the peer implementation is buggy: if it is still sending frames on a stream after one RTT from receiving an RST_STREAM frame then the peer implementation is clearly confused and has mishandled the frame. However, the new guidance (that this is a connection error, not a stream error) is probably the best mode of handling this.

MadMockers commented 2 years ago

This contradiction has been resolved in the new version of the document.

Nice! Just looking through the updated version, it seems the main contradiction is resolved. I'm not sure now when a second RST_STREAM would ever be sent when strictly following the 5.1 state section. Previously, it was allowed to be sent due to:

An endpoint that receives any frame other than PRIORITY after receiving a RST_STREAM MUST treat that as a stream error (Section 5.4.2) of type STREAM_CLOSED.

As this portion has now been removed, there doesn't seem to be any state in which a second RST_STREAM is called for.

My reading of the updated closed state is that an endpoint can either ignore frames, or treat it as a connection error. Notably, the updated RFC still contains the following:

An endpoint MUST NOT send frames other than PRIORITY on a closed stream.


The spec deliberately allows sending multiple RST_STREAM frames to account for the possibility that the peer implementation is buggy: if it is still sending frames on a stream after one RTT from receiving an RST_STREAM frame then the peer implementation is clearly confused and has mishandled the frame.

Section 5.4.2. Stream Error Handling still seems to be in contradiction with the updated closed state. Is it possible the following clause was unintentionally left in the updated version?

Normally, an endpoint SHOULD NOT send more than one RST_STREAM frame for any stream. However, an endpoint MAY send additional RST_STREAM frames if it receives frames on a closed stream after more than a round-trip time. This behavior is permitted to deal with misbehaving implementations.


However, the new guidance (that this is a connection error, not a stream error) is probably the best mode of handling this.

This would resolve the issue I'm seeing with the Kestral HTTP2 implementation. That being said, there may be unintended compatibility consequences with other buggy peers. Previously, h2 would treat this as a stream error (sending an additional RST_STREAM), which should be ignored by the peer. Treating it as a connection error instead of ignoring the frame could noticeably change interoperability.

It may be worth looking at implementing the following suggestion from the updated RFC:

Endpoints can use frames that indicate that the peer has received the closing signal to drive this. Endpoints SHOULD NOT use timers for this purpose. For example, an endpoint that sends a SETTINGS frame after closing a stream can safely treat receipt of a DATA frame on that stream as an error after receiving an acknowledgment of the settings. Other things that might be used are PING frames, receiving data on streams that were created after closing the stream, or responses to requests created after closing the stream.

An additional thought which may be deemed out-of-scope is potentially a "strict" mode or a "quirks" mode (understanding that there is legitimate argument against additional code paths to account for these modes).