hyperium / h2

HTTP 2.0 client & server implementation for Rust.

Settings frames are not applied correctly #704

Closed jwilm closed 10 months ago

jwilm commented 11 months ago

We are running into an issue with a peer that sometimes modifies settings mid-connection: we end up opening streams even when max_concurrent_streams has been lowered below the number of currently open streams. In our h2 debug output, we noticed that the settings are not applied immediately. Keep in mind that we are the client in this example.

2023-07-11T19:24:44.754515Z DEBUG Connection{peer=Client}: h2::codec::framed_read: received frame=Settings { flags: (0x0), header_table_size: 4096, max_concurrent_streams: 1, max_frame_size: 16384, max_header_list_size: 8000 }
2023-07-11T19:24:44.754559Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Settings { flags: (0x1: ACK) }
2023-07-11T19:24:44.754584Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(65), flags: (0x4: END_HEADERS) }
2023-07-11T19:24:44.754906Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(65), flags: (0x1: END_STREAM) }
2023-07-11T19:24:44.757382Z DEBUG Connection{peer=Client}: h2::codec::framed_read: received frame=Settings { flags: (0x1: ACK) }
2023-07-11T19:24:44.757428Z DEBUG Connection{peer=Client}: h2::proto::settings: received settings ACK; applying Settings { flags: (0x0) }

According to RFC9113 Section 6.5.3,

The values in the SETTINGS frame MUST be processed in the order they appear, with no other frame processing between values. Unsupported settings MUST be ignored. Once all values have been processed, the recipient MUST immediately emit a SETTINGS frame with the ACK flag set. Upon receiving a SETTINGS frame with the ACK flag set, the sender of the altered settings can rely on the values from the oldest unacknowledged SETTINGS frame having been applied.

This suggests the settings should be applied immediately upon receipt of the frame, before sending an ACK to the peer.
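
For illustration, the ordering the RFC describes boils down to something like the sketch below. This is not h2's actual code; the types and names are invented, and real settings handling has more cases.

// Sketch of the RFC 9113 ordering for a received SETTINGS frame: apply every
// value first, then emit the ACK. Types and names here are invented.

#[derive(Default, Debug)]
struct RemoteSettings {
    max_concurrent_streams: Option<u32>,
    max_frame_size: Option<u32>,
}

enum Setting {
    MaxConcurrentStreams(u32),
    MaxFrameSize(u32),
    Unsupported, // unsupported settings MUST be ignored
}

fn on_settings_frame(current: &mut RemoteSettings, values: &[Setting]) -> &'static str {
    // Process values in the order they appear, with no other frame
    // processing in between.
    for value in values {
        match value {
            Setting::MaxConcurrentStreams(n) => current.max_concurrent_streams = Some(*n),
            Setting::MaxFrameSize(n) => current.max_frame_size = Some(*n),
            Setting::Unsupported => {}
        }
    }
    // Only after all values have been applied is the ACK emitted.
    "SETTINGS { flags: ACK }"
}

fn main() {
    let mut settings = RemoteSettings::default();
    let ack = on_settings_frame(&mut settings, &[Setting::MaxConcurrentStreams(1)]);
    println!("applied {settings:?}, then sent {ack}");
}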

I'd be happy to implement a patch for this if someone is able to help identify the right approach. I see that all of this logic is implemented in src/proto/settings.rs, and I can imagine a couple of solutions, but it would be nice to align with the maintainers before doing the work.

Thanks!

jwilm commented 11 months ago

Upon further review, I may have misunderstood the settings code, but there still seems to be something wrong here: max_concurrent_streams is reduced below the number of open streams, yet another stream is opened immediately after acking.

seanmonstar commented 11 months ago

Yea, I think the logs are easy to misread: that line is saying that the local settings are only applied once the remote has acknowledged them. The remote's settings are applied as soon as they are received.

It's certainly possible there is a bug in changing the max allowed streams. We have some tests, but there could be some cases we haven't exercised. I know it's work, but it's a great first step: could you add a unit test that triggers what you're seeing?

jwilm commented 11 months ago

Hey Sean, thanks for the response. I have the time available to work on a unit test, but I'm not particularly confident that it's easily reproducible in such a setting. The context in which this comes up for us tends to be while we are rapidly pushing requests onto an individual h2 connection when the settings frame arrives. I.e., it feels a bit racy.

One theory I have is that there might just be something queued in the prioritize layer at the point the settings frame arrives, but because it's already in the send queue, there's no mechanism to stop it. Do you think it's possible to write unit tests to cover that sort of scenario without going through the client API?

seanmonstar commented 11 months ago

So I started putting together a unit test to try to trigger what you're talking about. I didn't get it all the way there, but looking at the logs, it does look close: it seems stream 5 is being sent, even though stream 3 is still streaming, and the server received the settings ack. I might have the asserts slightly wrong...

It's currently in this branch: https://github.com/hyperium/h2/compare/i704-max-concurrent-streams-mid-connection

jwilm commented 11 months ago

Oh wow, that test does seem to exhibit the behavior we're seeing. Thanks for putting that together! This looks like it's all under Connection::poll. First we see

│ │ │ ├─0ms TRACE h2::proto::connection recv SETTINGS, frame=Settings { flags: (0x0), max_concurrent_streams: 1 }
│ │ │ ├─┐h2::proto::connection::poll_ready
│ │ │ │ ├─┐h2::codec::framed_write::FramedWrite::buffer frame=Settings { flags: (0x1: ACK) }

This is acking the new settings. Later, the logs show

│ │ │ ├─0ms TRACE h2::proto::streams::prioritize writing, frame=Headers { stream_id: StreamId(3), flags: (0x5: END_HEADERS | END_STREAM) }
│ │ │ ├─┐h2::codec::framed_write::FramedWrite::buffer frame=Headers { stream_id: StreamId(3), flags: (0x5: END_HEADERS | END_STREAM) }
│ │ │ │ ├─0ms DEBUG h2::codec::framed_write send, frame=Headers { stream_id: StreamId(3), flags: (0x5: END_HEADERS | END_STREAM) }

where the new streams are sent. The TRACE log here is particularly insightful: it looks like these frames are under the pending_send queue in Prioritize, which means there are no further checks against the max_send_streams setting (those checks only happen when going from pending_open to pending_send, as far as I can tell).


Here's my analysis of the situation. I'm rather new to the code base so sorry if there are any big gaps or incorrect conclusions.

I'm looking at the Prioritize implementation, and there's a lot going on here. There are two queues of interest: pending_open and pending_send.

Once streams transition to pending_send, it is assumed that the connection can accept whatever is there, whether that's additional DATA frames or, in our case, a new stream to open. Because there are no checks on that queue for max_send_streams, new streams get opened when they shouldn't.

It seems to me we either need to add checks to pending_send to make sure there is still new-stream capacity, or we need to be more conservative about adding new streams to pending_send. The former seems more complicated because it would no longer be a simple FIFO, so I think a solution should focus on the latter: only pop from pending_open when the HEADERS frame can immediately be written to the Codec (i.e., eliminate the chance of desync between max_send_streams and the number of new streams sitting in pending_send).
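
To make the desync concrete, here is a deliberately simplified model of the two queues and where the capacity check happens; only the queue and field names are borrowed from the discussion, everything else is invented, and this is not the real Prioritize code.

// Simplified model of the two queues. The only capacity check in the model
// happens when moving streams from pending_open to pending_send; writing out
// pending_send never re-checks the limit.
use std::collections::VecDeque;

struct Model {
    max_send_streams: usize,
    num_send_streams: usize,
    pending_open: VecDeque<u32>, // stream ids waiting for capacity
    pending_send: VecDeque<u32>, // stream ids whose frames will be written unchecked
}

impl Model {
    fn schedule_pending_open(&mut self) {
        while self.num_send_streams < self.max_send_streams {
            let Some(id) = self.pending_open.pop_front() else { break };
            self.num_send_streams += 1;
            self.pending_send.push_back(id);
        }
    }

    // A SETTINGS frame lowering the limit after scheduling has no effect here.
    fn pop_frame(&mut self) -> Option<u32> {
        self.pending_send.pop_front()
    }
}

fn main() {
    let mut m = Model {
        max_send_streams: 2,
        num_send_streams: 0,
        pending_open: VecDeque::from([3, 5]),
        pending_send: VecDeque::new(),
    };
    m.schedule_pending_open();
    // The peer now lowers the limit to 1...
    m.max_send_streams = 1;
    // ...but stream 5 is already in pending_send and will still be opened.
    while let Some(id) = m.pop_frame() {
        println!("opening stream {id}");
    }
}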

With that in mind, my attention is brought to two places.

  1. Prioritize::pop_frame, which itself does two things:
     a. calls schedule_pending_open, which moves as many streams from pending_open to pending_send as there is currently capacity for, and
     b. pops frames from pending_send to write to the codec.
  2. Send::send_headers, which appears to add the stream to both pending_open and pending_send.

Ultimately I think there are a couple of changes needed here to enforce the invariant that pending_send only contains frames that are allowed by the h2 protocol to be written:

  1. schedule_pending_open should yield precisely one stream to try to open, and it should immediately write the headers to the codec.
  2. send_headers should only push to pending_open.

Do you agree with this analysis? Curious for your take.

jwilm commented 11 months ago

Ah, one other note: it looks like the way these frames are making it into pending_send is via the send_headers method, which is ultimately called via SendRequest::send_request.

jwilm commented 11 months ago

Building on your test, I was able to make a patch that seems to address the issue we are seeing. However, because the test suite in general doesn't call client.ready().await, several tests start failing with this patch.

diff --git a/src/proto/streams/send.rs b/src/proto/streams/send.rs
index dcb5225..1e5e4be 100644
--- a/src/proto/streams/send.rs
+++ b/src/proto/streams/send.rs
@@ -143,21 +143,17 @@ impl Send {
         // Update the state
         stream.state.send_open(end_stream)?;

-        if counts.peer().is_local_init(frame.stream_id()) {
-            // If we're waiting on a PushPromise anyway
-            // handle potentially queueing the stream at that point
-            if !stream.is_pending_push {
-                if counts.can_inc_num_send_streams() {
-                    counts.inc_num_send_streams(stream);
-                } else {
-                    self.prioritize.queue_open(stream);
-                }
-            }
-        }
+        if counts.peer().is_local_init(frame.stream_id()) && !stream.is_pending_push {
+            stream
+                .pending_send
+                .push_back(buffer, Frame::<B>::from(frame));

-        // Queue the frame for sending
-        self.prioritize
-            .queue_frame(frame.into(), buffer, stream, task);
+            self.prioritize.queue_open(stream);
+        } else {
+            // Queue the frame for sending
+            self.prioritize
+                .queue_frame(frame.into(), buffer, stream, task);
+        }

         Ok(())
     }
diff --git a/tests/h2-tests/tests/client_request.rs b/tests/h2-tests/tests/client_request.rs
index b4fb847..7939b75 100644
--- a/tests/h2-tests/tests/client_request.rs
+++ b/tests/h2-tests/tests/client_request.rs
@@ -303,7 +303,8 @@ async fn recv_decrement_max_concurrent_streams_when_requests_queued() {
         srv.ping_pong([0; 8]).await;

         // limit this server later in life
-        srv.send_frame(frames::settings().max_concurrent_streams(1)).await;
+        srv.send_frame(frames::settings().max_concurrent_streams(1))
+            .await;
         srv.recv_frame(frames::settings_ack()).await;

         srv.recv_frame(
@@ -312,6 +313,14 @@ async fn recv_decrement_max_concurrent_streams_when_requests_queued() {
                 .eos(),
         )
         .await;
+        srv.send_frame(frames::headers(3).response(200).eos()).await;
+        srv.recv_frame(
+            frames::headers(5)
+                .request("POST", "https://example.com/")
+                .eos(),
+        )
+        .await;
+        srv.send_frame(frames::headers(5).response(200).eos()).await;
         srv.ping_pong([1; 8]).await;
     };

@@ -335,6 +344,7 @@ async fn recv_decrement_max_concurrent_streams_when_requests_queued() {
             .unwrap();

         // first request is allowed
+        let mut client = h2.drive(async move { client.ready().await.unwrap() }).await;
         let (resp1, _) = client.send_request(request, true).unwrap();

         let request = Request::builder()
@@ -344,6 +354,7 @@ async fn recv_decrement_max_concurrent_streams_when_requests_queued() {
             .unwrap();

         // second request is put into pending_open
+        let mut client = h2.drive(async move { client.ready().await.unwrap() }).await;
         let (resp2, _) = client.send_request(request, true).unwrap();

         /*
@@ -363,7 +374,6 @@ async fn recv_decrement_max_concurrent_streams_when_requests_queued() {
         assert_eq!(err.to_string(), "user error: rejected");
         */

-
         h2.drive(async move {
             resp1.await.expect("req");
         })

The patch's intent is to make all requests go through the Prioritize layer's pending_open queue so that the max-concurrent-streams check is done near send time.

To keep the earlier semantics, which seemed to allow internal queuing, we'd need to lift the restriction that the SendRequest handle only holds one pending-open stream at a time and instead let streams buffer in the Prioritize layer up to a point. Such a change is probably going to be necessary anyway to ensure reasonable throughput with the change to send_headers.
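
For reference, the client-side pattern the adjusted tests rely on boils down to awaiting SendRequest::ready() before each send_request, roughly like the sketch below (public h2 client API; the URIs, plain-TCP setup, and lack of error handling are just for illustration).

// Wait for the SendRequest handle to be ready before each send_request, so
// backpressure from a lowered max_concurrent_streams is observed instead of
// silently queueing a stream that is no longer allowed.
use h2::client;
use http::Request;
use tokio::net::TcpStream;

async fn two_requests(io: TcpStream) -> Result<(), h2::Error> {
    let (mut client, connection) = client::handshake(io).await?;
    // Drive the connection on its own task.
    tokio::spawn(async move {
        let _ = connection.await;
    });

    for uri in ["https://example.com/a", "https://example.com/b"] {
        // `ready()` resolves once the handle may open another stream.
        client = client.ready().await?;
        let request = Request::post(uri).body(()).unwrap();
        let (response, _stream) = client.send_request(request, true)?;
        let response = response.await?;
        println!("{} -> {}", uri, response.status());
    }
    Ok(())
}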

Note that this patch isn't sufficient to fully address the problem. The logic in Prioritize that moves streams from pending_open to pending_send is too eager and could again result in opening new streams beyond max_concurrent_streams.

I seem to be grokking the test suite and the code a bit better, and I will take a shot at addressing the other piece in Prioritize later.

seanmonstar commented 10 months ago

I've applied your patch, read through the logs, and now I see what you mean. Clients are supposed to check that the previous request was actually sent before trying to send another one, but in practice, since there is often space, many tests weren't actually checking.

We could fix up all the tests and say "well, it was documented", but it would likely still surprise people who did the same thing. Well, hm. (Stream of consciousness.) I suppose people would only run into it if they were hitting a max stream limit, but in most cases they should be able to work just fine, right? This patch would just be pushing backpressure back to the user sooner, when the peer has signaled to slow down. I'm going to try exploring the broken tests a little more.

jwilm commented 10 months ago

In my current WIP fix for the issue, I've added a next_send_stream_will_reach_capacity() method on Counts which the SendRequest handle can use as a signal to hold onto the stream and wait to be notified. This keeps a backpressure mechanism in place while leaving the decision of whether to actually open streams up to Prioritize.

The piece I'm working on now is changing schedule_pending_open to be, effectively, pop_pending_open, and to shove the new stream at the front of the pending_send queue. Only one stream can safely be popped at a time, and only when we know the Codec is ready for a write, so that a HEADERS frame never sits in the pending_send queue.
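
Roughly, the idea looks like the sketch below. This is not the actual WIP patch; apart from the queue names, everything here is invented to illustrate "pop at most one stream, and only when the codec can take the write".

// Hypothetical pop_pending_open: move exactly one stream, and only when the
// codec is ready, so its HEADERS frame is the next write and never strands
// in pending_send when settings change.
use std::collections::VecDeque;

struct Queues {
    max_send_streams: usize,
    num_send_streams: usize,
    pending_open: VecDeque<u32>,
    pending_send: VecDeque<u32>,
}

impl Queues {
    fn pop_pending_open(&mut self, codec_ready: bool) {
        if !codec_ready || self.num_send_streams >= self.max_send_streams {
            return;
        }
        if let Some(id) = self.pending_open.pop_front() {
            self.num_send_streams += 1;
            // Front of the queue: the HEADERS frame is written out immediately.
            self.pending_send.push_front(id);
        }
    }
}

fn main() {
    let mut q = Queues {
        max_send_streams: 1,
        num_send_streams: 1, // one stream already open
        pending_open: VecDeque::from([5]),
        pending_send: VecDeque::new(),
    };
    q.pop_pending_open(true);
    // Stream 5 stays in pending_open because the lowered limit is respected.
    assert!(q.pending_send.is_empty());
    println!("pending_open = {:?}", q.pending_open);
}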

I've somewhat given up, for the moment, on writing a test to specifically exercise this failure mode. I was unsure whether it's actually possible to simulate HEADERS frames sitting in the pending_send queue of Prioritize while streams are open and the server sends a settings frame reducing the max stream count below the current count. Specifically, I was unsure whether the given abstractions synchronize client and server processing too "broadly" to catch the issue.

jwilm commented 10 months ago

The patch might still be rough around the edges, but I wanted to share it. Here's my current patch for this issue. I need to spend some time looking at tests now. The commit message explains in detail what I believe the problem to be and the approach, as well as one alternative I had considered. Should I open a PR with this work for further discussion?

seanmonstar commented 10 months ago

I've somewhat given up on writing a test to specifically exercise this failure mode for the moment.

Yea, I never expected it would be easy to do consistently. It depends on internal implementation details... But the one I threw together was useful to trigger what it could on demand. :shrug:

Should I open a PR with this work for further discussion?

Certainly! I'll be back to full steam next week.

seanmonstar commented 10 months ago

Published v0.3.21 with the fix for this just now!