hyperium / h2

HTTP/2 client & server implementation for Rust.

Fix opening new streams over max_concurrent_streams #706

Closed · jwilm closed 10 months ago

jwilm commented 10 months ago

From the main commit message:

There was a bug where it was possible to open more streams than max_concurrent_streams allows if max_concurrent_streams was lowered below the current number of open streams while new streams were already being added to the pending_send queue.

There were two mechanisms for streams to end up in that queue:

  1. send_headers would push directly onto pending_send when below max_concurrent_streams
  2. prioritize would pop from pending_open until max_concurrent_streams was reached.

For case 1, a SETTINGS frame could be received after many streams had been pushed onto pending_send and before the socket was ready to write again. For case 2, HEADERS frames could already be queued in pending_send as the socket went into a NotReady state; a SETTINGS frame could then be received, and the headers would be written anyway after the ack.
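To make the case-1 race concrete, here is a minimal sketch. The types, fields, and flow are hypothetical stand-ins chosen for illustration, not h2's actual internals:

```rust
// Simplified sketch of the case-1 race: the concurrency limit is checked
// when HEADERS are queued, but nothing re-checks the queue when the limit
// is lowered by a later SETTINGS frame.
use std::collections::VecDeque;

struct Counts {
    max_concurrent_streams: usize,
    num_open_streams: usize,
}

#[derive(Default)]
struct Queues {
    // HEADERS for streams already counted as "open"; written as soon as
    // the socket becomes writable again.
    pending_send: VecDeque<u32>,
    // Streams waiting for capacity under max_concurrent_streams.
    pending_open: VecDeque<u32>,
}

fn send_headers(q: &mut Queues, counts: &mut Counts, stream_id: u32) {
    // The limit is checked here, at enqueue time...
    if counts.num_open_streams < counts.max_concurrent_streams {
        counts.num_open_streams += 1;
        q.pending_send.push_back(stream_id);
    } else {
        q.pending_open.push_back(stream_id);
    }
}

fn main() {
    let mut counts = Counts { max_concurrent_streams: 4, num_open_streams: 0 };
    let mut q = Queues::default();

    // Socket is NotReady, so these HEADERS accumulate in pending_send.
    for id in [1, 3, 5, 7] {
        send_headers(&mut q, &mut counts, id);
    }

    // ...then a SETTINGS frame lowers the limit, but the frames already
    // sitting in pending_send are never re-checked against the new value.
    counts.max_concurrent_streams = 2;

    // When the socket becomes writable, all four HEADERS are flushed,
    // opening 4 streams against a limit of 2.
    println!("queued = {}, limit = {}", q.pending_send.len(), counts.max_concurrent_streams);
}
```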

The fix is therefore also twofold. Fixing case 1 is as simple as letting Prioritize decide when to transition streams from pending_open to pending_send, since only it knows whether the socket is ready and whether the headers can be written immediately. This is slightly complicated by the fact that previously SendRequest would block whenever streams were added as "pending open". That was addressed by guessing when to block based on max concurrent streams rather than on the stream state, as sketched below.
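A minimal sketch of that "guess when to block" idea, assuming simplified stand-in names rather than h2's real types:

```rust
// Instead of watching a particular stream's state, the readiness check
// (modelled here as the hypothetical `would_block`) blocks whenever opening
// one more stream could exceed the peer's advertised limit.
struct StreamCounts {
    max_concurrent_streams: usize,
    num_open: usize,         // streams already counted as open
    num_pending_open: usize, // streams waiting in pending_open
}

fn would_block(counts: &StreamCounts) -> bool {
    // A real poll_ready would return Poll::Pending here and register a
    // waker to be notified when capacity frees up.
    counts.num_open + counts.num_pending_open >= counts.max_concurrent_streams
}

fn main() {
    let counts = StreamCounts { max_concurrent_streams: 2, num_open: 1, num_pending_open: 1 };
    println!("blocked: {}", would_block(&counts));
}
```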

The fix for Prioritize was to conservatively pop streams from pending_open only when the socket is immediately available for writing a HEADERS frame. This required a change to queuing to support pushing onto the front of pending_send, ensuring HEADERS frames don't linger in pending_send.
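A rough sketch of that Prioritize behavior, again with simplified stand-in types rather than the crate's actual implementation:

```rust
// Promote streams from pending_open only when their HEADERS can be written
// immediately, and push them onto the *front* of pending_send.
use std::collections::VecDeque;

#[derive(Default)]
struct Queues {
    pending_send: VecDeque<u32>, // frames ready to be written
    pending_open: VecDeque<u32>, // streams waiting for concurrency capacity
}

fn pop_frame(q: &mut Queues, socket_ready: bool, open_slots_left: usize) -> Option<u32> {
    // Conservatively promote a pending_open stream only when the socket can
    // take its HEADERS frame right now and the concurrency limit has room.
    if socket_ready && open_slots_left > 0 {
        if let Some(stream_id) = q.pending_open.pop_front() {
            // push_front keeps the HEADERS at the head of pending_send so it
            // is written immediately instead of lingering behind other frames.
            q.pending_send.push_front(stream_id);
        }
    }
    q.pending_send.pop_front()
}

fn main() {
    let mut q = Queues::default();
    q.pending_open.extend([1, 3]);

    // Socket not ready: nothing is promoted, nothing is written.
    assert_eq!(pop_frame(&mut q, false, 1), None);

    // Socket ready with one open slot: exactly one stream is promoted and
    // its HEADERS frame is written immediately.
    assert_eq!(pop_frame(&mut q, true, 1), Some(1));
}
```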

The alternative to this was adding a check in pending_send for whether a new stream would exceed max concurrent; in that case, HEADERS frames would need to be carefully re-enqueued. This seemed to add more complexity in order to keep stream IDs in order.


I don't think this is ready to merge yet, but I'd like to start gathering some feedback to get it polished up. The biggest issue with this patch at the moment is that SendRequest handles will be blocked for longer than they have been in the past. This behavior only comes up when reaching max_concurrent_streams. To fix this issue with the patch, I think we'd need a way to notify waiting SendRequest handles other than using the last stream to trigger the wait.

Fixes #704

seanmonstar commented 10 months ago

This makes sense, thanks a lot for the write-up, it helped me catch back up. I've opened #707 where I can build on this and push a few more commits to make things work.

The biggest issue with this patch at the moment is that SendRequest handles will be blocked for longer than they have been in the past.

I've given this some thought over the past week, and I think this is fine. I think this is actually what is meant by the documentation saying "The user must not send a request if poll_ready does not return Ready". hyper, at least, always checks poll_ready first. Making this change might surface other libraries that were failing to do so, but that would also reveal that they weren't fully handling backpressure.
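For reference, the backpressure-respecting pattern looks roughly like the following sketch using h2's public client API (the address, scheme, and error handling are placeholders; a real client would also set up TLS/ALPN):

```rust
use h2::client;
use http::Request;
use tokio::net::TcpStream;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder address for illustration only.
    let tcp = TcpStream::connect("127.0.0.1:8080").await?;
    let (mut send_request, connection) = client::handshake(tcp).await?;

    // Drive the connection on its own task.
    tokio::spawn(async move {
        let _ = connection.await;
    });

    // Respect backpressure: wait until the connection is ready to open a
    // new stream before calling send_request.
    send_request = send_request.ready().await?;

    let request = Request::builder()
        .uri("http://127.0.0.1:8080/")
        .body(())?;
    let (response, _send_stream) = send_request.send_request(request, true)?;

    let response = response.await?;
    println!("status: {}", response.status());
    Ok(())
}
```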

jwilm commented 10 months ago

I've opened https://github.com/hyperium/h2/pull/707 where I can build on this and push a few more commits to make things work.

Thanks! Please let me know if there's anything I can do to help.

seanmonstar commented 10 months ago

I got the various straggling tests to pass without too much change needed. It's looking green. Does this seem to fix up what you were seeing?

jwilm commented 10 months ago

Yes. We've been running the patch in prod for about 5 days without issue. As a side note, we also use tonic/gRPC pretty extensively; we applied the patch using [patch.crates-io], so it also applied to those dependencies. Everything is working as expected.
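For anyone wanting to try the same override, the [patch.crates-io] entry looks roughly like this (the branch name below is a placeholder, not the actual branch carrying this fix):

```toml
# Workspace Cargo.toml; replaces the crates.io h2 for all dependents,
# including transitive ones such as tonic.
[patch.crates-io]
h2 = { git = "https://github.com/hyperium/h2", branch = "fix-max-concurrent-streams" }
```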