hyperium / h2

HTTP 2.0 client & server implementation for Rust.
MIT License
1.34k stars 269 forks source link

fix: limit the amount of pending-accept reset streams #668

Closed seanmonstar closed 1 year ago

seanmonstar commented 1 year ago

Streams that have been received by the peer, but not accepted by the user, can also receive a RST_STREAM. This is a legitimate pattern: one could send a request and then shortly after, realize it is not needed, sending a CANCEL.

However, since those streams are now "closed", they don't count towards the max concurrent streams. So, they will sit in the accept queue, using memory.

In most cases, the user is calling accept in a loop, and they can accept requests that have been reset fast enough that this isn't an issue in practice.

But if the peer is able to flood the network faster than the server accept loop can run (simply accepting, not processing requests; that tends to happen in a separate task), the memory could grow.

So, this introduces a maximum count for streams in the pending-accept but remotely-reset state. If the maximum is reached, a GOAWAY frame with the error code of ENHANCE_YOUR_CALM is sent, and the connection marks itself as errored.

Closes https://github.com/hyperium/hyper/issues/2877 (we assume)

seanmonstar commented 1 year ago

I started to make the limit configurable, but then paused, wanting to get something working faster. If it seems worth the delay, I can add in the public API to set these limits yourself.

LPardue commented 1 year ago

Since this seems like something that occurs due to several dynamic factors or behaviours, a single hard coded limit on paper seems too restrictive. Not least because it isn't immediately obvious where the value 20 comes from. This might be an entirely reasonable value but without some wider testing, who knows? A default value that can be tweaked via API seems like a decent middleground to me.

seanmonstar commented 1 year ago

Added another commit making the options part of the public API.

Noah-Kennedy commented 1 year ago

I'm summarizing a conversation @seanmonstar and I had in Discord here:

There is a non-zero chance that the threshold we set does trip on real (and non-malicious) traffic. We have a log line in here for if this goes off, but it might be worthwhile for us to expose an actual metric for these GOAWAYs so that users of the library can better understand how frequently this trips.

seanmonstar commented 1 year ago

Given that I've been at this for 14 hours, I'm going to go knock out. Since the assumed problem is quite hard to trigger, and that the fix here is a guess, and that there's always some amount of risk when publishing new versions, I'm opting for leaving this open for the night.

Noah-Kennedy commented 1 year ago

This will also give our peers outside of North America a chance to look at this change before we release it.

If anyone has any concerns about this change, please make them heard.