Closed: edsko closed this 7 months ago
@kazu-yamamoto , you asked a difficult question :sweat_smile:
If you think 4 is too small, you can increase the value.
I didn't change the default as I wasn't sure what to change it to, and I wasn't comfortable suggesting something. But since you asked, I spent some time trying to see what other people do. Unfortunately, it is surprisingly difficult to find concrete guidelines about this. Plenty of sites tell you "one way to mitigate a ping flood attack is to impose rate limits", but what those rate limits should be, nobody seems willing to say.
A summary of what I found:
It seems most applications impose rate limits in a different manner than `http2` currently does. Some examples:

1. Limit on the maximum number of control frames per connection (rather than per second). Examples: `max_http2_control_frames_per_connection`, which seems to have been adopted verbatim also by some other suppliers (for example, VMWare). Defaults to 1,000. nginx hardcodes this limit to a maximum of 10,000 (patch that introduces the limit, location in the source code).
2. Compare overall traffic to traffic on the individual streams. nginx: "as long as total traffic is many times larger than stream traffic, we consider this to be a flood" (patch).
3. Limit outbound control packets (the response to the ping, as opposed to the ping itself).
4. Impose a minimum time between control packets.
5. Most relevant for my own needs, since I am working on a gRPC library: the gRPC Keepalive guide defines a server parameter `PERMIT_KEEPALIVE_TIME`, which sets the "minimum allowed time between a server receiving successive ping frames without sending any data/header frame" and defaults to 5 minutes -- so this is some hybrid of (2) and (4).
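To make option (5) concrete, here is a minimal sketch of the check it implies. All names and the state representation here are invented for illustration; this is not `http2`'s actual API, and time is passed in explicitly to keep the function pure:

```haskell
-- Sketch of option (5): a ping is permitted if a data/header frame
-- was sent since the previous ping, or if enough time has elapsed.
-- All names are hypothetical; this is not http2's actual API.

-- | Minimum allowed time (in seconds) between pings with no
-- intervening data/header frame; gRPC's PERMIT_KEEPALIVE_TIME
-- defaults to 5 minutes.
permitKeepaliveTime :: Double
permitKeepaliveTime = 300

data PingState = PingState
  { lastPingAt  :: Double  -- ^ arrival time of the previous ping
  , framesSince :: Bool    -- ^ data/header frame sent since then?
  }

-- | Decide whether a ping arriving at time @now@ is acceptable,
-- returning the updated state.
allowPing :: Double -> PingState -> (Bool, PingState)
allowPing now st
  | framesSince st || now - lastPingAt st >= permitKeepaliveTime =
      (True, PingState { lastPingAt = now, framesSince = False })
  | otherwise = (False, st)
```

A server using a scheme like this would set `framesSince` whenever it sends a data or headers frame, and treat a rejected ping as a flood (for example, by closing the connection).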
For actual rate limiting, I only found two suggestions. Not HTTP/2 specific, but Hillstone imposes a rate limit on all ICMP packets, defaulting to a rather liberal 1500/second.

I suspect that the reason I am seeing more pings than the 4/second that `http2` currently allows is somehow related to the rate of requests (for example -- speculating -- perhaps some logic is a bit broken, and new requests can cause a new keepalive ping to be sent irrespective of when the previous one was sent). If true, that would mean that any specific value of pings/second is potentially problematic. I'd have to start digging through the golang source of the server to know for sure, but it at least feels plausible: a client "broken" in this way would still work with the recommendation from the guide on rate limiting (option (5), above), even if it doesn't quite explain everything, at least not when compared to the guidelines (but that is no surprise in the gRPC ecosystem, unfortunately); it certainly doesn't seem to match the documented behaviour of the golang client :shrug:
I'm not sure what to suggest. Quite frankly, I made the ping rate limit a parameter in my library too, defaulting to 10/second for now, but of course that is basically passing the buck further down the line to the application :grin: I agree with you that sensible defaults would be better.
Option (5) from the gRPC guidelines -- that is, imposing a minimum time between successive ping frames without sending data/header frames -- does seem quite sensible, not just for gRPC but also for other applications. If you want, I can try to implement that and see if it works for me, although of course that would be a bigger change to `http2` than the current PR.
Alternatively, I could imagine some kind of pluggable "rate limiting API" which could be instantiated to any of the 5 options above; I'd have to spend some time thinking about what such an API should look like.
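For what it's worth, such a pluggable API might be as small as a record of hooks that the server invokes; none of these names exist in `http2`, this is purely a hypothetical sketch, shown here with an instantiation of option (1):

```haskell
import Data.IORef

-- Hypothetical pluggable rate-limiting API: the server calls these
-- hooks and treats a False result from 'onControlFrame' as a flood.
-- Each of the five strategies above would become one instantiation.
data RateLimiter = RateLimiter
  { onControlFrame :: IO Bool  -- ^ ping/settings received; False = flood
  , onDataFrame    :: IO ()    -- ^ data/header frame sent (for hybrid schemes)
  }

-- | Option (1): cap the total number of control frames per connection.
maxControlFramesPerConnection :: Int -> IO RateLimiter
maxControlFramesPerConnection limit = do
  ref <- newIORef (0 :: Int)
  pure RateLimiter
    { onControlFrame = atomicModifyIORef' ref (\n -> (n + 1, n + 1 <= limit))
    , onDataFrame    = pure ()
    }
```

The other options would plug in the same way: option (5), for instance, would use `onDataFrame` to record that a data/header frame was sent since the last ping.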
Let me know what you'd like me to do.
@edsko Thank you for this research!
Perhaps it is easy for many programmers to measure the time between packets and to count occurrences of a specific frame, but it is hard to implement rate limiting (number of packets per second). It took me some days to hit upon an idea for how to implement this rate limiting. Yes, it looks very easy if you read the code, but it would be hard to implement without reference code.
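For readers following along: a "packets per second" limit can be approximated with a fixed one-second window, counting events and starting a new window once a second has elapsed. A simplified sketch of that idea (not the actual code in `http2`; time is passed in explicitly to keep it pure):

```haskell
-- Fixed-window sketch of "at most N pings per second": count events
-- and reset the counter once a second has elapsed. Illustrative only;
-- not the actual implementation in http2.
data Rate = Rate
  { count       :: Int     -- ^ events seen in the current window
  , windowStart :: Double  -- ^ when the current window began (seconds)
  }

-- | Record one event at time @now@, returning the number of events in
-- the current one-second window together with the updated state.
bump :: Double -> Rate -> (Int, Rate)
bump now (Rate n start)
  | now - start >= 1 = (1, Rate 1 now)
  | otherwise        = (n + 1, Rate (n + 1) start)
```

A server would compare the returned count against the configured limit (10/second by default after this PR) and treat anything above it as a flood.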
I'm not sure it's worth exploring other approaches. I would suggest simply increasing the value from 4 to 10, keeping the configurable scheme.
Ok, I've discussed this with my client and we're okay with this approach. I've force-pushed a commit that changes the default to 10 (and still keeps it configurable).
I also changed the test suite to make the ping protection test a bit more aggressive, so that it still exceeds the limit.
Merged. Thank you for your contributions! A new version has been released.
Thanks @kazu-yamamoto !
I'm dealing with some gRPC servers (written in golang) that routinely exceed the limit of 4 pings/sec (increasing the limit to 10 seems to do the trick). This PR makes the ping rate limit configurable.