spiffe / spire

The SPIFFE Runtime Environment
https://spiffe.io
Apache License 2.0

Hardening the Agent - Rate Limiting UDS #2010

Closed jkevlin closed 2 weeks ago

jkevlin commented 3 years ago

The possibility of a DoS attack on the Agent UDS (Unix domain socket) is a known issue. Rate limiting connections to the UDS would be one way to prevent a DoS attack through that vector.

jkevlin commented 3 years ago

Research:
- https://godoc.org/golang.org/x/time/rate
- Middleware skeletons: https://github.com/grpc-ecosystem/go-grpc-middleware/tree/master/ratelimit and https://stackoverflow.com/a/62932668
- Token bucket implementation: https://github.com/juju/ratelimit
- Leaky bucket: https://github.com/uber-go/ratelimit
- Incomplete rate-per-client code: https://dev.to/plutov/rate-limiting-http-requests-in-go-based-on-ip-address-542g and https://hustcat.github.io/rate-limit-example-in-go/
- https://cloud.google.com/solutions/rate-limiting-strategies-techniques#techniques-enforcing-rate-limits

SPIRE code: the current server rate limiter is at https://github.com/spiffe/spire/blob/master/pkg/server/api/middleware/ratelimit.go

evan2645 commented 3 years ago

Thank you for opening this @jkevlin

We need to put some thought into how we key the rate limiting logic. For example, if we rate limit based on PID, then nothing stops an abuser from continually forking and hitting the API. Alternatively, we could rate limit by user ID, but I'm not sure whether this would have other implications (are there legitimate use cases that involve a large number of workloads running as the same UID on the same host?).

Since we need to apply the ratelimiting prior to exercising all of our attestation logic, I think our options are limited, and may even be platform-specific (a function of what all we can get inside peertracker).

jkevlin commented 3 years ago

I had assumed that the rate limiter would be an injected dependency selected in the config. We could offer two or three options, including no limit, and make all of them configurable.
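
One way to picture the injected, config-selected limiter is sketched below. Everything here is hypothetical: the `Limiter` interface, the `Config` fields, and the mode names `none`, `global`, and `per_key` are invented for illustration and are not SPIRE configuration options. A simple fixed-window counter stands in for whatever algorithm would actually be chosen.

```go
package main

import (
	"fmt"
	"sync"
)

// Limiter is the dependency the UDS handler would receive.
type Limiter interface {
	Allow(key string) bool
}

// noLimit admits every call.
type noLimit struct{}

func (noLimit) Allow(string) bool { return true }

// windowLimiter is a fixed-window counter: up to max calls are admitted per
// window, and Reset starts a new window (a caller would invoke it on a timer).
type windowLimiter struct {
	mu     sync.Mutex
	max    int
	shared bool // when true, all callers share a single counter
	counts map[string]int
}

func (w *windowLimiter) Allow(key string) bool {
	if w.shared {
		key = ""
	}
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.counts[key] >= w.max {
		return false
	}
	w.counts[key]++
	return true
}

func (w *windowLimiter) Reset() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.counts = make(map[string]int)
}

// Config is a hypothetical configuration stanza for the limiter.
type Config struct {
	Mode string // "none", "global", or "per_key"
	Max  int    // calls admitted per window; ignored for "none"
}

// NewLimiter builds the limiter selected by the configuration.
func NewLimiter(c Config) (Limiter, error) {
	switch c.Mode {
	case "", "none":
		return noLimit{}, nil
	case "global", "per_key":
		if c.Max <= 0 {
			return nil, fmt.Errorf("rate limit mode %q requires max > 0, got %d", c.Mode, c.Max)
		}
		return &windowLimiter{
			max:    c.Max,
			shared: c.Mode == "global",
			counts: make(map[string]int),
		}, nil
	default:
		return nil, fmt.Errorf("unknown rate limit mode %q", c.Mode)
	}
}

func main() {
	calls := []string{"uid:1", "uid:1", "uid:1", "uid:2", "uid:2"}
	for _, mode := range []string{"none", "global", "per_key"} {
		l, err := NewLimiter(Config{Mode: mode, Max: 2})
		if err != nil {
			fmt.Println("config error:", err)
			continue
		}
		allowed := 0
		for _, key := range calls {
			if l.Allow(key) {
				allowed++
			}
		}
		fmt.Printf("%s: %d of %d calls allowed\n", mode, allowed, len(calls))
	}
}
```

Because the handler only depends on the `Limiter` interface, tests can inject `noLimit{}` or a fake, and a new strategy can be added without touching the handler.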

rturner3 commented 3 years ago

One issue we've encountered related to this is consumers of SPIRE Agent writing their integrations incorrectly, so that they recreate clients/connections for each use or on retry. This can result in clients opening and retaining lots of file descriptors for the connections to the Agent UDS, which affects the overall health of SPIRE Agent and the host. It hasn't always been easy for these consumers to trace the symptom of a steady increase in open file descriptors on the host to a faulty integration with SPIRE Agent.

Returning some more descriptive RPC errors around rate limiting or overall number of connections per client might help identify these bad integrations more easily, in addition to the hardening benefits already described in this issue.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 365 days with no activity.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been inactive for 30 days since being marked as stale.