mholt / caddy-ratelimit

HTTP rate limiting module for Caddy 2
Apache License 2.0

Fix race when initializing ringBufferRateLimiter #43

Closed · inahga closed this 8 months ago

inahga commented 8 months ago

This closes https://github.com/mholt/caddy-ratelimit/issues/36.

Fixes a race condition between ringBufferRateLimiter creation and its insertion into the map, by locking the entire map whenever we get or insert a ringBufferRateLimiter.

I have replaced the use of sync.Map with a plain map[string]*ringBufferRateLimiter guarded by a sync.Mutex; the two are passed around together in a rateLimitersMap struct. I've factored the logic out into methods on rateLimitersMap, which enables careful use of defer rlm.mu.Unlock() so a panic() can't leave the lock held.
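Roughly, the change has this shape (a simplified sketch rather than the exact code in the diff; the method name getOrInsert, the constructor, and the elided fields are illustrative):

```go
package ratelimit

import "sync"

type ringBufferRateLimiter struct {
	// ring buffer fields elided for the sketch
}

// rateLimitersMap pairs a plain map with the mutex that guards it,
// so every get-or-insert happens under a single lock.
type rateLimitersMap struct {
	mu       sync.Mutex
	limiters map[string]*ringBufferRateLimiter
}

func newRateLimitersMap() *rateLimitersMap {
	return &rateLimitersMap{limiters: make(map[string]*ringBufferRateLimiter)}
}

// getOrInsert returns the limiter for key, creating it if needed.
// The whole lookup-create-insert sequence runs while holding mu, which closes
// the race between creating a limiter and publishing it in the map.
// defer keeps the mutex from staying locked if anything below panics.
func (rlm *rateLimitersMap) getOrInsert(key string) *ringBufferRateLimiter {
	rlm.mu.Lock()
	defer rlm.mu.Unlock()
	if rl, ok := rlm.limiters[key]; ok {
		return rl
	}
	rl := &ringBufferRateLimiter{}
	rlm.limiters[key] = rl
	return rl
}
```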

We didn't see a need for sync.Map. The docs recommend a plain map with separate locking for better type safety, and none of sync.Map's suggested use cases apply here: https://pkg.go.dev/sync#Map. Let me know if I'm misunderstanding the use case (very possible!).

I've removed the sync.Pool for now. Since ringBufferRateLimiter creation and insertion are fully synchronized, I didn't see a need for it.

Note that some of the defensive refactoring is not strictly required: I have a change that preserves the existing data structures, but I think the suggested changeset is an overall improvement in maintainability. https://github.com/divviup/caddy-ratelimit/pull/1/commits/65ad951ea012a5410dff297efa9da6f769e20dc0.

Some discussion of the performance impact and profiles is here: https://github.com/divviup/caddy-ratelimit/pull/1. TL;DR, no meaningful impact on CPU, memory, or latency. This implementation could be optimized by replacing the plain mutex with an RWMutex, but that would be a marginal improvement (if any) in exchange for much more complicated locking semantics.
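For context, the RWMutex variant would need a double-checked pattern along these lines (a hypothetical sketch, not part of this PR; it assumes an rwMu sync.RWMutex field on rateLimitersMap), which is where the extra complexity comes from:

```go
// Hypothetical RWMutex variant, shown only to illustrate the added complexity.
func (rlm *rateLimitersMap) getOrInsertRW(key string) *ringBufferRateLimiter {
	// Fast path: read lock only.
	rlm.rwMu.RLock()
	rl, ok := rlm.limiters[key]
	rlm.rwMu.RUnlock()
	if ok {
		return rl
	}

	// Slow path: take the write lock and re-check, because another goroutine
	// may have inserted the limiter between RUnlock and Lock.
	rlm.rwMu.Lock()
	defer rlm.rwMu.Unlock()
	if rl, ok := rlm.limiters[key]; ok {
		return rl
	}
	rl = &ringBufferRateLimiter{}
	rlm.limiters[key] = rl
	return rl
}
```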

I suggest waiting before merging this while we let it sit in production a while longer, to ensure the problem is fully fixed. I'll let it run for about a week. In the meantime, this is ready for code review.

mholt commented 8 months ago

Hi, wow!! Thank you for this. Sorry for the lateness, I was out this weekend with the family. I'll look at this soon!

DzigaV commented 8 months ago

How have your instances been this last week, @inahga?

I'll sleep better once this is merged 🙂

inahga commented 8 months ago

> How have your instances been this last week, @inahga?

I'll let you know by Monday, I was slow to deploy it this week.

inahga commented 8 months ago

@DzigaV We've run this in production for a week now, with no signs of leakage. Feeling pretty confident this fixed it.

mholt commented 8 months ago

That's great news!! Thank you for the contribution and the work required to get it figured out. Glad it's working :+1: