ulule / limiter

Dead simple rate limit middleware for Go.

Potential race between Redis key TTL and Context.Reset? #94

Xopherus opened this issue 4 years ago

Xopherus commented 4 years ago

I'm seeing what looks like a race condition: the limiter key is still present in the Redis store even after the code has waited for Context.Reset to be reached. This causes the next iteration to be rate-limited as well, but Context.Reset is already in the past, so the sleep duration is negative.

Here's a snippet of the code which should reproduce it:

for {
    limit, err := r.Limiter.Get(ctx, key)
    if err != nil {
        log.Printf("[ERROR] failed to fetch rate-limit context %s", err)
        return err
    }

    if limit.Reached {
        sleep := time.Until(time.Unix(limit.Reset, 0))

        log.Printf("[ERROR] client has proactively throttled for %s", sleep.String())
        <-time.After(sleep)
        continue
    }

    // do stuff
}

And here are the logs:

2020/04/07 19:41:41 [DEBUG] ratelimit context: {Limit:2 Remaining:0 Reset:1586288531 Reached:true}
2020/04/07 19:41:41 [ERROR] client has proactively throttled for 29.954364124s
2020/04/07 19:42:11 [DEBUG] ratelimit context: {Limit:2 Remaining:0 Reset:1586288531 Reached:true}
2020/04/07 19:42:11 [ERROR] client has proactively throttled for -81.245359ms

Can anyone else reproduce this?

novln commented 4 years ago

Hello,

Can you give me a gist with a full example: how you initialize the store, what period you use, and what code you execute after do stuff? You've logged something at debug level that I don't see in your snippet. Also, what Redis topology do you use (standalone, sentinel, cluster)?

Thank you.

Xopherus commented 4 years ago

Here's what I could come up with, @novln. I was able to reproduce it with rates of 1-S and 1-M. I've also attached the code plus the go.mod/go.sum files.

https://gist.github.com/Xopherus/7423f7b2b60264b94bb9dd5c9ebe9c7d
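For readers who don't open the link, initialization along these lines should reproduce it (a sketch assuming limiter v3 with a standalone go-redis client; not the gist verbatim):

package main

import (
    "log"

    "github.com/go-redis/redis/v7"
    "github.com/ulule/limiter/v3"
    sredis "github.com/ulule/limiter/v3/drivers/store/redis"
)

func main() {
    // "1-S" means 1 request per second; "1-M" means 1 per minute.
    rate, err := limiter.NewRateFromFormatted("1-S")
    if err != nil {
        log.Fatal(err)
    }

    // Standalone Redis topology.
    client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

    store, err := sredis.NewStore(client)
    if err != nil {
        log.Fatal(err)
    }

    _ = limiter.New(store, rate)
}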

novln commented 4 years ago

OK, I'm able to reproduce it. Thank you for the gist.

I think the issue is in how we compute the Reset value and how expiration works on Redis. As a small fix (for the time being), add one second to the Reset value. I'll try to find some time to investigate this issue further.
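Applied to the snippet from the first comment, that workaround is just a sketch like this (the +1 pads the computed reset to absorb the TTL rounding):

if limit.Reached {
    // Workaround: add one second to Reset so we sleep past the
    // actual Redis key expiration instead of waking up too early.
    sleep := time.Until(time.Unix(limit.Reset+1, 0))

    log.Printf("[ERROR] client has proactively throttled for %s", sleep.String())
    <-time.After(sleep)
    continue
}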

Cheers,

novln commented 4 years ago

The expiration is somewhat imprecise: between the PTTL Redis command, the conversion of that duration to a concrete time.Time, and then the conversion to a Unix timestamp, we lose some milliseconds of precision.
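A rough standalone illustration of where the milliseconds go (my own sketch, not the library's code):

package main

import (
    "fmt"
    "time"
)

func main() {
    // Suppose PTTL reports 29954 ms remaining on the key.
    ttl := 29954 * time.Millisecond

    reset := time.Now().Add(ttl)

    // Exposing the reset as a Unix timestamp keeps whole seconds only,
    // so up to ~1s of sub-second precision is dropped and the client
    // can wake up while the Redis key is still alive.
    truncated := time.Unix(reset.Unix(), 0)
    fmt.Println("lost precision:", reset.Sub(truncated))
}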

I won't fix this issue (for the moment) because it would mean a refactoring that I don't want to manage. My recommendation: if the sleep duration is below zero, change it to 500ms to force the next tick.
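In the loop from the first comment, that clamp would look something like this (the 500ms floor is just the value suggested above):

if limit.Reached {
    sleep := time.Until(time.Unix(limit.Reset, 0))

    // If the key outlived Reset, the duration is negative; wait a
    // short fixed interval and re-check instead of busy-looping.
    if sleep <= 0 {
        sleep = 500 * time.Millisecond
    }

    log.Printf("[ERROR] client has proactively throttled for %s", sleep.String())
    <-time.After(sleep)
    continue
}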

With that being said, I'll gladly review pull requests.

Cheers,