viafintech / gcra-ruby

Generic cell rate algorithm (leaky bucket) implementation for rate limiting
MIT License
51 stars 10 forks source link

fix race condition for "compare and set" #6

Closed JeanMertz closed 4 years ago

JeanMertz commented 4 years ago

I ran into an issue which caused the Lua script to return an error if the key does not exist. This issue arises in a system with a high volume of concurrent traffic for the same key, and a low TTL value.

The tat_from_store value is set at the top of the loop, which is then used at the bottom to determine if the key should be set, or updated.

If the key exists at the top, but then expires before the script reaches the bottom, the wrong path is taken.

Usually, this isn't an issue, since the script retries 10 times before giving up. However, with a high enough concurrent throughput for the same key (which is a likely scenario, given the exact purpose of this library), this situation can repeat itself 10 times, resulting in the script failing.

From what I understand, there's no reason why the Lua script can't set the value itself and behave similar to set_if_not_exists_with_ttl, if it determines the key does not exist.

An alternative approach would be to always run the Lua script, as it now does what set_if_not_exists_with_ttl does, and more, but I'm guessing (without having benchmarked it) that we still want the regular set call to stay in place, as it'll run faster than the Lua script.

JeanMertz commented 4 years ago

I deployed this change in our infrastructure but still see the same issue happening. I'm going to add some more debugging into our set-up to see what's going on.

tobischo commented 4 years ago

Thank you for reporting this issue and looking into it further. Could you share the range of your TTL values and the access rate?