tomasbasham / ratelimit

API Rate Limit Decorator
MIT License

Risk of violating the rate limit #31

Open sio opened 5 years ago

sio commented 5 years ago

From what I've gathered, there is a risk of violating the rate limit when using this module.

As of now RateLimitDecorator does not use a sliding time window to track the number of API calls within any given period of the specified length. Instead it tracks fixed windows of the correct length, placed back-to-back and anchored to the time of the first API call.

Here is how it can bite us.

Let's say we must not make more than 5 calls within a minute. We make one call, wait almost a minute, then make 9 more calls in quick succession. Only 5 of those calls fall within the first fixed window and 5 within the second, so RateLimitDecorator allows this behavior, yet 9 of them land within a few seconds of each other, which the API provider will definitely consider a violation.

          |  time frame #1  |  time frame #2  |
API hits: -x---------x-x-x-x-x-x-x-x-x---------
                    |   VIOLATION!!!  |

Here is code that demonstrates the described violation. As in the picture above, we have a limit of 5 calls per time frame (10 seconds, for a faster demo), but we make 9 calls within the last two seconds. The API provider would not be happy with us :)

```py
from ratelimit import limits
import time

period = 10  # seconds per window
calls = 5    # allowed calls per window
count = 0

@limits(calls=calls, period=period)
def demo():
    global count
    count += 1
    print(count, time.time())

demo()                      # 1 call at the start of window #1
time.sleep(period - 1)
for x in range(calls - 1):  # 4 calls at the end of window #1
    demo()
time.sleep(1)
for x in range(calls):      # 5 calls at the start of window #2
    demo()
```

To avoid this we need to track the time of each API call and adjust the decision making logic accordingly.
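A minimal sketch of that idea (illustrative only, not code from this package or from my fork): keep one timestamp per call in a deque and discard entries older than `period` before deciding whether the next call is allowed.

```python
import time
from collections import deque
from functools import wraps

def sliding_window_limits(calls, period):
    """Allow at most `calls` invocations within ANY `period`-second window."""
    timestamps = deque()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Drop timestamps that have fallen out of the sliding window.
            while timestamps and now - timestamps[0] >= period:
                timestamps.popleft()
            if len(timestamps) >= calls:
                raise RuntimeError('rate limit exceeded')
            timestamps.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

With this logic the pattern from the demo above (1 call, a long pause, then 9 rapid calls) is rejected as soon as 5 calls fall inside any 10-second span, regardless of where the fixed window boundaries happen to sit.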

For my personal use I've written an alternative class altogether, but I do not plan on uploading it to PyPI and fragmenting the user base. You have done a great job of maintaining this package for a long time and I think it should remain the go-to place for rate limiter logic. Feel free to reuse my code though.

stuaxo commented 4 years ago

@sio would it be worth submitting your sliding window work as a PR?

sio commented 4 years ago

@stuaxo, in my implementation I made some design decisions differently, so adding sliding window logic to this project would require more work than just copy-paste.

If someone is willing to do it, I'm all in favor. My explicit approval isn't necessary anyway - I've already published the code under the Apache License, which allows it.

abhi-jha commented 4 years ago

Did this functionality get merged in the release?

sio commented 4 years ago

The most recent commit to master is older than this thread. That's definitely a no :-)

deckar01 commented 4 years ago

Another factor that can contribute to violating a rate limit is losing the state of RateLimitDecorator between application restarts. During development this happens to me often, and there is currently no way to safely resume without waiting out the full period after the last request.

test.py

```py
from datetime import datetime
from functools import wraps

from ratelimit import limits, sleep_and_retry, RateLimitException

def log_and_pass(func):
    @wraps(func)
    def wrapper(*args, **kargs):
        try:
            return func(*args, **kargs)
        except RateLimitException as exception:
            print('{}\tmiss\t0'.format(datetime.now()))
            print('{}\twait\t{}'.format(datetime.now(), exception.period_remaining))
            raise exception
    return wrapper

@sleep_and_retry
@log_and_pass
@limits(calls=1, period=1)
@limits(calls=2, period=10)
def test():
    print('{}\tcall\t0'.format(datetime.now()))

test()
test()
test()
```
python test.py > log1.txt && python test.py > log2.txt
(screenshot: log output)

Running parallel processes illustrates this a little more clearly:

python test.py > log1.txt & python test.py > log2.txt &
(screenshot: interleaved log output from both processes)

If the call log were implemented as an SQLite database, the operations would be relatively simple queries, and the state would not only survive restarts but also be safe to share between parallel processes.

https://github.com/tomasbasham/ratelimit/compare/master...deckar01:31-db-log
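To make the idea concrete, here is a rough sketch of what such a call log could look like (illustrative only; the branch linked above differs in its details, and the function and table names here are made up). Each call inserts a timestamped row, and the limit check counts the rows still inside the window:

```python
import sqlite3
import time

def check_and_log(db_path, name, calls, period):
    """Return True and record the call if it is within the limit,
    False otherwise. State lives in SQLite, so it survives restarts
    and is shared between processes via the database file."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            'CREATE TABLE IF NOT EXISTS call_log (name TEXT, ts REAL)')
        now = time.time()
        with conn:  # wrap prune + count + insert in one transaction
            # Prune entries that have fallen out of the sliding window.
            conn.execute('DELETE FROM call_log WHERE name = ? AND ts < ?',
                         (name, now - period))
            (count,) = conn.execute(
                'SELECT COUNT(*) FROM call_log WHERE name = ?',
                (name,)).fetchone()
            if count >= calls:
                return False
            conn.execute('INSERT INTO call_log VALUES (?, ?)', (name, now))
            return True
    finally:
        conn.close()
```

Because every process opens the same database file, a restarted script immediately sees the calls its previous incarnation made, instead of starting from a blank in-memory counter.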

test.py ```py from datetime import datetime from functools import wraps from ratelimit import limits, sleep_and_retry, RateLimitException def log_and_pass(func): @wraps(func) def wrapper(*args, **kargs): try: return func(*args, **kargs) except RateLimitException as exception: print('{}\tmiss\t0'.format(datetime.now())) print('{}\twait\t{}'.format(datetime.now(), exception.period_remaining)) raise exception return wrapper @sleep_and_retry @log_and_pass @limits(calls=1, period=1, storage='test.db', name='limit_1') @limits(calls=2, period=10, storage='test.db', name='limit_10') def test(): print('{}\tcall\t0'.format(datetime.now())) test() test() test() ```
python test.py > log1.txt & python test.py > log2.txt &
(screenshot: log output)

stuaxo commented 4 years ago

I wish I had time to play at fixing this, but I know the tickets at work mean it won't happen.

Having a callback to store state in the db (or ideally in Redis) would sort this.

deckar01 commented 4 years ago

No worries. I went ahead and published my fork to make my proposal easier to test drive. If there is anything I can do to help resolve this issue, just let me know.

https://pypi.org/project/deckar01-ratelimit/

lmmsoft commented 6 months ago

+1, a sliding window is the right solution.

SQLite is not well suited to multi-process use; Redis `ZADD` and `ZREMRANGEBYSCORE` would work better.
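The Redis pattern maps onto a sorted set keyed by timestamp: `ZREMRANGEBYSCORE` prunes entries older than the window, `ZCARD` counts what remains, and `ZADD` records the new call. Below is a pure-Python mock of those same steps so the logic can be read without a Redis server (the dict stands in for Redis, and each comment names the real command it mimics; the function name is illustrative):

```python
import time
import uuid

def redis_style_allow(store, key, calls, period):
    """Sliding-window check using the Redis sorted-set pattern.
    `store` stands in for a Redis server: {key: {member: score}}."""
    now = time.time()
    zset = store.setdefault(key, {})
    # ZREMRANGEBYSCORE key -inf (now - period): drop expired entries.
    for member in [m for m, score in zset.items() if score <= now - period]:
        del zset[member]
    # ZCARD key: how many calls are still inside the window?
    if len(zset) >= calls:
        return False
    # ZADD key <now> <unique-member>: record this call with its timestamp.
    zset[str(uuid.uuid4())] = now
    return True
```

In real Redis these three commands would run against a shared server, so the window is naturally consistent across processes and machines; wrapping them in a `MULTI`/`EXEC` transaction or a Lua script would keep the check-then-add step atomic.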