vutran1710 / PyrateLimiter

⚔️ Python Rate-Limiter using Leaky-Bucket Algorithm Family
https://pyratelimiter.readthedocs.io
MIT License

SQLite locks #67

Closed: raratiru closed this issue 2 years ago

raratiru commented 2 years ago

When calling each of the following functions from a different REPL, an error occurs: OperationalError: database is locked. Once it worked (but did not last past the delay), and the total number of requests before the first delay across both REPLs was 4075 (75 over the limit).

limiter = Limiter(
    RequestRate(4000, Duration.MINUTE),
    bucket_class=SQLiteBucket,
    bucket_kwargs={'path': '/tmp/pyrate_limiter.sqlite'},
)

@limiter.ratelimit("test", delay=True)
def api_limit():
    pass

def func1():
    # Imported and called from REPL1
    count = 1
    while True:
        api_limit()
        print(count)
        count += 1

def func2():
    # Imported and called from REPL2
    count = 1
    while True:
        api_limit()
        print(count)
        count += 1

Is it relevant to #61 ?

JWCook commented 2 years ago

Yes, this should be fixed by #61.

raratiru commented 2 years ago

Version 2.6.3 fixed the total request count across both terminals (when they run simultaneously).

However, the locked-database error still pops up and disrupts (or, more commonly, blocks) running the terminals simultaneously.

The error on a "Debian GNU/Linux 11 (bullseye)" box (the paste expires in May 2022): https://pastebin.com/SHhLnSin

In case it helps, the above code works without problems in the deckar01 fork of ratelimit.

JWCook commented 2 years ago

Okay, I see. #61 mainly made the SQLite bucket thread-safe. It's also now compatible with ProcessPoolExecutor, which spawns child processes from one parent process.

I just realized that dealing with multiple unrelated processes, like in your example above, is a separate (and more difficult) problem.
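For contrast, here's a minimal sketch of the already-supported case: child processes spawned from one parent with ProcessPoolExecutor, all sharing one SQLite bucket (the rate and path here are placeholders, not from this thread):

from concurrent.futures import ProcessPoolExecutor

from pyrate_limiter import Duration, Limiter, RequestRate, SQLiteBucket

limiter = Limiter(
    RequestRate(100, Duration.SECOND),
    bucket_class=SQLiteBucket,
    bucket_kwargs={"path": "/tmp/pyrate_limiter.sqlite"},
)

@limiter.ratelimit("test", delay=True)
def send_request(i):
    """Rate-limited function representing a single request"""
    return i

if __name__ == "__main__":
    # All workers are children of this one parent process,
    # which is the case #61 addressed
    with ProcessPoolExecutor(max_workers=4) as executor:
        list(executor.map(send_request, range(500)))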

raratiru commented 2 years ago

Thank you for the prompt answer.

In my case, I am trying to replicate the scenario where a cron job (for example) starts a new run while the previous one has not yet finished.
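For illustration, a hypothetical crontab entry that produces this overlap: the job fires every minute, but a slow run can still be in flight when the next one starts.

# Hypothetical schedule; the script path is a placeholder
* * * * * /usr/bin/python3 /path/to/api_script.py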

JWCook commented 2 years ago

That makes sense. I think a file-level lock like py-filelock will be the only way to make the SQLite backend work correctly from multiple scripts, while also supporting all the existing features of this library (mainly, tracking several different rate limits at once). I'll look into that.
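As a rough sketch of that idea, assuming a sidecar .lock file next to the database (the paths and table here are made up for illustration, not the actual bucket implementation):

import sqlite3

from filelock import FileLock

DB_PATH = "/tmp/pyrate_limiter.sqlite"        # hypothetical paths
LOCK_PATH = "/tmp/pyrate_limiter.sqlite.lock"

def record_request():
    # Only one process at a time may touch the database,
    # even across completely unrelated scripts
    with FileLock(LOCK_PATH, timeout=10):
        conn = sqlite3.connect(DB_PATH)
        try:
            conn.execute("INSERT INTO bucket (ts) VALUES (strftime('%s', 'now'))")
            conn.commit()
        finally:
            conn.close()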

It appears that in the issue you linked, that library is working around the problem by catching and ignoring OperationalError: https://github.com/deckar01/ratelimit/blob/master/ratelimit/decorators.py#L120
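The workaround amounts to a pattern like this sketch (an illustration of catch-and-ignore, not a copy of that library's code):

import sqlite3

def try_record(conn):
    try:
        conn.execute("INSERT INTO requests (ts) VALUES (strftime('%s', 'now'))")
        conn.commit()
    except sqlite3.OperationalError:
        # Another process holds the lock: swallow the error and move on.
        # The request goes uncounted, so the limit can be exceeded.
        pass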

That fork also uses a different algorithm (sliding window), so it performs fewer database operations, in a different order, which may make multiprocess errors less likely to come up. I don't believe that eliminates the problem, though: with a faster request rate and/or the scripts left running long enough, they will eventually make more requests than they're supposed to. Either way, the same approach wouldn't work for this library.
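For context, a sliding window limiter in its simplest in-memory form looks roughly like this (a generic sketch, not deckar01's or this library's code):

from collections import deque
from time import monotonic

class SlidingWindowLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def try_acquire(self):
        now = monotonic()
        # Evict timestamps that have fallen out of the window
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False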

raratiru commented 2 years ago

Oh! The code seems to be a kind of soft hack around the issue.

As a matter of fact, that fork was created to apply the sliding window algorithm and to try to solve these very same issues in the apparently unmaintained parent library.

Indeed py-filelock could prove to be py-substantially helpful!

JWCook commented 2 years ago

I think I have this working in my branch for #68.

Here's a slightly modified example to test this:

import sys
from time import perf_counter as time

from pyrate_limiter import Limiter, RequestRate, Duration, FileLockSQLiteBucket

limiter = Limiter(
    RequestRate(100, Duration.SECOND),
    RequestRate(1000, Duration.MINUTE),
    bucket_class=FileLockSQLiteBucket,
)

@limiter.ratelimit("test", delay=True)
def send_request():
    """Rate-limited function representing a single request"""

def test_ratelimit(n_process=0):
    """Continuously send rate-limited requests"""
    start_time = time()
    n_requests = 1

    while True:
        send_request()
        print(f"[Process {n_process}] {time() - start_time:.2f}: {n_requests}")
        n_requests += 1

if __name__ == "__main__":
    n_process = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    test_ratelimit(n_process)

Either run from separate terminals, or as background processes:

python filelock_test.py 1 &
python filelock_test.py 2 &
python filelock_test.py 3 &
...

# Kill background processes when done:
pkill -f filelock_test.py

I ran this for several minutes with zero extra requests. I also added an automated test that does basically the same thing, except with multiprocessing.Process (and no shared Limiter object).
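That test is shaped roughly like this (a hypothetical sketch, not the actual test from the repo): each process builds its own Limiter over the same bucket, and the combined throughput should stay within the configured rates.

from multiprocessing import Process, Queue
from time import perf_counter

from pyrate_limiter import Duration, Limiter, RequestRate, FileLockSQLiteBucket

def worker(queue, run_seconds=5):
    # Each process creates its own Limiter; only the bucket file is shared
    limiter = Limiter(
        RequestRate(100, Duration.SECOND),
        bucket_class=FileLockSQLiteBucket,
    )

    @limiter.ratelimit("test", delay=True)
    def send_request():
        pass

    count = 0
    end = perf_counter() + run_seconds
    while perf_counter() < end:
        send_request()
        count += 1
    queue.put(count)

if __name__ == "__main__":
    queue = Queue()
    processes = [Process(target=worker, args=(queue,)) for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    total = sum(queue.get() for _ in processes)
    # With delay=True, the combined rate should stay within 100 requests/second
    print(f"Total requests across processes: {total}")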

JWCook commented 2 years ago

@raratiru @spiffspaceman FileLockSQLiteBucket is now available in v2.7. Give that a try and let me know if that works for you.

raratiru commented 2 years ago

@JWCook Thank you, it works great!

SpiffSpaceman commented 2 years ago

Yes, I ran it a few times and it seems to be fine. Thanks!