grantjenks / python-diskcache

Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
http://www.grantjenks.com/docs/diskcache/
Other
2.35k stars 132 forks source link

Add "throttle" Method for Rate Limiting #35

Closed grantjenks closed 5 years ago

grantjenks commented 7 years ago

Pseudocode:

def rate_limit(key, count, seconds, memo={}):
    """Rate limit resource identified by key to count per seconds.

    >>> while True:
    ...     wait = next(rate_limit('some key', 5, 8))
    ...     if not wait:
    ...         break
    ...     else:
    ...         time.sleep(wait)
    ... do_rate_limited_work()

    """
    if (key, count, seconds) in memo:
        return memo[key, count, seconds]

    assert isinstance(count, int) and count > 0

    def meter():
        rate = float(count) / seconds
        last = time.time()
        tally = count

        while True:
            now = time.time()
            extent = now - last
            last = now
            tally += extent * rate

            if tally > count:
                tally = count
            elif tally < 1:
                yield (1 - tally) / rate
            else:
                tally -= 1
                yield 0

    limiter = meter()

    memo[key, count, seconds] = limiter
    return limiter

Need to track last and tally in Cache. Might be better to build as a separate object. The generator pattern is not that useful. But the algorithm is tested and correct.

grantjenks commented 7 years ago

"throttle" seems to be a good one-word name. Other considered: "meter", "pace", "tempo", "rhythm". Some googling shows that throttling requests is more common than other terms.

grantjenks commented 7 years ago

See also the stampede lock code in the diskcache directory.

grantjenks commented 5 years ago

Tested pseudocode:

def throttle(count, seconds):
    """Throttle function calls to `count` times per `seconds`.

    """
    rate = count / seconds

    def decorator(func):
        last = time.time()
        tally = count

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last, tally
            while True:
                now = time.time()
                extent = now - last
                last = now
                tally += extent * rate
                if tally > count:
                    tally = count - 1
                    break
                elif tally < 1:
                    time.sleep((1 - tally) / rate)
                else:
                    tally -= 1
                    break
            func(*args, **kwargs)
        return wrapper
    return decorator

I'm doubtful this can be made multi-threaded with the atomic methods provided by diskcache. But maybe it can be good-enough.

grantjenks commented 5 years ago

Design notes: throttle should be method on Cache objects that accepts key, count and seconds. The throttle method should use a transaction to calculate how long to sleep before asking again. If returned value is zero then perform action immediately.

Maybe there should be a Throttler object that uses the throttle method instead.

FanoutCache should also have throttle method but acts instead as decorator and handles retries for Timeout error. Django Cache ha same method that behaves identically.

Maybe have rate-limit method and throttle. One acts as low-level sleeper function, other acts as decorator. Yes, better to have separate methods.

grantjenks commented 5 years ago

It would be nice to have an API for measuring the rate of something. Use case idea: better health checks. I think it could be implemented as a leaky-bucket with an un-bounded size. Return value is bucket size. Measure in calls per second.

grantjenks commented 5 years ago

This is now included in the recipes. Commit reference bf7196d