praw-dev / praw

PRAW, an acronym for "Python Reddit API Wrapper", is a Python package that allows for simple access to Reddit's API.
http://praw.readthedocs.io/
BSD 2-Clause "Simplified" License

Fetch Rate Limit #2023

Open YuviGold opened 3 days ago

YuviGold commented 3 days ago

Describe the Bug

When running a large number of operations that include fetches, a rate limit exception is raised.

Desired Result

Following the linked documentation, PRAW has internal rate limit handling (docs), but only for POST operations. The same handling should apply to fetch (GET) requests as well.

Code to reproduce the bug

from concurrent.futures import ThreadPoolExecutor, as_completed

from praw import Reddit

# REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, and DEFAULT_USER_AGENT are
# placeholder constants defined elsewhere.


def test_reddit_rate_limit():
    reddit_api = Reddit(
        client_id=REDDIT_CLIENT_ID,
        client_secret=REDDIT_CLIENT_SECRET,
        user_agent=DEFAULT_USER_AGENT,
    )

    posts = list(reddit_api.subreddit("all").search("python", limit=1))
    post = posts[0]

    def fetch_post(post, index):
        print(index)
        post._fetched = False
        post._fetch()

    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(fetch_post, post, index) for index in range(1000)]
        for future in as_completed(futures):
            future.result()

My code does not include sensitive credentials

Relevant Logs

def test_reddit_rate_limit():
        reddit_api = Reddit(
            client_id=REDDIT_CLIENT_ID,
            client_secret=REDDIT_CLIENT_SECRET,
            user_agent=DEFAULT_USER_AGENT,
        )

        posts = list(reddit_api.subreddit("all").search("python", limit=1))
        post = posts[0]

        def fetch_post(post, index):
            print(index)
            post._fetched = False
            post._fetch()

        with ThreadPoolExecutor() as executor:
            futures = [executor.submit(fetch_post, post, index) for index in range(1000)]
            for future in as_completed(futures):
>               future.result()

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../.pyenv/versions/3.11.4/lib/python3.11/concurrent/futures/_base.py:449: in result
    return self.__get_result()
../../../.pyenv/versions/3.11.4/lib/python3.11/concurrent/futures/_base.py:401: in __get_result
    raise self._exception
../../../.pyenv/versions/3.11.4/lib/python3.11/concurrent/futures/thread.py:58: in run
    result = self.fn(*self.args, **self.kwargs)
signals/tests/sources/reddit/test_reddit_logic.py:53: in fetch_post
    post._fetch()
../../../Library/Caches/pypoetry/virtualenvs/signals-_q1xr8cp-py3.11/lib/python3.11/site-packages/praw/models/reddit/submission.py:712: in _fetch
    data = self._fetch_data()
../../../Library/Caches/pypoetry/virtualenvs/signals-_q1xr8cp-py3.11/lib/python3.11/site-packages/praw/models/reddit/submission.py:731: in _fetch_data
    return self._reddit.request(method="GET", params=params, path=path)
../../../Library/Caches/pypoetry/virtualenvs/signals-_q1xr8cp-py3.11/lib/python3.11/site-packages/praw/util/deprecate_args.py:43: in wrapped
    return func(**dict(zip(_old_args, args)), **kwargs)
../../../Library/Caches/pypoetry/virtualenvs/signals-_q1xr8cp-py3.11/lib/python3.11/site-packages/praw/reddit.py:941: in request
    return self._core.request(
../../../Library/Caches/pypoetry/virtualenvs/signals-_q1xr8cp-py3.11/lib/python3.11/site-packages/prawcore/sessions.py:328: in request
    return self._request_with_retries(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <prawcore.sessions.Session object at 0x16cd710d0>, data = None, files = None, json = None, method = 'GET', params = {'limit': 2048, 'raw_json': 1, 'sort': 'confidence'}, timeout = 16.0, url = 'https://oauth.reddit.com/comments/13rqbgz/'
retry_strategy_state = <prawcore.sessions.FiniteRetryStrategy object at 0x30db36110>

    def _request_with_retries(
        self,
        data: list[tuple[str, Any]],
        files: dict[str, BufferedReader],
        json: dict[str, Any],
        method: str,
        params: dict[str, int],
        timeout: float,
        url: str,
        retry_strategy_state: FiniteRetryStrategy | None = None,
    ) -> dict[str, Any] | str | None:
        if retry_strategy_state is None:
            retry_strategy_state = self._retry_strategy_class()

        retry_strategy_state.sleep()
        self._log_request(data, method, params, url)
        response, saved_exception = self._make_request(
            data,
            files,
            json,
            method,
            params,
            retry_strategy_state,
            timeout,
            url,
        )

        do_retry = False
        if response is not None and response.status_code == codes["unauthorized"]:
            self._authorizer._clear_access_token()
            if hasattr(self._authorizer, "refresh"):
                do_retry = True

        if retry_strategy_state.should_retry_on_failure() and (
            do_retry or response is None or response.status_code in self.RETRY_STATUSES
        ):
            return self._do_retry(
                data,
                files,
                json,
                method,
                params,
                response,
                retry_strategy_state,
                saved_exception,
                timeout,
                url,
            )
        if response.status_code in self.STATUS_EXCEPTIONS:
>           raise self.STATUS_EXCEPTIONS[response.status_code](response)
E           prawcore.exceptions.TooManyRequests: received 429 HTTP response

This code has previously worked as intended

No

Operating System/Environment

macOS

Python Version

3.11.6

PRAW Version

7.7.1

Links, references, and/or additional comments?

No response

LilSpazJoekp commented 3 days ago

The references you linked are for operations that have additional rate limits. They have nothing to do with the general rate limiting dictated by the rate limit headers that Reddit returns.

The issue here is that you're operating PRAW with a thread pool executor, and PRAW is not thread safe. This causes all the requests to fire at once instead of respecting the rate limit. You're also re-fetching a post that was already fetched from a listing, and as a result you're consuming all of your API requests.
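
For illustration, a minimal sketch of the single-threaded approach described above, using the same placeholder credential constants as the report. It relies on the data each Submission already carries from the listing instead of forcing a re-fetch per post; this is a sketch, not an official PRAW recipe.

# Sketch only: listing items already include fields like title, so the
# attribute access below does not trigger an extra request per post, and
# keeping all calls on one thread lets PRAW pace requests using the rate
# limit headers Reddit returns.
from praw import Reddit

reddit_api = Reddit(
    client_id=REDDIT_CLIENT_ID,          # placeholder constant
    client_secret=REDDIT_CLIENT_SECRET,  # placeholder constant
    user_agent=DEFAULT_USER_AGENT,       # placeholder constant
)

for index, post in enumerate(reddit_api.subreddit("all").search("python", limit=100)):
    print(index, post.title)  # no post._fetched reset, no explicit _fetch()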

YuviGold commented 2 days ago

> The references you linked are for operations that have additional rate limits. They have nothing to do with the general rate limiting dictated by the rate limit headers that Reddit returns.
>
> The issue here is that you're operating PRAW with a thread pool executor, and PRAW is not thread safe. This causes all the requests to fire at once instead of respecting the rate limit. You're also re-fetching a post that was already fetched from a listing, and as a result you're consuming all of your API requests.

Thanks @LilSpazJoekp

The example above was only meant to demonstrate an explicit scenario with many requests. But yes, in my use case I have multiple processes parsing different posts/comments from Reddit that call fetch under the hood, and I hit prawcore.exceptions.TooManyRequests: received 429 HTTP response.

I noticed that the confusion around the docs was already brought up in https://github.com/praw-dev/praw/issues/1777. Maybe what's missing for me: if there are rate limit cases not covered by PRAW, what can be done to handle them? Could you help me understand how they should be treated? How long does it take for my API request quota to refresh?

LilSpazJoekp commented 2 days ago

> if there are rate limit cases not covered by PRAW, what can be done to handle them?

The cases that aren't automatically covered are the ones referenced in the docs you linked. You'll just need to specify, via the ratelimit_seconds configuration parameter, how long you want PRAW to wait when it hits one of those rate limits. If you receive a 429 error, that could mean you're running too many instances of PRAW.
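
For example, a hedged sketch of setting that parameter; the credentials and user agent are placeholders, and 600 seconds is just an illustrative value:

import praw

# Sketch only: ratelimit_seconds controls how long PRAW is willing to wait
# on the rate limits it handles internally (the ones in the linked docs);
# it does not change how a raw 429 response is raised.
reddit = praw.Reddit(
    client_id="CLIENT_ID",                  # placeholder
    client_secret="CLIENT_SECRET",          # placeholder
    user_agent="my-app/0.1 by u/example",   # placeholder
    ratelimit_seconds=600,                  # wait up to 10 minutes
)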

> How long does it take for my API request quota to refresh?

10 minutes.
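
If a raw 429 is still hit, one hedged workaround sketch (not a PRAW feature; fetch_with_backoff, its attempt count, and its wait time are illustrative assumptions based on the roughly 10-minute window mentioned above) is to catch prawcore.exceptions.TooManyRequests and back off before retrying:

import time

from prawcore.exceptions import TooManyRequests

def fetch_with_backoff(post, max_attempts=3, wait_seconds=600):
    # Illustrative helper: retry a fetch after waiting out the quota window
    # when Reddit answers with a 429.
    for attempt in range(max_attempts):
        try:
            post._fetch()
            return
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise
            time.sleep(wait_seconds)  # wait for the quota window to reset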