Functionality to adjust exponential backoff associated with max_retries option

leykun10 commented 8 months ago

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

[X] This is a feature request for the Python library

Describe the feature or improvement you're requesting

Add the ability to adjust the exponential backoff associated with the max_retries option. The documentation states that certain errors are automatically retried 2 times by default, with a short exponential backoff. Ideally, the backoff could be controlled in one of two ways: set manually, or tied dynamically to the rate-limit duration in the case of a 429 error, since short exponential backoffs alone are not helpful in that case.

Additional context

No response

rattrayalex commented 8 months ago

If the response includes a header such as retry-after: 5, this library will wait 5 seconds before retrying. Note that there is also retry-after-ms: 4321, which would wait 4.321 seconds.
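As a rough illustration of how those two headers could be turned into a wait time, here is a minimal sketch. It is not the library's actual parser: the name `parse_retry_after` is hypothetical, and it assumes that `retry-after-ms` is checked before `retry-after`, that header names arrive lower-cased, and that both values are plain numbers (the HTTP-date form of `Retry-After` is ignored).

```python
from typing import Mapping, Optional


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return the server-requested wait in seconds, or None if absent or unparseable."""
    # Assumption: the millisecond header, when present and valid, wins.
    ms = headers.get("retry-after-ms")
    if ms is not None:
        try:
            return float(ms) / 1000.0
        except ValueError:
            pass  # fall through to the seconds header

    seconds = headers.get("retry-after")
    if seconds is not None:
        try:
            return float(seconds)
        except ValueError:
            return None
    return None
```

With the two headers from the example above, `parse_retry_after({"retry-after": "5"})` gives 5 seconds and `parse_retry_after({"retry-after-ms": "4321"})` gives roughly 4.321 seconds.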

In case of a 429, OpenAI's API currently does this (for at least some kinds of 429s - please share if you encounter a request which doesn't) so it will indeed wait a usefully long period of time.

It is currently possible to configure the number of retries with max_retries, as documented in the readme.

I agree it would be nice to configure the base exponential wait, so I'll leave this ticket open, but in transparency I don't anticipate getting to it soon.
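Until the base wait is configurable in the library, one possible workaround is to retry at the call site. The sketch below is only an illustration under stated assumptions: `call_with_backoff` and all of its parameters are hypothetical names, not part of this library, and the jitter simply mirrors the "scale down by up to 25%" pattern rather than any official behaviour.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    *,
    max_retries: int = 2,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Double the wait on each attempt, but never exceed max_delay.
            delay = min(base_delay * 2 ** attempt, max_delay)
            # Scale the delay down by up to 25% so clients do not retry in lockstep.
            sleep(delay * (1 - 0.25 * random.random()))
    raise AssertionError("unreachable when max_retries >= 0")
```

If a wrapper like this is used, it probably makes sense to set max_retries to 0 on the client so that the two retry layers do not multiply.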

leykun10 commented 8 months ago

Hey @rattrayalex, thanks for the response! Correct me if I am wrong, but I looked through the code and did not find any exponential backoff calculation specific to 429 errors here and here. It looks like the longest the backoff can go is 60 seconds, defined as a magic number below; otherwise it falls between INITIAL_RETRY_DELAY and MAX_RETRY_DELAY, which have the values 0.8 and 8.

def _calculate_retry_timeout(
    self,
    remaining_retries: int,
    options: FinalRequestOptions,
    response_headers: Optional[httpx.Headers] = None,
) -> float:
    max_retries = options.get_max_retries(self.max_retries)

    # If the API asks us to wait a certain amount of time (and it's a reasonable amount), just do what it says.
    retry_after = self._parse_retry_after_header(response_headers)
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after

    nb_retries = max_retries - remaining_retries

    # Apply exponential backoff, but not more than the max.
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)

    # Apply some jitter, plus-or-minus half a second.
    jitter = 1 - 0.25 * random()
    timeout = sleep_seconds * jitter
    return timeout if timeout >= 0 else 0
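To make the numbers concrete, here is a small sketch of the range of waits that formula produces when no usable retry-after header is present. It uses the values quoted in this comment (0.8 and 8), which may differ between library versions, and `backoff_bounds` is a hypothetical helper written for illustration, not part of the library.

```python
from typing import Tuple

INITIAL_RETRY_DELAY = 0.8  # assumption: value quoted in this thread
MAX_RETRY_DELAY = 8.0      # assumption: value quoted in this thread


def backoff_bounds(nb_retries: int) -> Tuple[float, float]:
    """Return the (shortest, longest) sleep in seconds for a given nb_retries."""
    sleep_seconds = min(INITIAL_RETRY_DELAY * 2.0 ** nb_retries, MAX_RETRY_DELAY)
    # jitter = 1 - 0.25 * random() lies in (0.75, 1.0], so it can only shorten the wait.
    return 0.75 * sleep_seconds, sleep_seconds


for n in range(5):
    low, high = backoff_bounds(n)
    print(f"nb_retries={n}: {low:.2f}s to {high:.2f}s")
```

Under these assumptions the wait starts at no more than 0.8 seconds and reaches the 8 second cap once nb_retries is 4, regardless of how long the rate limit actually lasts.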

rattrayalex commented 8 months ago

It's this code:

max_retries = options.get_max_retries(self.max_retries)

# If the API asks us to wait a certain amount of time (and it's a reasonable amount), just do what it says.
retry_after = self._parse_retry_after_header(response_headers)
if retry_after is not None and 0 < retry_after <= 60:
    return retry_after

When the API responds with a 429, it sends Retry-After and Retry-After-MS headers.

leykun10 commented 8 months ago

Got it! I was referring to the use of the rate limit headers, but I assume retry-after is more or less based on the headers below, among others.

x-ratelimit-reset-requests | The time until the rate limit (based on requests) resets to its initial state.
x-ratelimit-reset-tokens   | The time until the rate limit (based on tokens) resets to its initial state.
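For anyone who wants to derive a wait directly from these reset headers, the following is a rough sketch. It assumes the values are duration strings such as `1s`, `6m0s`, or `250ms`, which this thread does not confirm, and `parse_reset_duration` is a hypothetical helper rather than anything the library provides.

```python
import re

# Assumption: durations are sequences of <number><unit> with these units.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is listed before "m" so that milliseconds are not read as minutes.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_reset_duration(value: str) -> float:
    """Convert a duration string like '6m0s' into seconds."""
    parts = _PART.findall(value)
    # Reject input that has leftover characters the pattern did not consume.
    if not parts or "".join(number + unit for number, unit in parts) != value:
        raise ValueError(f"unrecognised duration: {value!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)
```

A caller could then sleep for the larger of the two parsed values before retrying, although the retry-after header discussed above is the simpler signal to rely on.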

Thanks!

openai / openai-python