pycontribs / jira

Python Jira library. Development chat available on https://matrix.to/#/#pycontribs:matrix.org
https://jira.readthedocs.io
BSD 2-Clause "Simplified" License

Regression of 429 response handling in 3.6.0 #1805

Closed: matthias-bach-by closed this issue 3 months ago

matthias-bach-by commented 5 months ago

Bug summary

The improved handling of the retry-after header introduced in 3.6.0 via #1713 is an important step forward, but it sadly introduced two regressions in the handling of 429 responses.

  1. While the retry-after value is now actually evaluated and not just printed, it is sadly used as an upper bound for the backoff time. Thus, we are almost guaranteed to run into a second 429 response while retrying after the first one.
  2. The code assumes requests that have a retry-after value of 0 must not be retried. However, Jira seems to frequently provide a retry-after value of 0 on 429 responses.

With the previous plain backoff you had a good chance of backing off long enough to evade the rate limiting, as long as you weren't running parallel requests. The new behaviour is a lot more aggressive: even with a single client we easily end up not retrying at all (see the sketch below).
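For illustration, here is a minimal sketch of the two delay strategies described above. This is a paraphrase of the reported behaviour, not the library's actual code; the names suggested_delay and max_retry_delay follow the snippet in the "Expected behaviour" section, and the concrete cap value is made up:

import random

max_retry_delay = 60.0  # illustrative value only

def delay_as_reported_in_3_6_0(suggested_delay: float) -> float | None:
    # Reported 3.6.0 behaviour: retry-after acts as an upper bound on the
    # backoff, and a value of 0 means the request is not retried at all
    # (None = give up).
    if suggested_delay <= 0:
        return None
    return suggested_delay * random.random()

def delay_as_expected(suggested_delay: float) -> float:
    # Expected behaviour: retry-after is a lower bound, jitter is added on top.
    return suggested_delay + max_retry_delay * random.random()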

Is there an existing issue for this?

Jira Instance type

Jira Server or Data Center (Self-hosted)

Jira instance version

9.12.2

jira-python version

3.6.0

Python Interpreter version

3.12

Which operating systems have you used?

Reproduction steps

from jira import JIRA

# 1. Given a Jira client instance (server URL and credentials are placeholders)
jira = JIRA(server='https://jira.example.org', token_auth='<personal-access-token>')
# 2. Run a random cheap operation often enough to hit the rate limit
for _ in range(100):
    jira.issue('SOME-1')
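To confirm point 2 of the bug summary (Jira answering a 429 with a retry-after value of 0), the header can also be inspected directly with requests, bypassing the library's retry handling. The URL and token below are placeholders:

import requests

resp = requests.get(
    'https://jira.example.org/rest/api/2/myself',
    headers={'Authorization': 'Bearer <personal-access-token>'},
)
print(resp.status_code, resp.headers.get('Retry-After'))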

Stack trace

E           jira.exceptions.JIRAError: JiraError HTTP 429 url: https://jira.example.org/rest/api/2/search?jql=issuekey+in+%28SOME-1&startAt=0&validateQuery=True&fields=issuetype&fields=resolution&fields=status&maxResults=100
E               text: Rate limit exceeded.
E
E               response headers = {'Content-Type': 'application/json;charset=UTF-8', …}
E               response text = {"message":"Rate limit exceeded."}

Expected behaviour

I'd expect the value of the retry-after header to be interpreted as a lower bound for the backoff, i.e. doing something along the lines of:

delay = suggested_delay + self.max_retry_delay * random.random()

Furthermore, we should also retry requests that have a suggested_delay of 0. If a value of 0 indicates an error for 503 responses, we might need to make the retry decision depend on the status code, too (see the sketch below).
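A minimal sketch of such a status-code-dependent decision; should_retry is a hypothetical helper, not part of the library's API, and the 503 rule is an assumption:

def should_retry(status_code: int, suggested_delay: float) -> bool:
    # Always retry 429 responses, even when retry-after is 0 (assumption:
    # Jira legitimately sends Retry-After: 0 on rate-limited requests).
    if status_code == 429:
        return True
    # For 503 responses, treat retry-after == 0 as "do not retry" (assumption).
    if status_code == 503:
        return suggested_delay > 0
    return False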

Additional Context

No response