hub4j / github-api

Java API for GitHub
https://github-api.kohsuke.org/
MIT License

Implement "Secondary rate limit" behavior to internally throttle querying #1975

Open bitwiseman opened 1 day ago

bitwiseman commented 1 day ago

See #1805
See #1842

This docs page describes secondary rate limit behavior: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#about-secondary-rate-limits

As of this reading it says:

You may encounter a secondary rate limit if you:

  • Make too many concurrent requests. No more than 100 concurrent requests are allowed. This limit is shared across the REST API and GraphQL API.
  • Make too many requests to a single endpoint per minute. No more than 900 points per minute are allowed for REST API endpoints, and no more than 2,000 points per minute are allowed for the GraphQL API endpoint. For more information about points, see "Calculating points for the secondary rate limit."
  • Make too many requests per minute. No more than 90 seconds of CPU time per 60 seconds of real time is allowed. No more than 60 seconds of this CPU time may be for the GraphQL API. You can roughly estimate the CPU time by measuring the total response time for your API requests.
  • Create too much content on GitHub in a short amount of time. In general, no more than 80 content-generating requests per minute and no more than 500 content-generating requests per hour are allowed. Some endpoints have lower content creation limits. Content creation limits include actions taken on the GitHub web interface as well as via the REST API and GraphQL API.

These secondary rate limits are subject to change without notice. You may also encounter a secondary rate limit for undisclosed reasons.

These guidelines are very loosely defined, and you cannot query for them ahead of time. 👎 It looks like we need to take the path some users have suggested and make rate limiting much more resilient, potentially allowing users to write their own strategies for handling secondary rate limits.

The current internal GitHubRateLimitChecker would need to be replaced by a PrimaryGitHubRateLimiter that extends a new GitHubRateLimiter class/interface. Each of the bullet points above would then become a new rate limiter class. All of them would need to be called before and after each query, and each would maintain its own configuration and calculated state. GitHubRateLimiter would provide the API and possibly helper functions to make that easier to do right.
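To make the shape of this concrete, here is a rough sketch of what such an interface and one limiter might look like. Nothing here exists in the library today: the method names `beforeRequest` and `afterRequest`, the `ContentCreationLimiter` class, and the choice to treat POST, PATCH, and PUT as content-generating are all assumptions for illustration. The clock is passed in as a parameter so the limiter can be tested without sleeping.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.Set;

/** Hypothetical interface; not part of github-api today. */
interface GitHubRateLimiter {
    /** Called before a request is sent; returns how long to wait, if at all. */
    Optional<Duration> beforeRequest(String method, Instant now);

    /** Called after a response arrives so the limiter can update its state. */
    void afterRequest(String method, Instant now);
}

/** Sketch of one limiter: at most maxRequests content-creating requests per window. */
final class ContentCreationLimiter implements GitHubRateLimiter {
    // Assumption: mutating verbs count as content-generating requests.
    private static final Set<String> CONTENT_METHODS = Set.of("POST", "PATCH", "PUT");

    private final int maxRequests;
    private final Duration window;
    // Timestamps of recorded content-creating requests, oldest first.
    private final Deque<Instant> sent = new ArrayDeque<>();

    ContentCreationLimiter(int maxRequests, Duration window) {
        this.maxRequests = maxRequests;
        this.window = window;
    }

    @Override
    public synchronized Optional<Duration> beforeRequest(String method, Instant now) {
        if (!CONTENT_METHODS.contains(method)) {
            return Optional.empty();
        }
        evictUpTo(now.minus(window));
        if (sent.size() < maxRequests) {
            return Optional.empty();
        }
        // Wait until the oldest recorded request leaves the window.
        return Optional.of(Duration.between(now, sent.peekFirst().plus(window)));
    }

    @Override
    public synchronized void afterRequest(String method, Instant now) {
        if (CONTENT_METHODS.contains(method)) {
            sent.addLast(now);
        }
    }

    /** Drops timestamps at or before the cutoff; they no longer count. */
    private void evictUpTo(Instant cutoff) {
        while (!sent.isEmpty() && !sent.peekFirst().isAfter(cutoff)) {
            sent.removeFirst();
        }
    }
}
```

With the documented "80 content-generating requests per minute" guideline, this would be constructed as `new ContentCreationLimiter(80, Duration.ofMinutes(1))`. A second instance with 500 and one hour could cover the hourly guideline. Since the real limits can change without notice, the numbers would need to stay configurable.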

I think the basic API would be that the method called before a request is sent returns an Optional<Duration>, and if more than one limiter returns a Duration, the longest one is used. Or maybe it returns an option record that includes a reason message and a duration, perhaps also a logLevel/severity, to make it easier to produce meaningful output.
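As a rough illustration of the "longest one wins" idea, the sketch below uses the record variant. All names (`ThrottleDecision`, `PreRequestCheck`, `ThrottleDecisions.longest`) are hypothetical and stand alone from any existing class. It assumes Java 16+ for records, and it uses `java.util.logging.Level` for severity only as a placeholder.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.logging.Level;

/** Hypothetical result type: how long to wait, why, and how loudly to log it. */
record ThrottleDecision(Duration delay, String reason, Level severity) {}

/** Hypothetical pre-request hook that each limiter would implement. */
interface PreRequestCheck {
    /** Returns a decision if the request should be delayed, otherwise empty. */
    Optional<ThrottleDecision> check();
}

final class ThrottleDecisions {
    private ThrottleDecisions() {}

    /** Asks every limiter and keeps the decision with the longest delay, if any. */
    static Optional<ThrottleDecision> longest(List<PreRequestCheck> checks) {
        return checks.stream()
                .map(PreRequestCheck::check)
                .flatMap(Optional::stream) // drop limiters that did not object
                .max(Comparator.comparing(ThrottleDecision::delay));
    }
}
```

Carrying the reason and severity along with the duration means the caller can log why the winning limiter asked for a delay before sleeping, instead of just a bare wait time.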

realskudd commented 1 day ago

I'm getting this error from two different locations when I attempt to search at all. Something appears to have gone sideways with the implementation.

I understand the need for rate limiting, but this doesn't seem like it was tested thoroughly enough prior to release.