Recheck pr-for-updates workflow

glazychev-art commented 1 year ago

Description

Some repositories frequently get the error "403: Forbidden, You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later." Examples:
https://github.com/networkservicemesh/deployments-k8s/actions/runs/5014568818/jobs/8988977238
https://github.com/networkservicemesh/sdk-sriov/actions/runs/5014807396/jobs/8989537883

Most likely, the GitHub token is being used incorrectly.

d-uzlov commented 1 year ago

The GitHub documentation states that there is a global rate limit, which is fairly high: https://docs.github.com/en/rest/overview/resources-in-the-rest-api?apiVersion=2022-11-28#rate-limits-for-requests-from-github-actions

When using GITHUB_TOKEN, the rate limit is 1,000 requests per hour per repository. For requests to resources that belong to an enterprise account on GitHub.com, GitHub Enterprise Cloud's rate limit applies, and the limit is 15,000 requests per hour per repository.

There are also many unspecified secondary rate limits for certain actions: https://docs.github.com/en/rest/overview/resources-in-the-rest-api?apiVersion=2022-11-28#secondary-rate-limits

According to the logs, we are hitting one of these unspecified limits for PR creation:

Unable to create pull request: 403: Forbidden, You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later.

Here is the GitHub documentation on dealing with secondary rate limits: https://docs.github.com/en/rest/guides/best-practices-for-integrators?apiVersion=2022-11-28#dealing-with-secondary-rate-limits

I think the reason we hit the limit is that we update our repositories in parallel:

Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.

If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.

The recommended way to handle rate limit errors is simply to retry:

If the Retry-After response header is present, retry your request after the time specified in the header. The value of the Retry-After header will always be an integer, representing the number of seconds you should wait before making requests again. For example, Retry-After: 30 means you should wait 30 seconds before sending more requests.

If the x-ratelimit-remaining header is 0, retry your request after the time specified by the x-ratelimit-reset header. The x-ratelimit-reset header will always be an integer representing the time at which the current rate limit window resets in UTC epoch seconds.

Otherwise, wait for an exponentially increasing amount of time between retries, and throw an error after a specific number of retries.

In the repository of the action we use to create PRs, there is an issue about retrying PR creation, along with a branch for testing it:

https://github.com/vsoch/pull-request-action/issues/95

However, it does not follow GitHub's guidelines for determining retry timings.

NikitaSkrynnik commented 1 year ago

Creating a PR is a POST request in the GitHub REST API. The GitHub documentation recommends waiting at least one second between requests of this type. To make our workflows run sequentially, with a one-second wait between any two workflows, we need to:

Create a large GitHub runner.
Configure autoscaling so that the runner runs only one workflow at a time: Maximum Job Concurrency should be set to 1.
Add a one-second sleep at the end of the pr-for-updates workflow and run the workflow on this runner.

Currently, GitHub has no functionality for queuing workflows on the default GitHub-hosted runners. There are some discussions about this feature, but there appear to be no plans to implement it yet.

networkservicemesh / .github

Recheck pr-for-updates workflow #27
