jupyterhub / binderhub


CI tests frequently hit a GitHub API rate limit #1608

Closed · manics closed this issue 1 year ago

manics commented 1 year ago

Bug description

binderhub/tests/test_build.py often fails with the error

 >       assert launch_events > 0
E       assert 0 > 0

binderhub/tests/test_build.py:87: AssertionError
----------------------------- Captured stdout call -----------------------------
failed: Error resolving ref for gh:binderhub-ci-repos/cached-minimal-dockerfile/HEAD: GitHub rate limit exceeded. Try again in 15 minutes.
_ test_build[gh/binderhub-ci-repos/cached-minimal-dockerfile/596b52f10efb0c9befc0c4ae850cc5175297d71c] _

I think this is related to how often PRs are opened and updated; the required wait before retrying can easily reach 60 minutes.
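For context (my numbers, not from the report above): unauthenticated GitHub API requests are limited to 60 per hour per IP, which a busy CI runner exhausts quickly. The `/rate_limit` endpoint reports the current quota without counting against it, so a sketch like this can show how close a runner is to the limit:

```python
# Minimal sketch: query GitHub's rate-limit status from a CI run.
# The /rate_limit endpoint itself does not count against the quota.
import requests

resp = requests.get("https://api.github.com/rate_limit", timeout=10)
resp.raise_for_status()
core = resp.json()["resources"]["core"]
print(f"limit={core['limit']} remaining={core['remaining']} reset={core['reset']}")
```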

Expected behaviour

GitHub API limits are only occasionally reached for PR workflows

Actual behaviour

Limits are often hit, causing CI tests to incorrectly fail.

How to reproduce

I think opening a PR that runs all tests, and updating it several times, is enough.

I think we should go through the tests and minimise the number of live GH API calls. Note we can't rely on a GitHub API token here, since these are PR workflows and repository secrets aren't exposed to PRs from forks.
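One hedged illustration of what minimising live calls could look like: patch the ref-resolution step in a test so the pinned commit is returned without any network traffic. The patch target below is an assumption about binderhub's module layout, not a confirmed path:

```python
# Sketch: stub out ref resolution so the test never hits the live
# GitHub API. The patch target is assumed, not verified against
# binderhub's actual source tree.
from unittest import mock

def test_build_with_stubbed_ref_resolution():
    stub = mock.AsyncMock(
        return_value="596b52f10efb0c9befc0c4ae850cc5175297d71c"
    )
    with mock.patch(
        "binderhub.repoproviders.GitHubRepoProvider.get_resolved_ref", stub
    ):
        ...  # drive the build exactly as test_build.py does today
```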

betatim commented 1 year ago

Is there no way to get a GH token that is limited in scope to "read public repos" in order to increase the rate limit? I'm thinking of a token that doesn't give the PR any new permissions except raising the rate limit.
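For reference (my numbers, not from the thread): any valid token raises the core API limit from 60 to 5,000 requests per hour, independent of the token's scopes, so even a token with no scopes at all would help. Below is a sketch of an authenticated ref lookup that falls back to anonymous access when no token is set; the environment variable name is illustrative:

```python
# Sketch: authenticate the ref lookup when a token is available.
# Anonymous: ~60 requests/hour per IP; authenticated: 5,000/hour.
# GITHUB_ACCESS_TOKEN is an illustrative name, not a confirmed one.
import os
import requests

headers = {"Accept": "application/vnd.github+json"}
token = os.environ.get("GITHUB_ACCESS_TOKEN")
if token:
    headers["Authorization"] = f"Bearer {token}"

resp = requests.get(
    "https://api.github.com/repos/"
    "binderhub-ci-repos/cached-minimal-dockerfile/commits/HEAD",
    headers=headers,
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["sha"])  # the resolved commit for HEAD
```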

consideRatio commented 1 year ago

Maybe we can use secrets.github_token? I think it is available and provides read-only access even when run in PRs. I'm not sure though, and I'm not sure it would raise the rate limit even if it is available.

consideRatio commented 1 year ago

The tests may be failing less frequently now, but sadly I've seen that they can still fail.

minrk commented 1 year ago

We should be able to record and re-use mock responses to avoid so many API calls, right? I think we do some of that already, but perhaps we've added more API calls without adding the recorded responses.
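A possible shape for that, sketched with the third-party `responses` library and assuming the code under test uses `requests` (binderhub's own HTTP stack may differ): record the JSON a live call returns once, commit it as a fixture, and replay it in tests. The fixture path and URL here are illustrative:

```python
# Sketch: replay a recorded GitHub API response instead of calling
# out. `responses` intercepts `requests`; the fixture path is invented.
import json
import pathlib

import requests
import responses

API_URL = (
    "https://api.github.com/repos/"
    "binderhub-ci-repos/cached-minimal-dockerfile/commits/HEAD"
)

@responses.activate
def test_resolve_ref_from_recorded_response():
    recorded = json.loads(
        pathlib.Path("tests/fixtures/gh_commit_HEAD.json").read_text()
    )
    responses.add(responses.GET, API_URL, json=recorded, status=200)

    resp = requests.get(API_URL)  # intercepted, no network traffic
    assert resp.json()["sha"] == recorded["sha"]
```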