web-platform-tests / wpt.live

A live version of the web-platform-tests project
https://wpt.live/
15 stars 11 forks source link

pr-preview-sync is regularly hitting GitHub API limits and failing #26

Closed stephenmcgruer closed 4 years ago

stephenmcgruer commented 4 years ago

About once a day, see https://github.com/web-platform-tests/wpt/actions?query=workflow%3Apr-preview-sync

INFO:__main__:Response status code: 200
INFO:__main__:Limit for "core" resource: 969/5000
Traceback (most recent call last):
  File "./tools/ci/pr_preview.py", line 430, in <module>
    values.pop('func')(**values)
  File "./tools/ci/pr_preview.py", line 316, in synchronize
    if should_be_mirrored(project, pull_request):
  File "./tools/ci/pr_preview.py", line 277, in should_be_mirrored
    not project.pull_request_is_from_fork(pull_request)
  File "./tools/ci/pr_preview.py", line 94, in wrapped
    'Exiting to avoid GitHub.com API request throttling.'
Exception: Exiting to avoid GitHub.com API request throttling.

So we need to determine why we're hitting resource limits, and maybe reduce frequency of pr-preview-sync or use a different bot account (I think its using wpt-pr-bot).

stephenmcgruer commented 4 years ago

So from a recent failure, we see runs of pr-preview-sync, along with their 'core' resources left:

Feb 11th, 06:42 GMT-5 - 1975/5000, success
Feb 11th, 06:51 GMT-5 - 1337/5000, success
Feb 11th, 06:58 GMT-5 - 93/5000, fail
Feb 11th, 07:09 GMT-5 - no PRs
Feb 11th, 07:17 GMT-5 - no PRs
Feb 11th, 07:25 GMT-5 - no PRs
Feb 11th, 07:34 GMT-5 - 3965/5000, success

At this point I strongly suspect that https://github.com/web-platform-tests/wpt-pr-bot is suddenly eating 100s to 1000s of API capacity, likely when it tries to do a WebKit sync. I'm going to try to correlate logs, though sadly wpt-pr-bot doesn't log API limits as it works currently.

stephenmcgruer commented 4 years ago

Yep, wpt-pr-bot kicks off a WebKit sync around 06:53, and looking at the logs we do a huge amount of API calls to GitHub during this. In fact at 7:03 we even see:

2020-02-11T12:03:43.060727Z GET https://api.github.com/repos/web-platform-tests/wpt/issues/15431 
2020-02-11T12:03:43.107724Z Error: API rate limit exceeded for user ID 16780864.
    at errFrom (/srv/lib/github.js:10:15)
    at Request.onResponse [as _callback] (/srv/lib/github.js:87:29)
    at Request.self.callback (/srv/node_modules/request/request.js:185:22)
    at Request.emit (events.js:210:5)
    at Request.<anonymous> (/srv/node_modules/request/request.js:1161:10)
    at Request.emit (events.js:210:5)
    at IncomingMessage.<anonymous> (/srv/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:299:28)
    at IncomingMessage.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12) { 
2020-02-11T12:03:43.107745Z   statusCode: 403 
2020-02-11T12:03:43.107755Z } 

Currently the WebKit sync runs once every 5 minutes, so ~12 times an hour. As the quickest fix I can think of, I'm going to drop that to once every 15 minutes today. This will increase latency but currently there are very few (possibly no) active WebKit PRs so I think that's well worth paying to stop the API limit issue.

stephenmcgruer commented 4 years ago

Dropping the sync seems to have done it (though its hard to prove a negative result). I'm checking pr-preview-sync near the end of each hour, and the lowest I'm seeing it at is ~4000/5000 API calls left - a massive improvement over <1000 !