Closed stephenmcgruer closed 4 years ago
So from a recent failure, we see runs of pr-preview-sync, along with their 'core' resources left:
Feb 11th, 06:42 GMT-5 - 1975/5000, success
Feb 11th, 06:51 GMT-5 - 1337/5000, success
Feb 11th, 06:58 GMT-5 - 93/5000, fail
Feb 11th, 07:09 GMT-5 - no PRs
Feb 11th, 07:17 GMT-5 - no PRs
Feb 11th, 07:25 GMT-5 - no PRs
Feb 11th, 07:34 GMT-5 - 3965/5000, success
At this point I strongly suspect that https://github.com/web-platform-tests/wpt-pr-bot is suddenly eating 100s to 1000s of API capacity, likely when it tries to do a WebKit sync. I'm going to try to correlate logs, though sadly wpt-pr-bot doesn't log API limits as it works currently.
Yep, wpt-pr-bot kicks off a WebKit sync around 06:53, and looking at the logs we do a huge amount of API calls to GitHub during this. In fact at 7:03 we even see:
2020-02-11T12:03:43.060727Z GET https://api.github.com/repos/web-platform-tests/wpt/issues/15431
2020-02-11T12:03:43.107724Z Error: API rate limit exceeded for user ID 16780864.
at errFrom (/srv/lib/github.js:10:15)
at Request.onResponse [as _callback] (/srv/lib/github.js:87:29)
at Request.self.callback (/srv/node_modules/request/request.js:185:22)
at Request.emit (events.js:210:5)
at Request.<anonymous> (/srv/node_modules/request/request.js:1161:10)
at Request.emit (events.js:210:5)
at IncomingMessage.<anonymous> (/srv/node_modules/request/request.js:1083:12)
at Object.onceWrapper (events.js:299:28)
at IncomingMessage.emit (events.js:215:7)
at endReadableNT (_stream_readable.js:1183:12) {
2020-02-11T12:03:43.107745Z statusCode: 403
2020-02-11T12:03:43.107755Z }
Currently the WebKit sync runs once every 5 minutes, so ~12 times an hour. As the quickest fix I can think of, I'm going to drop that to once every 15 minutes today. This will increase latency but currently there are very few (possibly no) active WebKit PRs so I think that's well worth paying to stop the API limit issue.
Dropping the sync seems to have done it (though its hard to prove a negative result). I'm checking pr-preview-sync near the end of each hour, and the lowest I'm seeing it at is ~4000/5000 API calls left - a massive improvement over <1000 !
About once a day, see https://github.com/web-platform-tests/wpt/actions?query=workflow%3Apr-preview-sync
So we need to determine why we're hitting resource limits, and maybe reduce frequency of pr-preview-sync or use a different bot account (I think its using wpt-pr-bot).