Closed: DanySK closed this issue 2 years ago
Disagree. Renovate should not just continue when it gets 502s from GitHub. Doing so would mean noisy flapping of PRs any time GitHub has problems.
Uhmm... why should the PRs flap? In case of a 502, ignoring the update (neither closing existing PRs nor opening new ones) should work, shouldn't it?
Do you have any clue why that project is getting a 502? Is this common? It appears to be reproducible (at least, I retried several times with a few hours between attempts).
If there are external host errors, then we'll have the wrong state, so we should not update or close any PRs during that run. Therefore we choose to abort.
- Run 1: works, creates a PR
- Run 2: 502, autocloses the PR
- Run 3: works again, recreates or reopens the PR
I see. My question is: why autoclose on a 502? Can't Renovate just leave that PR open (maybe adding a comment that it errored)?
Because there is not a simple 1:1 relationship between lookups and PRs (due to grouping etc). It's really more complicated than you think
I still do not get why all other updates (from other managers) should get discarded. Is that because of grouping, too? I'm in this strange situation in which updates for GitHub Actions fail with a 502 (I cannot understand why, but it fails reproducibly), and my (completely unrelated) Gradle updates get blocked as well.
Can dependencies get grouped across managers? And if not, why not let non-failing managers complete?
Also, I had to fetch the logs manually: if Renovate fails, it should alert the user somehow. At the moment there is a silent failure, all updates stop, and no notification is provided. If this is intended behaviour, I believe it should at least be documented, so that users can set up a reminder to check the dashboard periodically and make sure the bot is running.
Moreover, I still cannot understand why I keep getting 502s for updates on that repository only. In your opinion, is this something that should be reported to GitHub?
A lookup results in one or more updated versions. Updated versions are placed into buckets, and can produce one or more updates. Updates are later grouped together, according to possibly very complex rules, into branches/PRs. Therefore, if a lookup fails, we can't know which of those branches/PRs would have been "touched", and cannot safely update or autoclose any of them.
Essentially, if Renovate had fewer features and was less configurable, there'd be a guaranteed simple link between lookups and PRs, and what you suggest might be possible. But with Renovate's configurability, it's not practical.
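To make that shape concrete, here is a minimal TypeScript sketch of the many-to-many relationship described above. The types and the grouping function are illustrative placeholders, not Renovate's actual data model:

```ts
// Illustrative only: hypothetical types, not Renovate's real internals.
interface Lookup {
  depName: string;
  manager: string; // e.g. "github-actions", "gradle"
}

interface Update {
  depName: string;
  newVersion: string;
  groupName?: string; // assigned by packageRules, possibly across managers
}

interface BranchPlan {
  branchName: string;
  updates: Update[]; // may mix updates that originated from different lookups
}

// Grouping merges updates from many lookups into shared branches/PRs,
// so the lookup -> PR relationship is many-to-many.
function planBranches(updates: Update[]): BranchPlan[] {
  const byGroup = new Map<string, Update[]>();
  for (const u of updates) {
    const key = u.groupName ?? `${u.depName}-${u.newVersion}`;
    byGroup.set(key, [...(byGroup.get(key) ?? []), u]);
  }
  return [...byGroup.entries()].map(([key, grouped]) => ({
    branchName: `renovate/${key}`,
    updates: grouped,
  }));
}

// If one lookup throws a 502, the `updates` list is incomplete, so every
// grouped branch it might have contributed to becomes uncertain: Renovate
// cannot tell which existing PRs are stale and which are genuinely orphaned.
```

Once several lookups can contribute to the same grouped branch, losing one lookup to a 502 makes the whole branch plan untrustworthy, which is why the run aborts rather than updating or autoclosing PRs.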
> Can dependencies get grouped across managers? And if not, why not let non-failing managers complete?
Yes, and it's common
> Also, I had to fetch the logs manually: if Renovate fails, it should alert the user somehow. At the moment there is a silent failure, all updates stop, and no notification is provided.
I would be happy if you can work out a way to flag this in the Dependency Dashboard issue. It would still cause a little "noise", but not as much as opening and closing PRs repeatedly during error conditions.
> Moreover, I still cannot understand why I keep getting 502s for updates on that repository only. In your opinion, is this something that should be reported to GitHub?
Can you also reproduce it in a minimal reproduction repo? Can you reproduce it using e.g. curl?
It does sound like a GitHub bug, but also like something they may not be interested in fixing. Essentially it's a known problem with their GraphQL API that if a query takes too long to run, you can get 5xx errors.
Right away I'd suggest ignoring whatever dependency causes it. Then once you can reproduce it in a minimal repo, @zharinov can look into whether we can catch/retry this with a lower page count to hopefully avoid the 5xx.
Thank you so much for the explanation; it is clear now. I will gladly try to reproduce it, and I have already opened a ticket on the GitHub side. Is there any way I can tell from the logs which query Renovate is performing?
I got a response from GitHub:
> I took a look at the logs using the request-id in your report and, as the message indicated, you hit a timeout running the query. All API requests, both for the REST API and GraphQL API, have a 10-second limit on execution time. If that limit is reached for a request, the request is terminated and you get back that error. This normally happens when the query involves too much data, so the way to avoid timeouts is to write smaller queries. If the query touches a lot of data, split it into several smaller queries and execute them separately.
Now... is there any way I can configure Renovate to run "several smaller queries"? Or is there any workaround I could apply?
Seeing as this seems fairly isolated, I think we need to reduce the page size and retry when it happens. A reproduction would be really helpful for this; then @zharinov can advise, as he wrote similar logic for other GraphQL queries.
It's likely the query here which throws: https://github.com/renovatebot/renovate/blob/main/lib/modules/datasource/github-tags/cache.ts
The logic is mostly from here, including fetching 100 at a time: https://github.com/renovatebot/renovate/blob/main/lib/modules/datasource/github-releases/cache/cache-base.ts
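For context, the failing request is a paginated GraphQL query over the repository's tags with a page size of 100. Something along these lines, an illustrative shape rather than the exact query Renovate sends, is the kind of request that can exceed GitHub's 10-second execution limit on repositories with many or heavy refs:

```ts
// Illustrative sketch of a tags pagination query with first: 100.
// Not copied from Renovate; only the page size and overall shape matter here.
const query = `
  query ($owner: String!, $name: String!, $cursor: String) {
    repository(owner: $owner, name: $name) {
      refs(
        refPrefix: "refs/tags/"
        first: 100
        after: $cursor
        orderBy: { field: TAG_COMMIT_DATE, direction: DESC }
      ) {
        pageInfo { hasNextPage endCursor }
        nodes {
          name
          target {
            ... on Commit { oid committedDate }
          }
        }
      }
    }
  }
`;
```

The heavier the objects inside each page, the more likely a `first: 100` page blows past the 10-second limit, which matches the timeout explanation GitHub gave above.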
Hi there,
Get your issue fixed faster by creating a minimal reproduction. This means a repository dedicated to reproducing this issue with the minimal dependencies and config possible.
Before we start working on your issue we need to know exactly what's causing the current behavior. A minimal reproduction helps us with this.
To get started, please read our guide on creating a minimal reproduction.
We may close the issue if you, or someone else, haven't created a minimal reproduction within two weeks. If you need more time, or are stuck, please ask for help or more time in a comment.
Good luck,
The Renovate team
@rarkins @zharinov minimal reproduction, here it is https://github.com/DanySK/minimal-reproduction-renovate-16343
The fix we need is to catch the 5xx error and retry with a smaller page size. I changed the default from 100 to 50 locally and re-ran, and the lookup succeeded, so one catch/retry should be enough.
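A minimal sketch of that catch/retry approach, assuming a hypothetical `queryTags(pageSize)` helper (the real change lives in the cache classes linked above and will differ in detail):

```ts
// Hypothetical helper name; only the retry strategy is the point here.
async function queryTagsWithRetry(
  queryTags: (pageSize: number) => Promise<string[]>
): Promise<string[]> {
  try {
    // Default page size that currently times out on very large repositories.
    return await queryTags(100);
  } catch (err: any) {
    // GitHub returns 5xx (often 502) when a GraphQL query exceeds its
    // 10-second execution limit, so retry once with a smaller page.
    const status = err?.response?.statusCode ?? err?.statusCode;
    if (status && status >= 500 && status < 600) {
      return await queryTags(50);
    }
    throw err;
  }
}
```

One retry at half the page size keeps the common case fast while giving slow repositories a second chance under GitHub's 10-second limit.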
:tada: This issue has been resolved in version 32.103.1 :tada:
The release is available on:
- 32.103.1
Your semantic-release bot :package::rocket:
**How are you running Renovate?**

Mend Renovate hosted app on github.com

**If you're self-hosting Renovate, tell us what version of Renovate you run.**

No response

**Please select which platform you are using if self-hosting.**

No response

**If you're self-hosting Renovate, tell us what version of the platform you run.**

No response

**Was this something which used to work for you, and then stopped?**

It used to work, and then stopped

**Describe the bug**
For some reason that I do not understand, Renovate began failing when fetching updates related to AlchemistSimulator/Alchemist. Apparently, GitHub replies with a 502 Bad Gateway, and this error blocks every update, not just the ones related to the simulator.
I think that:

- a 502 on one lookup should not block all the other (unrelated) updates;
- the user should be notified of the failure instead of it failing silently.
The issue is happening at this repository: https://github.com/DanySK/publish-on-central
**Relevant debug logs**
Logs
```
DEBUG: GitHub failure: 5xx
{
  "err": {
    "name": "HTTPError",
    "code": "ERR_NON_2XX_3XX_RESPONSE",
    "timings": {
      "start": 1656601652604,
      "socket": 1656601653125,
      "lookup": 1656601653607,
      "connect": 1656601653914,
      "secureConnect": 1656601654103,
      "upload": 1656601654119,
      "response": 1656601664627,
      "end": 1656601664627,
      "phases": {
        "wait": 521,
        "dns": 482,
        "tcp": 307,
        "tls": 189,
        "request": 16,
        "firstByte": 10508,
        "download": 0,
        "total": 12023
      }
    },
    "message": "Response code 502 (Bad Gateway)",
    "stack": "HTTPError: Response code 502 (Bad Gateway)\n at Request.
```

**Have you created a minimal reproduction repository?**

No reproduction, but I have linked to a public repo where it occurs