renovatebot / app-support

Discussion/support issues for the hosted Renovate App
0 stars 0 forks source link

Passing status checks not detected immediately by app #39

Closed felixfbecker closed 3 years ago

felixfbecker commented 5 years ago

What Renovate type are you using? Renovate GitHub App

Describe the bug There was a new version of one of our packages released: https://github.com/sourcegraph/sourcegraph-extension-api/releases/tag/v18.0.2

image

Renovate did a run on a depending repo:

image

and recognized the new version. It created a branch for the update:

image

for which CI passed 10min later:

image

Now it's been roughly half an hour since then and I would expect to see a PR created (or automerging be performed if confgigured), but nothing is happening. Renovate has not done another run so far.

It's very important that when a change is merged into one repo, we can be confident that it's going to be updated in other repos "as soon as possible". Ideally it should only be bound to the CI time. I would expect no significant delays (of upward to an hour like seen here) between the steps in this workflow. I've heard confusion from coworkers about why these updates are not progressing / seemingly getting stuck and people eventually do updates manually because they don't trust the system.

Is this expected, or a bug? If it's expected, is there a way this could be improved? Could Renovate maybe listen to commit status webhooks to trigger a run that creates the PR/automerges as soon as CI passes?

rarkins commented 5 years ago

It's expected, for now. GitHub's status update webhooks (previously, at least) did not include enough information in the payload to tell you if the branch was "green" or not - it just tells you that a particular test passed. Running Renovate every time that happens therefore was really inefficient and chewed up API rate limits. Now with check runs it's even worse because there are two things to look at - status checks array and check runs array.

If it is important for you then I'll look into it again to see what's feasible. One thing I'd recommend though is consider raising PRs immediately when it's your own internal packages, instead of waiting for them to be green before getting the PR. It's quite possible that GitHub will also provide us better webhook data about PRs than it does about branch commits, too.

rarkins commented 5 years ago

Are there any repos you can point me to so that I can check the historical webhooks received?

felixfbecker commented 5 years ago

To avoid triggering a Renovate "run" on every webhook, could there be a service that handles the incoming webhooks and filters out the ones for "renovate branch passed" and only then triggers a Renovate run? Or (less preferable) could Renovate provide a "trigger" API where we can trigger a run manually with curl at the end of CI?

One thing I'd recommend though is consider raising PRs immediately

But automerging will still be delayed, right?

Are there any repos you can point me to so that I can check the historical webhooks received?

This instance was from private repo https://github.com/sourcegraph/enterprise but you can also look at https://github.com/sourcegraph/sourcegraph

rarkins commented 5 years ago

Using sourcegraph/enterprise, we got three webhooks for this branch: image

The first two were pending, the third one was success. Looking at that third one, it unfortunately is like it always was and gives only information about that one particular status check and not about the branch's overall status.

      "name":"sourcegraph/enterprise",
      "target_url":"https://yourci.com/sourcegraph/enterprise/builds/1234",
      "context":"buildkite/enterprise",
      "description":"Build renovatebot/renovate#1234 passed (10 minutes, 47 seconds)",
      "state":"success",

If it's important to you, I can probably add a webhook hack for now that always triggers sourcegraph repos whenever any success or failure status check is received. A longer term solution will need to be to retrieve query the commit's full status via API first every time, and then queue a run if it's non-pending.

rarkins commented 5 years ago

I've added this custom webhook logic, which should trigger an extra run on any sourcegraph repository for any status check that is not "pending". I'll leave this issue open until a more generic solution is available.

travi commented 4 years ago

it seems that this delay still exists. is it still something under consideration or simply the intended behavior at this point? i'm in the process of evaluating where to migrate my projects after the recent greenkeeper announcement and think renovate could be a really good fit for all of the details i've looked into so far except for this one.

i've set up many of my projects to expect that the process works as a single continuous deployment pipeline and the delay that exists would be a showstopper in those cases.

if it's helpful, this was a key detail in my implementation of greenkeeper-keeper. it didnt handle some details like pagination but wasn't terribly chatty and worked well for at least my heavy use of greenkeeper. @gr2m also has a nice writeup about a similar approach.

rarkins commented 4 years ago

FYI I added this app-support repo to separate out app-specific topics from the Renovate CLI repo, so just moved this issue accordingly.

Renovate too has a "what's the combined status of this branch?" logic but I never ported it to the webhook handling. I still find it pretty frustrating that GitHub makes it so hard to simply know "is this branch green?".

@travi thanks for the background info. There's necessarily still going to be some delay when things are event-driven instead of in the same pipeline. i.e. GitHub sends webhooks any time any status check events happens, then Renovate's webhook handler has to (unfortunately) query multiple endpoints to derive a combined status, then a job is queued up, Renovate runs, and then should merge that branch during that run (assuming no conflicts, it's still up-to-date, etc). This would hopefully be bound to around a minute or so in most cases.

FYI the key blocker for this in the past was the need to either:

I'm leaning towards the latter, naturally.

travi commented 4 years ago

I still find it pretty frustrating that GitHub makes it so hard to simply know "is this branch green?".

1000x yes, this!

There's necessarily still going to be some delay when things are event-driven instead of in the same pipeline.

Totally understood. the natural delay that results from the complexities of tying a complex pipeline like this together are pretty clear to me and accepted, assuming things are still moving along. waiting for queues makes sense, but not being queued at all yet is significantly more painful for this type of use case.

This would hopefully be bound to around a minute or so in most cases.

I think that is pretty acceptable. I'll admit that I'm used to faster since greenkeeper-keeper ran as its own instance and traffic was low enough to avoid queues since it was self-hosted. However, minutes are totally reasonable. The current behavior is simply significantly more than that.

I'm leaning towards the latter, naturally.

Seems reasonable to me. I'm more than happy to discuss anything about my process in more detail where it would be helpful, but I imagine you have your head pretty well around what is left.

The main question on my mind is timing since it does sound like this could still be a fair amount of work. Is it reasonable to think that immediate queuing of the renovate run based on status success could be completed by the planned June shutdown of Greenkeeper? Not trying to add more pressure, but that date is a pretty important one as I evaluate what my options are.

FYI I added this app-support repo to separate out app-specific topics from the Renovate CLI repo, so just moved this issue accordingly.

noted. i'll keep that in mind in the future. thanks!

rarkins commented 3 years ago

This has now been implemented on the backend. It's quite challenging:

Querying the API after any passing webhook notification would be too aggressive, so we implemented this with a 5 minute "debounce" - Renovate's backend will query the SHA's combined status 5 minutes after the last successful webhook was received. If the combined result is green, and the branch is configured for automerge, then a job will be enqueued immediately.

i.e. you should hopefully see automerging happening ~6 minutes after branches go green. We will evaluate whether a shorter debounce period can be used in future, but for now this is a big improvement over the ~30 minute average wait time before.

travi commented 3 years ago

Super excited for this. Thanks a ton!

travi commented 3 years ago

i've already had a chance to see this in action on a bunch of new PRs and it seems to be working great. a very welcome improvement. thanks again!

rarkins commented 3 years ago

Unfortunately I've needed to roll back this feature due some undesirable database spiking caused by locks. I'll comment back here again once it's re-enabled.

rarkins commented 3 years ago

I'm going to archive this repository so that we have conversations in one place. Therefore please create an "App Support" discussion here if this problem or suggestion is still relevant: https://github.com/renovatebot/renovate/discussions