smarkets / marge-bot

A merge-bot for GitLab
BSD 3-Clause "New" or "Revised" License
701 stars 136 forks source link

"branch cannot be merged" on freedesktop.org #263

Closed anholt closed 4 years ago

anholt commented 4 years ago

Since a gitlab update, we've been seeing "branch cannot be merged" intermittently from marge. One of our gitlab admins looked into it and concluded:

Yeah, as suspected the culprit is Marge not doing the right thing:

Note: Starting in GitLab 12.8, the mergeability (merge_status) of a merge request will be checked asynchronously when a request is made to this endpoint. Poll this API endpoint to get updated status. This affects the has_conflicts property as it is dependent on the merge_status. It’ll return false unless merge_status is cannot_be_merged.

Completing a pipeline will cause a MR to become mergeable, but not instantaneously. Marge assumes that the instant a pipeline reports complete, the MR will be mergeable. That's not actually true, since it still has to check for things like whether or not the head is a fast-forward, etc. This is done in a background job via Sidekiq. The fix seems pretty straightforward: in Marge's job handling, after pipeline completion but before merging, it needs to poll on merge_status and watch it go green. I looked at it briefly this morning, but it involved NixOS and that pushed it over the time budget I have right now. If someone wants to pick up updating Marge then I'm happy to get it deployed.

fooishbar commented 4 years ago

The missing link was to the GitLab API docs, specifically the get single MR endpoint where the quote comes from.

snim2 commented 4 years ago

I've also been seeing this on gitlab.com since 12.8. Reassigning Marge usually works, which would make sense given this explanation.

snim2 commented 4 years ago

Any thoughts on how a fix for this would work? Maybe poll the endpoint with an exponential back-off?

fooishbar commented 4 years ago

Yeah, I think that's the one. Maybe linear rather than exponential, since it certainly should complete quickly, and if the issue is background jobs (to recheck the MR status) aren't completing quickly, the poll shouldn't impede their progress as they're just a quick read of a Redis-cached property, run separately to the Sidekiq jobs (not necessarily even on the same node).