smarkets / marge-bot

A merge-bot for GitLab
BSD 3-Clause "New" or "Revised" License
701 stars 136 forks source link

Wait for merge status to resolve. #265

Closed snim2 closed 4 years ago

snim2 commented 4 years ago

Closes #263

This is a first go at a linear back-off to check whether an MR has merge_status set to can_be_merged (see the discussion on the #263).

This probably needs quite a bit more work, in particular I'm aware that I haven't made any changes to marge/batch_job.py or added any tests. I was a bit confused by how mocking is implemented, so I'd be grateful for some pointers if you'd like more tests.

fooishbar commented 4 years ago

Yeah, this looks good to me, but er, indeed it's missing an actual caller for this function.

snim2 commented 4 years ago

Line 80 of single_merge_job.py calls the method...

fooishbar commented 4 years ago

Yes, of course; I have no idea how I didn't see that. I blame the heatwave here. Sorry.

fooishbar commented 4 years ago

I haven't seen any failures at all on freedesktop.org since we deployed this. So it looks good and it seems like it works too. Thanks a lot @snim2!

anpr commented 4 years ago

Could we get this merged 😬 ? We have the underlying problem, and it's pretty annoying.

davidlivingrooms commented 4 years ago

We could really use this change too please :)

sysadmiral commented 4 years ago

Since 0.9.3 was released it would be nice to have this merged into the project so that we don't have to run custom versions of marge-bot.

tclh123 commented 4 years ago

Hey, thanks for your contribution! This looks good at first glance. I'll check if we need to change something for the batching accordingly and get this merged.

tclh123 commented 4 years ago

Hey, after look into this and the gitlab doc, I have a question. Is this PR just to fix the case that one MR will actually cannot be merged, but the marge didn't know. So we after we resolve the status, we can fail more explicitly?

snim2 commented 4 years ago

Hey, after look into this and the gitlab doc, I have a question. Is this PR just to fix the case that one MR will actually cannot be merged, but the marge didn't know. So we after we resolve the status, we can fail more explicitly?

So, we need this because the API endpoint that tells marge whether the MR is ready for merge has become asynchronous. Issue #263 quotes the relevant documentation. On my marge-bot install, I frequently get a branch cannot be merged error, which is fixed just by passing the MR back to marge, because the aync call didn't yet give the correct result.

Does that help?

tclh123 commented 4 years ago

Okay, thanks. I thought the merge_status was just potentially stale in the API response but won't affect the actual merge action. But you mean Gitlab will check the merge_status before it can merge MR.

If that's the case, wait_for_merge_status_to_resolve should be called before calling Gitlab API to merge MR, which in the batching logic is here https://github.com/smarkets/marge-bot/blob/master/marge/batch_job.py#L349.

tclh123 commented 4 years ago

After checking https://gitlab.com/gitlab-org/gitlab/-/merge_requests/21026/diffs and https://gitlab.com/gitlab-org/gitlab/-/merge_requests/31890/diffs those 2 MRs that introduce the async merge_status logic. It seems my original thoughts are correct. It was introduced to speed up the API GET queries but have nothing to do with the actual merge action.

When you call API to merge, it will always sync check the mergeable_state and mergeability - https://gitlab.com/gitlab-org/gitlab/-/blob/342e8cf8/lib/api/merge_requests.rb#L442.

Also we didn't meet this issue on Smarkets. Though I don't know why you guys have this issue if those bad cases were not just legit merge conflicts. Thoughts?

snim2 commented 4 years ago

I'm not sure why you haven't seen those errors, that's interesting, and I can't think of a particular reason why we did have them, but maybe someone else on this PR has some ideas?

sysadmiral commented 4 years ago

Sorry I can't add more actionable information but I can corroborate what @snim2 and others in this PR have experienced. At times marge-bot will fail to merge and a re-assignment resolves it suggesting the async behaviour mentioned. Since using a build of this branch of marge-bot I have not had that issue occur again.

tclh123 commented 4 years ago

Okay, I can merge this PR as it will fix your issue. But I think the issue is not caused by the async merge_status in the API response. It might be caused by something else, for example the async refresh service etc, that we are not sure about. And somehow Smarkets didn't encounter this issue.

I'll add some comments in the code to be aware in the future.

fooishbar commented 4 years ago

It's definitely caused by the async merge_status; I was able to verify that MRs which could have been mergeable if we'd checked, were being rejected as non-mergeable, because the status was cached in Redis and not updated after a successful pipeline.

tclh123 commented 4 years ago

@fooishbar But I think the status cached in redis won't affect the actual merge action. It's just stale in the GET API response which is ok. When you try to merge it, gitlab will always check the mergeability synchronously. You can check my previous comments.

I was able to verify that MRs which could have been mergeable if we'd checked, were being rejected as non-mergeable

This can be true, even the root cause is not the async merge_status

fooishbar commented 4 years ago

At least from my observation, it did affect the merge action as well as the observed status. The other subtlety is that even if the merge action did bypass Redis and recalculate the status, job/pipeline completion is not synchronous wrt updating the MR status. A pipeline-status job gets queued in Sidekiq which will only later update the MR status, so even if the merge POST did always invalidate the status and recalculate, it is not mergeable until the Sidekiq job retires, so you need to poll anyway.