Closed: snim2 closed this pull request 4 years ago
Yeah, this looks good to me, but er, indeed it's missing an actual caller for this function.
Line 80 of single_merge_job.py calls the method...
Yes, of course; I have no idea how I didn't see that. I blame the heatwave here. Sorry.
I haven't seen any failures at all on freedesktop.org since we deployed this. So it looks good and it seems like it works too. Thanks a lot @snim2!
Could we get this merged 😬 ? We have the underlying problem, and it's pretty annoying.
We could really use this change too please :)
Since 0.9.3 was released, it would be nice to have this merged into the project so that we don't have to run custom versions of marge-bot.
Hey, thanks for your contribution! This looks good at first glance. I'll check if we need to change something for the batching accordingly and get this merged.
Hey, after looking into this and the GitLab docs, I have a question. Is this PR just to fix the case where an MR actually cannot be merged, but marge doesn't know it? So that after we resolve the status, we can fail more explicitly?
So, we need this because the API endpoint that tells marge whether the MR is ready for merge has become asynchronous. Issue #263 quotes the relevant documentation. On my marge-bot install, I frequently get a "branch cannot be merged" error, which is fixed just by passing the MR back to marge, because the async call hadn't yet given the correct result.
Does that help?
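For context, the fix amounts to polling the GET endpoint until GitLab's background mergeability check settles. A minimal sketch of that linear back-off, assuming a hypothetical get_merge_request callable that returns the MR's JSON (names here are illustrative, not marge-bot's actual API):

```python
import time


def wait_for_merge_status_to_resolve(get_merge_request, max_attempts=5, base_delay=2):
    """Poll an MR until GitLab's async merge_status settles.

    get_merge_request is a hypothetical callable returning the JSON of
    GET /projects/:id/merge_requests/:iid as a dict.
    """
    merge_status = None
    for attempt in range(1, max_attempts + 1):
        merge_status = get_merge_request().get("merge_status")
        # "checking"/"unchecked" mean GitLab has not finished the async
        # mergeability check yet; anything else is a settled answer.
        if merge_status not in ("checking", "unchecked"):
            return merge_status
        time.sleep(base_delay * attempt)  # linear back-off: 2s, 4s, 6s...
    return merge_status
```

If the status never settles within max_attempts, the last observed value is returned so the caller can decide whether to fail or reassign the MR.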
Okay, thanks.
I thought the merge_status was just potentially stale in the API response, but wouldn't affect the actual merge action. But you mean GitLab will check the merge_status before it can merge the MR?
If that's the case, wait_for_merge_status_to_resolve should be called before calling the GitLab API to merge the MR, which in the batching logic is here: https://github.com/smarkets/marge-bot/blob/master/marge/batch_job.py#L349.
After checking https://gitlab.com/gitlab-org/gitlab/-/merge_requests/21026/diffs and https://gitlab.com/gitlab-org/gitlab/-/merge_requests/31890/diffs, the two MRs that introduced the async merge_status logic, it seems my original thoughts were correct. It was introduced to speed up the API GET queries but has nothing to do with the actual merge action.
When you call the API to merge, it will always synchronously check the mergeable_state and mergeability: https://gitlab.com/gitlab-org/gitlab/-/blob/342e8cf8/lib/api/merge_requests.rb#L442.
Also, we didn't hit this issue at Smarkets, and I don't know why you had it, if those bad cases were not just legitimate merge conflicts. Thoughts?
I'm not sure why you haven't seen those errors, that's interesting, and I can't think of a particular reason why we did have them, but maybe someone else on this PR has some ideas?
Sorry I can't add more actionable information, but I can corroborate what @snim2 and others in this PR have experienced. At times marge-bot will fail to merge, and a re-assignment resolves it, suggesting the async behaviour mentioned. Since using a build of this branch of marge-bot, I have not had that issue occur again.
Okay, I can merge this PR, as it will fix your issue. But I think the issue is not caused by the async merge_status in the API response. It might be caused by something else that we are not sure about, for example the async refresh service. And somehow Smarkets didn't encounter this issue.
I'll add some comments in the code to be aware in the future.
It's definitely caused by the async merge_status; I was able to verify that MRs which would have been mergeable if we'd checked were being rejected as non-mergeable, because the status was cached in Redis and not updated after a successful pipeline.
@fooishbar But I think the status cached in Redis won't affect the actual merge action. It's just stale in the GET API response, which is okay. When you try to merge, GitLab will always check mergeability synchronously; see my previous comments.
"I was able to verify that MRs which could have been mergeable if we'd checked, were being rejected as non-mergeable"

This can be true even if the root cause is not the async merge_status.
At least from my observation, it did affect the merge action as well as the observed status. The other subtlety is that even if the merge action did bypass Redis and recalculate the status, job/pipeline completion is not synchronous with respect to updating the MR status. A pipeline-status job gets queued in Sidekiq which only later updates the MR status, so even if the merge POST did always invalidate the status and recalculate, the MR is not mergeable until the Sidekiq job completes, so you need to poll anyway.
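That polling need can also be handled at the merge call itself: keep retrying the merge request while GitLab still rejects it as unmergeable, since the Sidekiq job refreshing the status may not have run yet. A sketch, assuming a hypothetical post_merge callable that issues the merge API call and returns the HTTP status code (406 being GitLab's "Branch cannot be merged" response):

```python
import time


def merge_with_retry(post_merge, attempts=4, delay=3):
    """Retry the merge call while GitLab still reports the MR unmergeable.

    post_merge is a hypothetical callable issuing
    PUT /projects/:id/merge_requests/:iid/merge and returning the
    HTTP status code of the response.
    """
    status = None
    for attempt in range(attempts):
        status = post_merge()
        if status != 406:
            return status  # merged (200) or a different, real failure
        # The background job that refreshes mergeability may not have
        # run yet, so give it a moment and try again.
        time.sleep(delay)
    return status
```

This deliberately retries only on 406; other error codes are returned immediately, since they indicate problems that polling won't fix.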
Closes #263
This is a first go at a linear back-off to check whether an MR has merge_status set to can_be_merged (see the discussion on #263). This probably needs quite a bit more work; in particular, I'm aware that I haven't made any changes to marge/batch_job.py or added any tests. I was a bit confused by how mocking is implemented, so I'd be grateful for some pointers if you'd like more tests.