nishio-dens / bitbucket-pullrequest-builder-plugin

Bitbucket Pull Request Builder Plugin for Jenkins

Build repeatedly triggers for Open PR where the source branch has been deleted #219

Closed by stevemuskiewicz 3 years ago

stevemuskiewicz commented 4 years ago

With plugin version 1.4.28, builds are repeatedly triggered for an open PR even though its source branch has been deleted (and obviously the builds all fail).

The plugin should probably check that the source branch exists before triggering a build.
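
A minimal sketch of the kind of guard suggested here, written in Python rather than the plugin's Java and not taken from the plugin's code. It assumes the pull request JSON exposes the source branch name at `source.branch.name`, and that a lookup against the Bitbucket Cloud `refs/branches` endpoint answers 404 once the branch is gone. The API root, the helper names `branch_url`, `branch_exists` and `should_trigger`, and the absence of authentication are all simplifications made for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request

API_ROOT = "https://api.bitbucket.org/2.0"  # assumed Bitbucket Cloud API root


def branch_url(workspace, repo_slug, branch):
    """Build the (assumed) refs/branches lookup URL for one branch."""
    quoted = urllib.parse.quote(branch, safe="/")
    return f"{API_ROOT}/repositories/{workspace}/{repo_slug}/refs/branches/{quoted}"


def branch_exists(url, opener=urllib.request.urlopen):
    """Return False on HTTP 404, True on success; re-raise anything else."""
    try:
        with opener(url):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # outages or auth failures must not be mistaken for "deleted"


def should_trigger(pull_request, workspace, repo_slug, exists=branch_exists):
    """Skip the build when the PR's source branch can no longer be found."""
    branch = pull_request["source"]["branch"]["name"]
    return exists(branch_url(workspace, repo_slug, branch))
```

Only a 404 is treated as "branch deleted"; other errors are re-raised so that a temporary API failure neither suppresses nor triggers builds silently.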

CodeMonk commented 4 years ago

Could you please re-try with the current release?

stevemuskiewicz commented 4 years ago

I will try that if I get a chance, but we are currently forced to run 1.4.28 because of the unresolved issues reported in https://github.com/nishio-dens/bitbucket-pullrequest-builder-plugin/issues/193

Hard for me to tell if 1.5.0 addressed the above issue or not.

CodeMonk commented 4 years ago

Unreproducible problems without help from the reporter are as likely to be looked at as bug reports on older versions. I know I do not have the time to roll back code somewhere to confirm whether or not a problem is fixed in mainline.

But, I hope either you, or someone else paying attention, has time to help you out. Until then, I doubt much movement will happen.

Just my $.02 worth,

-Dave

stevemuskiewicz commented 4 years ago

@CodeMonk understood and appreciated. However, as my previous comment in #193 indicated, I found a pretty specific combination in which the issue reproduced, and it seems another commenter reported seeing the issue as well.

I am willing to help reproduce further, with the caveats that I am not a Java developer and that I am trying to maintain a sizable production build cluster with limited timeframes for "testing" new plugin versions. Aside from providing logs or attempting other tests, I am kind of at a loss as to how I can debug this further. If you have suggestions for specific things I can try, I will go and attempt them.

thanks!

CodeMonk commented 4 years ago

Fair enough!

Let me know the exact procedure for reproducing, and I can test on 1.5.0!

-Dave

stevemuskiewicz commented 4 years ago

@CodeMonk thanks!

So for this bug, the repro is pretty straightforward: open a PR in Bitbucket Cloud and delete the source branch from bitbucket.org without closing or declining the PR. New builds should get triggered on the PR at every polling interval of the plugin.

For #193, we have the PR builder configured with "Rebuild if destination branch changes" and "Cancel outdated jobs" enabled in the job config. We get a particular PR building (our builds take close to an hour) and, while that is running, push another commit to the PR branch. What we observe is that the PR builder posts a Bitbucket "in progress" build status on the new commit SHA, but it does not cancel the prior (now outdated) job and no new job is triggered against the new commit (the build status in Bitbucket ends up just linking to our top-level Jenkins URL, not a specific job ID).

I hope that provides enough information. If not, please let me know what other specifics you are looking for.

thanks!

CodeMonk commented 4 years ago

Thank you for the mechanism. Going to try to gather some logs today.

CodeMonk commented 4 years ago

Did you delete the branch before the build started?

I deleted after it started, and the build happened to fail, but, it hasn't kicked off again.

CodeMonk commented 4 years ago

Now trying a successful build, deleting the branch after the build has started.

CodeMonk commented 4 years ago

> So for this bug, repro is pretty straightforward, just open a PR in Bitbucket cloud and delete the source branch from bitbucket.org without closing/declining the PR. New builds should get triggered on the PR for every single polling interval of the plugin.

I am trying, but I have been unable to reproduce this so far.

> For #193, we have the PR builder configured with "Rebuild if destination branch changes" and "Cancel outdated jobs" enabled in the job config, then we get a particular PR building (our builds take close to an hour) and while that is running, push another commit to the PR branch. What we observe is that the PR builder posts a Bitbucket "in progress" build status on the new commit SHA but it does not cancel the prior (now outdated job) and no new job is triggered against the new commit (the build status in Bitbucket ends up just linking to our toplevel Jenkins URL, not a specific job ID).

So, that behavior works perfectly for me. We often push updates to a PR, previous ones are aborted, and a new one starts. However:

  1. Jenkins sometimes cannot interrupt an individual step. If your Jenkinsfile runs something like docker run ubuntu sleep 300, the docker process will usually ignore the Ctrl-C (because -i wasn't passed), and the step can take up to five minutes to finish, which delays the abort.
  2. When a build is first triggered, it reports the "in progress" status to Bitbucket before it has an actual job or executor. This is an annoying bug, but it is low priority to fix and does not cause much harm.
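
Point 1 describes a step that does not react to an interrupt, which delays the abort. The usual workaround can be sketched in Python as a wrapper around the child process: ask it to stop, wait a bounded grace period, then kill it. This is an illustration only, not Jenkins or plugin code; the function name `stop_with_escalation` and the default grace period are hypothetical.

```python
import subprocess


def stop_with_escalation(proc, grace_seconds=10.0):
    """Ask a child process to exit, then kill it if it ignores the request.

    Returns "terminated" if the polite request was enough, "killed" otherwise.
    """
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # cannot be ignored (SIGKILL on POSIX)
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

For containers, `docker stop` applies the same pattern: it sends SIGTERM and follows up with SIGKILL after a timeout.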

stevemuskiewicz commented 4 years ago

> Did you delete the branch before the build started?
>
> I deleted after it started, and the build happened to fail, but, it hasn't kicked off again.

I'm not certain of the timing. I think the branch and PR were up and ran normally, then after some period of time the developer deleted the branch without closing the PR. I think initially it didn't get triggered, but at some point something (maybe a Jenkins master restart or, more likely, a Bitbucket outage) caused the builder to start retriggering PRs, at which point it would trigger this PR during every single polling interval.

CodeMonk commented 4 years ago

Still behaving well. Going to try a branch deletion BEFORE the job does checkout scm now.

stevemuskiewicz commented 4 years ago

> Still behaving well. Going to try a branch deletion BEFORE the job does checkout scm now.

Do you have a way to "simulate" a Bitbucket outage, or something else that causes the builder to retrigger everything? I think that may be the key to the repro...

stevemuskiewicz commented 4 years ago

> > So for this bug, repro is pretty straightforward, just open a PR in Bitbucket cloud and delete the source branch from bitbucket.org without closing/declining the PR. New builds should get triggered on the PR for every single polling interval of the plugin.
>
> I am trying, but I have been unable to reproduce this so far.
>
> > For #193, we have the PR builder configured with "Rebuild if destination branch changes" and "Cancel outdated jobs" enabled in the job config, then we get a particular PR building (our builds take close to an hour) and while that is running, push another commit to the PR branch. What we observe is that the PR builder posts a Bitbucket "in progress" build status on the new commit SHA but it does not cancel the prior (now outdated job) and no new job is triggered against the new commit (the build status in Bitbucket ends up just linking to our toplevel Jenkins URL, not a specific job ID).
>
> So, that behavior works perfectly for me. We often push updates to a PR, previous ones are aborted, and a new one starts. However:
>
>   1. Jenkins can sometimes not interrupt an individual instruction. So, if your jenkins file does something like docker run ubuntu sleep 300, then the docker process will usually ignore the ctrl-c (because -i wasn't passed), and it will take up to five minutes for the instruction to run, which delays the abort.
>   2. When a build is first triggered, it will report the in process to bitbucket before it has an actual job or executor. While annoying, and a bug, it is low priority to fix, and does not cause much harm.

Fair enough. Our build is Docker-based, and we do tend to have some issues with aborted builds leaving containers around. However, on 1.4.28 it definitely still aborts the build as expected and retriggers another within a 5-10 minute span, whereas with 1.4.30 this was definitely not what we were seeing. As I said, I unfortunately don't have much time to experiment, but I will check with our devs and see whether I can try 1.5.0 this weekend and reproduce a specific condition like this. It did seem to be mostly working; it was just the aborting of outdated builds that wasn't working the way it does with 1.4.28 for us. That is a common pattern with our PRs, so we really need that feature to work reliably so we don't end up testing outdated builds.

Thanks for your help and efforts with this one!

CodeMonk commented 4 years ago

Got the build to fail because checkout scm failed (because the branch was deleted).

So far, no extra builds kicked off.

I'm trying a restart of Jenkins next. If a restart doesn't make it loop, I'm going to have to ask for your logs.

CodeMonk commented 4 years ago

Ok - I have been completely unable to reproduce this.

Do you have any other triggers you are using (other than the BBPRB kicking it off)?

I am ONLY able to get repeated builds for each change I push, or the comment trigger. Deleting the branch did not cause it to re-build, and definitely doesn't cause things to build multiple times. The result of the build, failed or not, just goes to the PR.

Can you please attach logs from a failure? From a cycle of two rebuilds? On 1.5.0 preferably?

CodeMonk commented 4 years ago

Actually, go ahead and send me logs from your version. If I can find the root cause in any version, then I should be able to either reproduce on 1.5.0, or confirm whether or not the bug was fixed.

Log level FINE, if possible.

stevemuskiewicz commented 4 years ago

For now all I have is this (default log level). It just repeats over and over at each BBPRB polling interval:

2020-03-16 01:06:19.018+0000 [id=43] WARNING b.b.bitbucket.ApiClient#send: Response status: HTTP/1.1 404 Not Found | URI: https://bitbucket.org/api/2.0/repositories/plexxi/connect/pullrequests/7517/approve | Response body: {"type": "error", "error": {"message": "You haven't approved this pull request."}}
2020-03-16 01:06:19.238+0000 [id=165962] WARNING b.b.bitbucket.ApiClient#send: Response status: HTTP/1.1 404 Not Found | URI: https://bitbucket.org/api/2.0/repositories/plexxi/connect/commit/1d031bc435b9/statuses/build | Response body: {"data": {"shas": ["1d031bc435b9"]}, "type": "error", "error": {"message": "Commit not found", "data": {"shas": ["1d031bc435b9"]}}}
2020-03-16 01:07:08.357+0000 [id=165962] WARNING b.b.bitbucket.ApiClient#send: Response status: HTTP/1.1 404 Not Found | URI: https://bitbucket.org/api/2.0/repositories/plexxi/connect/commit/1d031bc435b9/statuses/build | Response body: {"data": {"shas": ["1d031bc435b9"]}, "type": "error", "error": {"message": "Commit not found", "data": {"shas": ["1d031bc435b9"]}}}
2020-03-16 01:07:08.363+0000 [id=165962] INFO j.p.s.l.SlackNotificationsLogger#info: [CFM - Pull Request #12291] found #12290 as previous completed, non-aborted build
2020-03-16 01:08:02.300+0000 [id=41] WARNING b.b.bitbucket.ApiClient#send: Response status: HTTP/1.1 404 Not Found | URI: https://bitbucket.org/api/2.0/repositories/plexxi/connect/commit/1d031bc435b9/statuses/build/jenkins-eff682f47713b6347ce8c9b8a172c1d9 | Response body: {"data": {"shas": ["1d031bc435b9"]}, "type": "error", "error": {"message": "Commit not found", "data": {"shas": ["1d031bc435b9"]}}}
2020-03-16 01:08:18.660+0000 [id=41] WARNING b.b.bitbucket.ApiClient#send: Response status: HTTP/1.1 404 Not Found | URI: https://bitbucket.org/api/2.0/repositories/plexxi/connect/pullrequests/7517/approve | Response body: {"type": "error", "error": {"message": "You haven't approved this pull request."}}
2020-03-16 01:08:18.864+0000 [id=165983] WARNING b.b.bitbucket.ApiClient#send: Response status: HTTP/1.1 404 Not Found | URI: https://bitbucket.org/api/2.0/repositories/plexxi/connect/commit/1d031bc435b9/statuses/build | Response body: {"data": {"shas": ["1d031bc435b9"]}, "type": "error", "error": {"message": "Commit not found", "data": {"shas": ["1d031bc435b9"]}}}
2020-03-16 01:09:03.841+0000 [id=165983] WARNING b.b.bitbucket.ApiClient#send: Response status: HTTP/1.1 404 Not Found | URI: https://bitbucket.org/api/2.0/repositories/plexxi/connect/commit/1d031bc435b9/statuses/build | Response body: {"data": {"shas": ["1d031bc435b9"]}, "type": "error", "error": {"message": "Commit not found", "data": {"shas": ["1d031bc435b9"]}}}
2020-03-16 01:09:03.848+0000 [id=165983] INFO j.p.s.l.SlackNotificationsLogger#info: [CFM - Pull Request #12292] found #12291 as previous completed, non-aborted build

That's probably not much to go on, but it's all we have at the moment. If I can attempt a repro this weekend, I will try to crank up the logging level for it.
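
To make the repetition in a log like this easier to see, the failed API calls can be tallied per URI. The Python sketch below assumes the Jenkins log entry layout shown in this comment (timestamp, `[id=N]`, level, logger, message) and the `Response status: ... | URI: ...` wording of the `ApiClient#send` warnings; the function names are hypothetical, and other log layouts would need different patterns.

```python
import re
from collections import Counter

# A new entry starts at a timestamp followed by the thread id, e.g.
# "2020-03-16 01:06:19.018+0000 [id=43]". The lookahead splits without
# consuming it, so flattened or wrapped logs are handled as well.
ENTRY_START = re.compile(
    r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4} \[id=\d+\])"
)
FAILED_CALL = re.compile(r"Response status: HTTP/\d\.\d (\d{3}) .*? \| URI: (\S+)")


def split_entries(text):
    """Split raw log text into entries, collapsing stray line breaks."""
    return [" ".join(chunk.split()) for chunk in ENTRY_START.split(text) if chunk.strip()]


def count_failed_calls(text):
    """Count logged API responses per (HTTP status, URI) pair."""
    counts = Counter()
    for entry in split_entries(text):
        match = FAILED_CALL.search(entry)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

Printing `count_failed_calls(text).most_common()` for a longer capture lists the URIs that fail most often, which helps separate the per-poll pattern from unrelated warnings.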

stevemuskiewicz commented 4 years ago

One other piece of info: looking at the BB PR, there was a prior successful build logged at commit ID 1d031bc435b9.
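
One possible reading of the 404 "Commit not found" responses, offered purely as a hypothesis that this thread does not confirm: if a poller decides whether a commit was already built by looking up that commit's build status, and the lookup fails because the commit disappeared along with the branch, then every poll looks like a first build. The toy Python model below illustrates that failure mode and one possible guard. It is not the plugin's code, and every name in it is hypothetical.

```python
class CommitNotFound(Exception):
    """Raised by the (hypothetical) status lookup when the API answers 404."""


def poll_naive(commit, lookup_status):
    """Trigger when no status is found, and also when the lookup fails."""
    try:
        return lookup_status(commit) is None
    except CommitNotFound:
        return True  # the missing commit is mistaken for an unbuilt one


def poll_guarded(commit, lookup_status):
    """Trigger only for commits that still exist and have no status yet."""
    try:
        return lookup_status(commit) is None
    except CommitNotFound:
        return False  # nothing left to build, so skip it
```

With a lookup that always raises `CommitNotFound`, `poll_naive` returns `True` on every call while `poll_guarded` returns `False`, which matches the shape of the symptom reported here without establishing its cause.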

CodeMonk commented 4 years ago

You never mentioned that you have it set to approve the PR. Checking the approval code.

CodeMonk commented 4 years ago

Try using the GUI to set logging to FINE for bbprb and see if you can get more logs from the console. If the logs above were from the GUI, please also try to get me the relevant lines from /var/log/jenkins/jenkins.log

I still don't see in the code how that could ever happen.

CodeMonk commented 4 years ago

Did you ever manage to get FINE logs?

stevemuskiewicz commented 4 years ago

Not yet. I'm going to spin up another Jenkins instance and try to repro using a test forked repo, so I can do this at my own pace rather than wait for windows when our Jenkins cluster isn't in use by our dev team (which aren't very frequent anymore). Hopefully I can do this in the next couple of days. Sorry for the delay on this.

stevemuskiewicz commented 3 years ago

This does appear to be resolved in 1.5.0.

(Of course, we are now running into #193, which is more problematic for us than this issue, so I'll probably need to downgrade again because of it...)