jenkinsci / kubernetes-operator

Kubernetes native Jenkins Operator
https://jenkinsci.github.io/kubernetes-operator

Allow restore process to finish before starting Jenkins #843

Closed bentlema closed 1 year ago

bentlema commented 1 year ago

Describe the bug

When using multibranch pipelines (GitHub Branch Source plugin), all branches, PRs, and tags are rebuilt whenever Jenkins restarts. This appears to happen because the restore process has not finished restoring all of the build history in time. This race condition causes build history to be overwritten. It also causes a massive spike in resource usage, because (in our case) hundreds of build jobs are kicked off simultaneously.

One possible solution would be to move the restore process to an initContainer to guarantee it finishes before the jenkins-master container starts.

To Reproduce

With the GitHub Branch Source plugin installed and a multibranch pipeline configured to build branches, PRs, and/or tags, execute several builds and observe the state of the build history. Restart Jenkins (kill the pod). When Jenkins comes back up, it rebuilds every branch, PR, and tag while the old build history is still "flowing in" via the restore process. (This may be hard to reproduce with a small test case; we have dozens of multibranch pipelines and hundreds of branches/PRs/tags.)

Additional information

Kubernetes version: 1.23 (AWS EKS)

Jenkins Operator version: v0.7.1 and v0.8.0-beta

This same issue was reported in https://github.com/jenkinsci/kubernetes-operator/issues/679, but it was closed as stale.

brokenpip3 commented 1 year ago

Yes, that issue should not have been closed as stale. To solve this, we need either an initContainer, as I already commented in the old issue, or to move the restore before the creation of the seed-job-init in the reconciliation loop. I will try the second option. At the moment, though, I still need to fix a couple of things for 0.8 and finish migrating to the newest golang and operator-sdk versions before starting any large code changes (more info here). Unfortunately, this project was abandoned for a while, so we may need time to recover and get back on track. In the meantime, you can try this ugly yet working workaround: https://github.com/jenkinsci/kubernetes-operator/issues/679#issuecomment-1573907983

brokenpip3 commented 1 year ago

@bentlema can you try this operator version:

quay.io/jenkins-kubernetes-operator/operator:d9ea2ee

and let me know? thanks!

I tried the quick fix I suggested before: moving the restore before the seed job creation in the user reconcile loop.
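The quick fix changes the order of steps in the user reconcile loop so that seed jobs are created only once the restore has finished. Below is a minimal Go sketch of that ordering. It assumes a hypothetical `userState` that counts how much build history is still pending restore. The type, its fields, and `reconcileUser` are illustrative and do not come from the operator's code. Returning `true` mimics the common controller pattern of asking to be requeued instead of blocking inside a reconcile pass.

```go
package main

import "fmt"

// userState is a hypothetical snapshot of what the user reconciler observes.
type userState struct {
	pendingRestore  int  // pieces of build history not yet restored (illustrative)
	seedJobsCreated bool // whether seed jobs have been created
}

// reconcileUser runs one pass of a hypothetical user reconcile loop.
// It returns true when the caller should requeue and run another pass.
func reconcileUser(s *userState) bool {
	// Restore first: keep requeueing until nothing is left to restore.
	if s.pendingRestore > 0 {
		s.pendingRestore-- // simulate restoring one more piece of build history
		fmt.Printf("restoring backup, %d item(s) left\n", s.pendingRestore)
		return true
	}
	// Only after the restore is complete are seed jobs created, so the
	// jobs they define start from the restored build history.
	if !s.seedJobsCreated {
		s.seedJobsCreated = true
		fmt.Println("restore finished; creating seed jobs")
	}
	return false
}

func main() {
	s := &userState{pendingRestore: 2}
	for pass := 1; reconcileUser(s); pass++ {
		fmt.Printf("pass %d: requeued\n", pass)
	}
	fmt.Println("seed jobs created:", s.seedJobsCreated)
}
```

Requeueing, rather than waiting inside a single pass, keeps each reconcile pass short while still guaranteeing that the seed job step is unreachable until the restore step reports that it is done.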

bentlema commented 1 year ago

@brokenpip3, yes, this does appear to fix it! After upgrading to this image and restarting a couple of times, the restore process appears to finish before the seed jobs are executed. The logs seem to confirm this as well:

jenkins-jenkins-operator-7547c95d55-9p8nt jenkins-operator 2023-06-06T06:22:22.942Z INFO  controller-jenkins  Restoring backup '368'  {"cr": "jenkins"}
jenkins-jenkins-operator-7547c95d55-9p8nt jenkins-operator 2023-06-06T06:22:44.197Z INFO  controller-jenkins  Restoring backup '368'  {"cr": "jenkins"}
.
.
.
jenkins-jenkins-operator-7547c95d55-9p8nt jenkins-operator 2023-06-06T06:23:21.924Z INFO  controller-jenkins  Waiting for Seed Job Agent `seed-job-agent`...  {"cr": "jenkins"}
jenkins-jenkins-operator-7547c95d55-9p8nt jenkins-operator 2023-06-06T06:23:26.450Z INFO  controller-jenkins  Waiting for Seed Job Agent `seed-job-agent`...  {"cr": "jenkins"}
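One way to check this ordering in operator logs like the ones above is to confirm that the last `Restoring backup` line is timestamped before the first `Waiting for Seed Job Agent` line. The following is a rough Go sketch. It assumes each line has the `<pod> <container> <timestamp> <level> <message>` layout shown in the excerpt. `restoreFinishedFirst` is a hypothetical helper. It only compares timestamps and does not prove that the restore succeeded.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// restoreFinishedFirst reports whether every "Restoring backup" entry is
// timestamped before the first "Waiting for Seed Job Agent" entry.
// Assumed line layout: <pod> <container> <RFC3339 timestamp> <level> <message...>
func restoreFinishedFirst(lines []string) (bool, error) {
	var lastRestore, firstSeed time.Time
	for _, line := range lines {
		fields := strings.Fields(line)
		if len(fields) < 3 {
			continue // skip elision markers such as "." and other short lines
		}
		ts, err := time.Parse(time.RFC3339Nano, fields[2])
		if err != nil {
			continue // not a timestamped log line
		}
		switch {
		case strings.Contains(line, "Restoring backup"):
			if ts.After(lastRestore) {
				lastRestore = ts
			}
		case strings.Contains(line, "Waiting for Seed Job Agent"):
			if firstSeed.IsZero() || ts.Before(firstSeed) {
				firstSeed = ts
			}
		}
	}
	if lastRestore.IsZero() || firstSeed.IsZero() {
		return false, errors.New("log does not contain both restore and seed job entries")
	}
	return lastRestore.Before(firstSeed), nil
}

func main() {
	logs := []string{
		"jenkins-jenkins-operator-7547c95d55-9p8nt jenkins-operator 2023-06-06T06:22:22.942Z INFO  controller-jenkins  Restoring backup '368'  {\"cr\": \"jenkins\"}",
		"jenkins-jenkins-operator-7547c95d55-9p8nt jenkins-operator 2023-06-06T06:22:44.197Z INFO  controller-jenkins  Restoring backup '368'  {\"cr\": \"jenkins\"}",
		".",
		"jenkins-jenkins-operator-7547c95d55-9p8nt jenkins-operator 2023-06-06T06:23:21.924Z INFO  controller-jenkins  Waiting for Seed Job Agent `seed-job-agent`...  {\"cr\": \"jenkins\"}",
	}
	ok, err := restoreFinishedFirst(logs)
	fmt.Println(ok, err) // prints "true <nil>" for this excerpt
}
```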

brokenpip3 commented 1 year ago

Cool! I'm glad that worked :)

I'll keep the issue open until a new version containing the fix is released.

brokenpip3 commented 1 year ago

Fixed in this version: https://github.com/jenkinsci/kubernetes-operator/releases/tag/v0.8.0-beta2