jenkinsci / lockable-resources-plugin

Lock resources against concurrent use
https://plugins.jenkins.io/lockable-resources
MIT License
87 stars 185 forks source link

jenkins startup failure #623

Closed clockrun closed 9 months ago

clockrun commented 10 months ago

Jenkins and plugins versions report

Environment ```text Jenkins: 2.426.2 OS: Linux - 5.15.125-gardenlinux-cloud-amd64 Java: 21.0.1 - Eclipse Adoptium (OpenJDK 64-Bit Server VM) --- Office-365-Connector:4.20.2 Parameterized-Remote-Trigger:3.2.0 ace-editor:1.1 active-directory:2.34 allure-jenkins-plugin:2.31.1 analysis-model-api:11.14.0 ansicolor:1.0.4 ant:497.v94e7d9fffa_b_9 antisamy-markup-formatter:162.v0e6ec0fcfcf6 apache-httpcomponents-client-4-api:4.5.14-208.v438351942757 apache-httpcomponents-client-5-api:5.2.3-1.0 archived-artifact-url-viewer:1.1 authentication-tokens:1.53.v1c90fd9191a_b_ aws-credentials:218.v1b_e9466ec5da_ aws-java-sdk:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-cloudformation:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-codebuild:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-ec2:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-ecr:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-ecs:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-efs:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-elasticbeanstalk:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-iam:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-kinesis:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-logs:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-minimal:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-secretsmanager:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-sns:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-sqs:1.12.610-428.v849169a_01b_a_5 aws-java-sdk-ssm:1.12.610-428.v849169a_01b_a_5 azure-commons:1.1.3 badge:1.9.1 basic-branch-build-strategies:81.v05e333931c7d blackduck-detect:9.0.0 blueocean:1.27.9 blueocean-autofavorite:1.2.5 blueocean-bitbucket-pipeline:1.27.9 blueocean-commons:1.27.9 blueocean-config:1.27.9 blueocean-core-js:1.27.9 blueocean-dashboard:1.27.9 blueocean-display-url:2.4.2 blueocean-events:1.27.9 blueocean-git-pipeline:1.27.9 blueocean-github-pipeline:1.27.9 blueocean-i18n:1.27.9 blueocean-jwt:1.27.9 blueocean-personalization:1.27.9 blueocean-pipeline-api-impl:1.27.9 blueocean-pipeline-editor:1.27.9 blueocean-pipeline-scm-api:1.27.9 blueocean-rest:1.27.9 blueocean-rest-impl:1.27.9 blueocean-web:1.27.9 bootstrap4-api:4.6.0-6 bootstrap5-api:5.3.2-3 bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9 branch-api:2.1135.v8de8e7899051 build-failure-analyzer:2.5.0 build-history-manager:1.7.1 build-timeout:1.31 build-user-vars-plugin:1.9 caffeine-api:3.1.8-133.v17b_1ff2e0599 checkmarx:2023.2.6 checks-api:2.0.2 cloudbees-bitbucket-branch-source:856.v04c46c86f911 cloudbees-folder:6.858.v898218f3609d cobertura:1.17 code-coverage-api:4.99.0 command-launcher:107.v773860566e2e commons-httpclient3-api:3.1-3 commons-lang3-api:3.13.0-62.v7d18e55f51e2 commons-text-api:1.11.0-95.v22a_d30ee5d36 concurrent-step:1.0.0 config-file-provider:959.vcff671a_4518b_ configuration-as-code:1763.vb_fe9c1b_83f7b copyartifact:722.v0662a_9b_e22a_c coverage:1.7.0 credentials:1311.vcf0a_900b_37c2 credentials-binding:642.v737c34dea_6c2 cucumber-reports:5.8.1 dashboard-view:2.495.v07e81500c3f2 data-tables-api:1.13.8-2 database:191.vd5981b_97a_5fa_ database-postgresql:100.v2418e0a_c6909 dependency-check-jenkins-plugin:5.4.3 dependency-track:4.3.1 disk-usage:1.2 display-url-api:2.200.vb_9327d658781 docker-commons:439.va_3cb_0a_6a_fb_29 docker-java-api:3.3.4-86.v39b_a_5ede342c docker-workflow:572.v950f58993843 durable-task:523.va_a_22cf15d5e0 echarts-api:5.4.3-2 email-ext:2.102 emailext-template:1.5 embeddable-build-status:467.v4a_954796e45d extended-choice-parameter:376.v2e02857547b_a_ extended-read-permission:53.v6499940139e5 extensible-choice-parameter:1.8.1 external-monitor-job:215.v2e88e894db_f8 favorite:2.208.v91d65b_7792a_c font-awesome-api:6.5.1-1 forensics-api:2.3.0 generic-webhook-trigger:1.88.2 ghprb:1.42.2 git:5.2.1 git-client:4.6.0 git-server:99.va_0826a_b_cdfa_d github:1.37.3.1 github-api:1.318-461.v7a_c09c9fa_d63 github-branch-source:1767.va_7d01ea_c7256 github-oauth:597.ve0c3480fcb_d0 google-oauth-plugin:1.330.vf5e86021cb_ec gradle:2.9 groovy-postbuild:228.vcdb_cf7265066 gson-api:2.10.1-3.vb_25b_599b_e4f8 h2-api:11.1.4.199-12.v9f4244395f7a_ handlebars:3.0.8 handy-uri-templates-2-api:2.1.8-30.v7e777411b_148 htmlpublisher:1.32 http_request:1.18 influxdb:3.6 instance-identity:185.v303dc7c645f9 ionicons-api:56.v1b_1c8c49374e jackson2-api:2.15.3-372.v309620682326 jacoco:3.3.5 jakarta-activation-api:2.0.1-3 jakarta-mail-api:2.0.1-3 javadoc:243.vb_b_503b_b_45537 javax-activation-api:1.2.0-6 javax-mail-api:1.6.2-9 jaxb:2.3.9-1 jdk-tool:73.vddf737284550 jenkins-design-language:1.27.9 jersey2-api:2.41-133.va_03323b_a_1396 jira:3.12 jjwt-api:0.11.5-77.v646c772fddb_0 jnr-posix-api:3.1.18-1 job-dsl:1.87 job-import-plugin:3.6 joda-time-api:2.12.5-5.v5495a_235fedf jquery:1.12.4-1 jquery-detached:1.2.1 jquery3-api:3.7.1-1 jsch:0.2.8-65.v052c39de79b_2 json-api:20231013-3.v20f3c247f2fe json-path-api:2.8.0-5.v07cb_a_1ca_738c junit:1252.vfc2e5efa_294f kubernetes:4151.v6fa_f0fb_0b_4c9 kubernetes-cd:2.3.1 kubernetes-cli:1.12.1 kubernetes-client-api:6.8.1-224.vd388fca_4db_3b_ kubernetes-credentials:0.11 ldap:711.vb_d1a_491714dc mailer:463.vedf8358e006b_ mask-passwords:173.v6a_077a_291eb_5 matrix-auth:3.2.1 matrix-project:822.v01b_8c85d16d2 maven-plugin:3.23 mercurial:1260.vdfb_723cdcc81 metrics:4.2.18-442.v02e107157925 mina-sshd-api-common:2.11.0-86.v836f585d47fa_ mina-sshd-api-core:2.11.0-86.v836f585d47fa_ momentjs:1.1.1 next-build-number:1.8 nodejs:1.6.1 oauth-credentials:0.646.v02b_66dc03d2e okhttp-api:4.11.0-157.v6852a_a_fa_ec11 openshift-client:1.1.0.420.v8817277c1990 openshift-sync:1.1.0.795.v95fa_27a_a_e287 pam-auth:1.10 parameterized-scheduler:255.v73827fcdf618 performance:951.v5600a_c6422ed permissive-script-security:0.7 pipeline-aws:1.43 pipeline-build-step:540.vb_e8849e1a_b_d8 pipeline-github:2.8-155.8eab375ac9f8 pipeline-github-lib:42.v0739460cda_c4 pipeline-githubnotify-step:49.vf37bf92d2bc8 pipeline-graph-analysis:202.va_d268e64deb_3 pipeline-groovy-lib:689.veec561a_dee13 pipeline-input-step:477.v339683a_8d55e pipeline-maven:1362.vee39a_d4b_02b_1 pipeline-maven-api:1362.vee39a_d4b_02b_1 pipeline-milestone-step:111.v449306f708b_7 pipeline-model-api:2.2151.ve32c9d209a_3f pipeline-model-definition:2.2151.ve32c9d209a_3f pipeline-model-extensions:2.2151.ve32c9d209a_3f pipeline-npm:155.ve2e393062d86 pipeline-rest-api:2.34 pipeline-stage-step:305.ve96d0205c1c6 pipeline-stage-tags-metadata:2.2151.ve32c9d209a_3f pipeline-stage-view:2.34 pipeline-utility-steps:2.16.0 plain-credentials:143.v1b_df8b_d3b_e48 plugin-util-api:3.8.0 popper-api:1.16.1-3 popper2-api:2.11.6-4 postgresql-api:42.6.0-31.vb_7e76dc13969 prism-api:1.29.0-10 pubsub-light:1.18 pyenv-pipeline:2.1.2 rebuild:330.v645b_7df10e2a_ resource-disposer:0.23 robot:3.4.0 saferestart:0.7 saml:4.429.v9a_781a_61f1da_ sapsystemhealthcheck:1.23-SNAPSHOT (private-8489f1a4-I335255) scm-api:683.vb_16722fb_b_80b_ script-security:1294.v99333c047434 seleniumhtmlreport:1.1 shiningpanda:0.24 sidebar-link:2.4.1 slack:684.v833089650554 snakeyaml-api:2.2-111.vc6598e30cc65 soapui-pro-functional-testing:1.10 sonar:2.16.1 sse-gateway:1.26 ssh-agent:346.vda_a_c4f2c8e50 ssh-credentials:308.ve4497b_ccd8f4 ssh-slaves:2.947.v64ee6b_f87b_c1 sshd:3.312.v1c601b_c83b_0e structs:325.vcb_307d2a_2782 timestamper:1.26 token-macro:400.v35420b_922dcb_ trilead-api:2.133.vfb_8a_7b_9c5dd1 uno-choice:2.8.1 variant:60.v7290fc0eb_b_cd warnings-ng:10.5.3 webhook-step:327.vb_58f208a_7783 windows-slaves:1.8.1 workflow-aggregator:596.v8c21c963d92d workflow-api:1283.v99c10937efcb_ workflow-basic-steps:1042.ve7b_140c4a_e0c workflow-cps:3826.v3b_5707fe44da_ workflow-cps-global-lib:609.vd95673f149b_b workflow-durable-task-step:1313.vcb_970b_d2a_fb_3 workflow-job:1385.vb_58b_86ea_fff1 workflow-multibranch:756.v891d88f2cd46 workflow-scm-step:415.v434365564324 workflow-step-api:639.v6eca_cd8c04a_a_ workflow-support:865.v43e78cc44e0d ws-cleanup:0.45 ```

What Operating System are you using (both controller, and any agents involved in the problem)?

jenkins running inside kubernetes

Reproduction steps

have some job running, restart jenkins

Expected Results

Jenkins can startup

Actual Results

Jenkins failed at startup. Showing below error message on web page

java.util.ConcurrentModificationException
    at java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1095)
    at java.base/java.util.ArrayList$Itr.next(ArrayList.java:1049)
    at org.jenkins.plugins.lockableresources.FreeDeadJobs.freePostMortemResources(FreeDeadJobs.java:33)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
Caused: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:118)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:109)
Caused: java.lang.Error
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:115)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:185)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305)
    at jenkins.model.Jenkins$5.runTask(Jenkins.java:1170)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:221)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:120)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused: org.jvnet.hudson.reactor.ReactorException
    at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:290)
    at jenkins.InitReactorRunner.run(InitReactorRunner.java:49)
    at jenkins.model.Jenkins.executeReactor(Jenkins.java:1205)
    at jenkins.model.Jenkins.<init>(Jenkins.java:992)
    at hudson.model.Hudson.<init>(Hudson.java:86)
    at hudson.model.Hudson.<init>(Hudson.java:82)
    at hudson.WebAppMain$3.run(WebAppMain.java:247)
Caused: hudson.util.HudsonFailedToLoad
    at hudson.WebAppMain$3.run(WebAppMain.java:264)

Anything else?

No response

Are you interested in contributing a fix?

No response

mPokornyETM commented 10 months ago

Which version do you use? Try to update to 1232.v512d6c434eb_d thx

ckullabosch commented 10 months ago

I see the following callstack with version 1232 of the plugin:

java.util.ConcurrentModificationException
    at java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1043)
    at java.base/java.util.ArrayList$Itr.next(ArrayList.java:997)
    at org.jenkins.plugins.lockableresources.FreeDeadJobs.freePostMortemResources(FreeDeadJobs.java:33)
Caused: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:109)
Caused: java.lang.Error
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:115)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:185)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305)
    at jenkins.model.Jenkins$5.runTask(Jenkins.java:1170)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:221)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:120)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused: org.jvnet.hudson.reactor.ReactorException
    at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:290)
    at jenkins.InitReactorRunner.run(InitReactorRunner.java:49)
    at jenkins.model.Jenkins.executeReactor(Jenkins.java:1205)
    at jenkins.model.Jenkins.<init>(Jenkins.java:992)
    at hudson.model.Hudson.<init>(Hudson.java:86)
    at hudson.model.Hudson.<init>(Hudson.java:82)
    at hudson.WebAppMain$3.run(WebAppMain.java:247)
Caused: hudson.util.HudsonFailedToLoad
    at hudson.WebAppMain$3.run(WebAppMain.java:264)
mPokornyETM commented 10 months ago

can you pls active the finest log level for ALL lockbale-resources classes and paste the result here pls. In case there are sensible information, we can found other way. This is really tricky, because the stack says, where it happens, but does not say who change the array. thx

clockrun commented 10 months ago

I think this is because

  1. I'm using old version of plugin (version 1185)
  2. I upgraded jenkins to 2.426.2 LTS
  3. Running jenkins jobs, and jenkins crashed
  4. Jenkins startup failed, and went into crash loop

After that, I have tried (in sequence)

  1. update plugin to 1232, jenkins startup failed
  2. update plugin to 1236, jenkins startup failed
  3. manually delete lock in xml file, works

Now since the lock was deleted and plugin updated to latest, I cannot reproduce this issue again. Unfortunately I have deleted the xml configuration file, so I cannot upload here to help you. But I think this issue is not 100% fixed as of now.

mPokornyETM commented 10 months ago

1232.v512d6c434eb_d is the current (last) release. What you mean with 1236 ?

But I will keep this item open and waiting for more feadback

clockrun commented 10 months ago

You haven't release version 1236, but there was a build number 1236 on Jenkins build server. That was the latest version I can find :)

mPokornyETM commented 9 months ago

Do you use some groovy code to access/modify resources?

Udo10744 commented 9 months ago

We are faced with the same issue, using Version 1232.v512d6c434eb_d. I am posting our details hopefully it helps

Environment Jenkins: 2.401.3 OS: Windows Server 2016 - 10.0 Java: 11.0.14 - Eclipse Adoptium (OpenJDK 64-Bit Server VM) --- Surround-SCM-plugin:1.12 additional-identities-plugin:1.1 analysis-model-api:11.15.0 antisamy-markup-formatter:162.v0e6ec0fcfcf6 apache-httpcomponents-client-4-api:4.5.14-208.v438351942757 audit-trail:361.v82cde86c784e authentication-tokens:1.53.v1c90fd9191a_b_ bootstrap5-api:5.3.2-3 bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9 branch-api:2.1128.v717130d4f816 build-name-setter:2.4.1 build-timeout:1.32 caffeine-api:3.1.8-133.v17b_1ff2e0599 checks-api:2.0.2 cloudbees-bitbucket-branch-source:866.vdea_7dcd3008e cloudbees-folder:6.858.v898218f3609d command-launcher:107.v773860566e2e commons-lang3-api:3.13.0-62.v7d18e55f51e2 commons-text-api:1.11.0-95.v22a_d30ee5d36 copyartifact:722.v0662a_9b_e22a_c credentials:1319.v7eb_51b_3a_c97b_ credentials-binding:642.v737c34dea_6c2 dashboard-view:2.495.v07e81500c3f2 data-tables-api:1.13.6-5 display-url-api:2.200.vb_9327d658781 docker-commons:439.va_3cb_0a_6a_fb_29 dtkit-api:3.0.2 durable-task:547.vd1ea_007d100c echarts-api:5.4.0-7 email-ext:2.104 external-monitor-job:215.v2e88e894db_f8 font-awesome-api:6.5.1-2 forensics-api:2.3.0 generic-webhook-trigger:2.0.0 git:5.2.1 git-client:4.6.0 gson-api:2.10.1-15.v0d99f670e0a_7 handy-uri-templates-2-api:2.1.8-30.v7e777411b_148 instance-identity:185.v303dc7c645f9 ionicons-api:56.v1b_1c8c49374e jackson2-api:2.16.1-373.ve709c6871598 jakarta-activation-api:2.0.1-3 jakarta-mail-api:2.0.1-3 javadoc:243.vb_b_503b_b_45537 javax-activation-api:1.2.0-6 javax-mail-api:1.6.2-9 jaxb:2.3.9-1 jdk-tool:73.vddf737284550 job-restrictions:0.8 joda-time-api:2.12.6-21.vca_fd74418fb_7 jquery:1.12.4-1 jquery3-api:3.7.1-1 jsch:0.2.16-86.v42e010d9484b_ json-path-api:2.9.0-33.v2527142f2e1d junit:1259.v65ffcef24a_88 ldap:711.vb_d1a_491714dc locale:314.v22ce953dfe9e lockable-resources:1232.v512d6c434eb_d mailer:463.vedf8358e006b_ mapdb-api:1.0.9-28.vf251ce40855d mathworks-polyspace:1.0.5 matrix-auth:3.2.1 matrix-project:822.824.v14451b_c0fd42 mina-sshd-api-common:2.12.0-90.v9f7fb_9fa_3d3b_ mina-sshd-api-core:2.12.0-90.v9f7fb_9fa_3d3b_ monitoring:1.95.0 pipeline-build-step:540.vb_e8849e1a_b_d8 pipeline-graph-analysis:202.va_d268e64deb_3 pipeline-groovy-lib:689.veec561a_dee13 pipeline-input-step:477.v339683a_8d55e pipeline-milestone-step:111.v449306f708b_7 pipeline-model-api:2.2150.v4cfd8916915c pipeline-model-definition:2.2150.v4cfd8916915c pipeline-model-extensions:2.2150.v4cfd8916915c pipeline-rest-api:2.34 pipeline-stage-step:305.ve96d0205c1c6 pipeline-stage-tags-metadata:2.2150.v4cfd8916915c pipeline-stage-view:2.34 plain-credentials:143.v1b_df8b_d3b_e48 plugin-util-api:3.8.0 prism-api:1.29.0-8 resource-disposer:0.23 role-strategy:633.v836e5b_3e80a_5 scm-api:676.v886669a_199a_a_ script-security:1321.va_73c0795b_923 snakeyaml-api:2.2-111.vc6598e30cc65 ssh-credentials:308.ve4497b_ccd8f4 sshd:3.322.v159e91f6a_550 structs:325.vcb_307d2a_2782 timestamper:1.26 token-macro:400.v35420b_922dcb_ trilead-api:2.133.vfb_8a_7b_9c5dd1 validating-string-parameter:183.v3748e79b_9737 variant:60.v7290fc0eb_b_cd versioncolumn:233.v2d198f8212a_2 warnings-ng:10.7.0 workflow-aggregator:596.v8c21c963d92d workflow-api:1291.v51fd2a_625da_7 workflow-basic-steps:1042.ve7b_140c4a_e0c workflow-cps:3837.v305192405b_c0 workflow-durable-task-step:1327.ve57634fb_09ce workflow-job:1326.ve643e00e9220 workflow-multibranch:770.v1a_d0708dd1f6 workflow-scm-step:415.v434365564324 workflow-step-api:639.v6eca_cd8c04a_a_ workflow-support:865.v43e78cc44e0d ws-cleanup:0.45 xunit:3.1.3

What Operating System are you using (both controller, and any agents involved in the problem)?

Controller and agents running on MS Windows

Reproduction steps

have a job running, lock a ressource, restart jenkins service

Expected Results Jenkins can startup

Actual Results Jenkins failed at startup. Showing below error message

2024-02-05 04:02:18.931+0000 [id=71]    SEVERE  jenkins.InitReactorRunner$1#onTaskFailed: Failed FreeDeadJobs.freePostMortemResources

java.util.ConcurrentModificationException
    at java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1043)
    at java.base/java.util.ArrayList$Itr.next(ArrayList.java:997)
    at org.jenkins.plugins.lockableresources.FreeDeadJobs.freePostMortemResources(FreeDeadJobs.java:33)
Caused: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:109)
Caused: java.lang.Error
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:115)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:185)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305)
    at jenkins.model.Jenkins$5.runTask(Jenkins.java:1164)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:221)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:120)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

2024-02-05 04:02:18.933+0000 [id=24]    SEVERE  hudson.util.BootFailure#publish: Failed to initialize Jenkins
java.util.ConcurrentModificationException
    at java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1043)
    at java.base/java.util.ArrayList$Itr.next(ArrayList.java:997)
    at org.jenkins.plugins.lockableresources.FreeDeadJobs.freePostMortemResources(FreeDeadJobs.java:33)
Caused: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:109)
Caused: java.lang.Error
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:115)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:185)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305)
    at jenkins.model.Jenkins$5.runTask(Jenkins.java:1164)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:221)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:120)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused: org.jvnet.hudson.reactor.ReactorException
    at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:290)
    at jenkins.InitReactorRunner.run(InitReactorRunner.java:49)
    at jenkins.model.Jenkins.executeReactor(Jenkins.java:1199)
    at jenkins.model.Jenkins.<init>(Jenkins.java:987)
    at hudson.model.Hudson.<init>(Hudson.java:86)
    at hudson.model.Hudson.<init>(Hudson.java:82)
    at hudson.WebAppMain$3.run(WebAppMain.java:247)
Caused: hudson.util.HudsonFailedToLoad
    at hudson.WebAppMain$3.run(WebAppMain.java:264)
Udo10744 commented 9 months ago

Do you use some groovy code to access/modify resources?

Yes, here is our code:

      stage('aStageName')
      {
         steps
         {
            catchError(buildResult: 'UNSTABLE', stageResult: 'FAILURE', message: '[Test] failed.')
            {
               timeout(time: 20, unit: 'HOURS', activity: true)
               {
                  lock(resource: "test_${env.COMPUTERNAME}", extra: [[resource: "test_${env.NODE_NAME}"]])
                  {
<do the thing>
                  }
               }
            }
         }
      }
mPokornyETM commented 9 months ago

This is supproted way. There is nothing weird. I think I know why it might happen (the exception) , but currently I am sick. SO it may take a while

Udo10744 commented 9 months ago

This is supproted way. There is nothing weird. I think I know why it might happen (the exception) , but currently I am sick. SO it may take a while

Thanks for your feedback. Good to know that you have an idea how it might be fixed. I am looking forward to the next update of the plugin. Until then: Gute Besserung!

pvohmann commented 9 months ago

In our environment (with @ckullabosch ) we schedule ~hourly a horde of pipeline jobs .../(nodename)/setup-system-check who will run on built-in node. The jobs queue up on a single ephemeral resource. If Jenkins is restarted at the same time, i.e. set the checkmark in plugin updates, the exception occurs. The jobs are marked Do not allow the pipeline to resume if the controller restarts.

20240214-FreeDeadJobs.log has the recorder log.

To get a FINE log from Log Recorder, the plugin was modified to catch and log the exception. Patch based on version 1232: 20240214-FreeDeadJobs-diff.txt

I hope this info supports your assumption. Please take your time to get well.

mPokornyETM commented 9 months ago

The stack trace looks very helpful. That was also my idea to check. How it is in case of ephemeral resource and restart. I will try to create some testscript to simulate the scenario. @pvohmann thx for input

mPokornyETM commented 9 months ago

I think I catch it.

@pvohmann or @ckullabosch , how can I simulate this "dead" jobs like in ouyr logs. Which https://www.jenkins.io/doc/book/pipeline/pipeline-best-practices/#pipeline-durability are used in your environment?

I need to create some script to simulate the scenario (test script) or can somebody confirm that the changes in #629 are fine

thx

Udo10744 commented 9 months ago

I attached the log from our todays occurrence, maybe it also helps to drill down the issue...

2024-02-19-Startup.log

mPokornyETM commented 9 months ago

@Udo10744 "This is expected to happen when using the PERFORMANCE_OPTIMIZED durability" is this your case?

Udo10744 commented 9 months ago

@mPokornyETM yes, but not as Server Default. It is activated on project/pipeline level for most of our pipelines.

But, regarding the WARNING for "[Owner[project-e/Full_build/929:project-e/Full_build #929]]" in the logfile above: PERFORMANCE_OPTIMIZED durability was NOT activated, it was running on "MAX_SURVIVABILITY"!

Currently I am investigating about "Jenkins is not shut down cleanly" log entry from my log above. We are using Jenkins as a Windows Service. And I have currently doubts if stopping the Service (or reboot the Server) is treated in Jenkins as a "clear shutdown" or not. Maybe this side effects this issue in my case, because we have had the issue in the past only after a server reboot.

mPokornyETM commented 9 months ago

I only need to know, how can be easily reproduce the feailure, that the build is missing Or we hope, that the changes here fix it and we does not provide autometd tests. https://github.com/jenkinsci/lockable-resources-plugin/pull/629

Udo10744 commented 9 months ago

I only need to know, how can be easily reproduce the feailure, that the build is missing Or we hope, that the changes here fix it and we does not provide autometd tests. #629

This is difficult for me because I am "only" IT-Admin who keeps Jenkins up to date / up an running. But I do not configure Jenkins or setup Jobs/Pipelines, because that's all done by the software developer Team. I will check if I can support here (and get some support from the developers), but this would for sure take some time, because I have also many other tasks which I need to take care of....

From my point of view a new Version of the plugin (and some "hope" ;-) ) can make it only better, and not worth.... If you think the issue could have been fixed by your changes, just go ahead and do not wait any longer.

Anyhow, thank you for your support!

By the way: The WinSW / Windows Service Wrapper is killing the Jenkins process by default (at least in our installation) if you simply stop/restart the Jenkins service. Same applies if you reboot the server. From jenkins.wrapper.log

2024-02-19 07:34:10,317 INFO  - Stopping Jenkins
2024-02-19 07:34:10,318 DEBUG - ProcessKill 1104
2024-02-19 07:34:10,394 INFO  - Found child process: 1096 Name: conhost.exe
2024-02-19 07:34:10,475 INFO  - Stopping process 1096
2024-02-19 07:34:10,478 INFO  - Send SIGINT 1096
2024-02-19 07:34:10,479 WARN  - SIGINT to 1096 failed - Killing as fallback
2024-02-19 07:34:10,480 INFO  - Stopping process 1104
2024-02-19 07:34:10,481 INFO  - Send SIGINT 1104
2024-02-19 07:34:10,482 WARN  - SIGINT to 1104 failed - Killing as fallback
2024-02-19 07:34:10,484 INFO  - Finished Jenkins
2024-02-19 07:34:10,485 DEBUG - Completed. Exit code is 0
mPokornyETM commented 9 months ago

Will be a ption to install the plugin manually from https://ci.jenkins.io/job/Plugins/job/lockable-resources-plugin/job/PR-629/lastSuccessfulBuild/artifact/ ?? That will be enough for me to confirm, that the changes are fine.

Udo10744 commented 9 months ago

Will be a ption to install the plugin manually from https://ci.jenkins.io/job/Plugins/job/lockable-resources-plugin/job/PR-629/lastSuccessfulBuild/artifact/ ?? That will be enough for me to confirm, that the changes are fine.

Thanks! Generally yes, but in our productive environment the issue happens often, but not always. So it would not be reliable to "test" in our productive environment. Furthermore the restart of our Server is fix schedule on Monday 5:00 and can not be changed. So we always would have to wait until every Monday 5:00 to have (eventually unreliable) information...

But we also have a test environment on a separate server, I will check if I can make the issue reproducible on our our test environment. If yes, I will install the plugin manually there ...

I will report the progress / feedback here.

pvohmann commented 9 months ago

I could verify successful that plugin from PR-629 #2 solves the startup problem. Here's the component log:

Feb 19, 2024 6:07:23 PM INFO org.jenkins.plugins.lockableresources.NodesMirror createNodeResources
lockable-resources-plugin: configure node resources
Feb 19, 2024 6:07:23 PM FINE org.jenkins.plugins.lockableresources.BackwardCompatibility compatibilityMigration
lockable-resources-plugin compatibility migration task run for 2 resources
Feb 19, 2024 6:07:23 PM FINE org.jenkins.plugins.lockableresources.FreeDeadJobs freePostMortemResources
lockable-resources-plugin free post mortem task run
Feb 19, 2024 6:07:23 PM INFO org.jenkins.plugins.lockableresources.FreeDeadJobs freePostMortemResources
lockable-resources-plugin reset resource master-/var/jenkins/SHARED due post mortem job: lockableresources_build_queue #7
Feb 19, 2024 6:07:23 PM FINE org.jenkins.plugins.lockableresources.LockableResourcesManager freeResources
free it: [master-/var/jenkins/SHARED]
Feb 19, 2024 6:07:23 PM FINE org.jenkins.plugins.lockableresources.LockableResourcesManager freeResources
Remove ephemeral resource: master-/var/jenkins/SHARED
Feb 19, 2024 6:07:23 PM FINEST org.jenkins.plugins.lockableresources.LockableResourcesManager proceedNextContext
inversePrecedence: false
Feb 19, 2024 6:07:23 PM FINE org.jenkins.plugins.lockableresources.LockableResourcesManager getNextQueuedContext
current queue size: 3
Feb 19, 2024 6:07:23 PM WARNING org.jenkins.plugins.lockableresources.queue.QueuedContextStruct getBuild
Cannot get the build object from the context to proceed with lock. The build probably does not exists (deleted?)
java.io.IOException: Cannot resume build -- was not cleanly saved when Jenkins shut down.
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:831)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:719)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:577)
    ...

Also, the issue can be reproduced with a small pipeline job on Ubuntu 20.04 jenkins.service. Find config.xml attached. Start the main job with param FORK_COUNT=4, it will start 4 forks; then restart Jenkins while lock requests are queued. lockableresources_build_queue.zip