Closed stronk7 closed 1 year ago
Hi @stronk7, thanks for the bug report. I totally agree that the error seems to come from this plugin. I will create a branch with a change that could solve this issue. Are you able to install a "pre-release" of the plugin on Jenkins to see if it solves the issue?
Hi @repolevedavaj,
I have never done that before, but I can try. It may take a good while, though, until we hit the disconnection problems that lead to this bug. I mean, we don't have the problem daily, only whenever, for any reason, the disconnections happen (sort of like when the cable is unplugged, heh).
Thanks for looking into this!
@stronk7 no worries, just let me know if it solves your issue :) You can download the plugin from here: https://ci.jenkins.io/job/Plugins/job/jobcacher-plugin/job/PR-207/1/artifact/org/jenkins-ci/plugins/jobcacher/388.v2c5fc2012a_89/jobcacher-388.v2c5fc2012a_89.hpi
Thanks @repolevedavaj ,
now we are using 388.v2c5fc2012a_89 here. I will keep an eye on all the jobs that have disconnection problems and on what their (hopefully passing) new outcome is.
Will report back as soon as we have a case, ciao :-)
Uhm... the 3 jobs that have finished since I updated the plugin have ended with an ugly java.lang.NullPointerException at the end, without any stack trace. I'm guessing that can happen because the plugin was updated in the middle of their execution.
So I've launched a new (quick, just a few minutes) build to see whether that null pointer exception happens in all builds, in which case I'll have to revert to the upstream version. For the record, the new build is this, let's see how it ends:
Ciao :-)
Ok, so it seems that new builds (previous comment) are passing ok and only those that were already running when I upgraded the plugin have been caught in the middle.
So I'm going to keep the dev plugin installed, let's see…
Ciao :)
Thanks for the update!
Hi @repolevedavaj,
it has been a long wait... but I think I come with good news.
We have had at the very least a couple of builds where the Cannot contact workerXXX: java.lang.InterruptedException happened in the middle of some long tests, which ran ok despite the disconnection. Afterwards, the post-actions (including the JobCacher one) didn't cause any problem, the console is free of the stack traces reported above, and the job ended, as expected, with a nice SUCCESS.
Here are a couple of links to examples using the 388.v2c5fc2012a_89 version:
So, I'd say that your proposed changes really have fixed the reported problem, and now the JobCacher is immune to those potential disconnections happening in the middle. Great work!
Ciao :-)
Hi @stronk7 , thanks for the feedback! I merged the change (which will trigger the automatic release) :)
Jenkins and plugins versions report
Environment
```text
Jenkins: 2.401.1
OS: Linux - 5.4.241-150.347.amzn2.x86_64
Java: 11.0.19 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
---
all-changes:1.5
ansicolor:1.0.2
ant:487.vd79d090d4ea_e
antisamy-markup-formatter:159.v25b_c67cd35fb_
apache-httpcomponents-client-4-api:4.5.14-150.v7a_b_9d17134a_5
authentication-tokens:1.53.v1c90fd9191a_b_
aws-credentials:191.vcb_f183ce58b_9
aws-java-sdk:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-cloudformation:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-codebuild:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-ec2:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-ecr:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-ecs:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-efs:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-elasticbeanstalk:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-iam:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-kinesis:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-logs:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-minimal:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-sns:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-sqs:1.12.481-392.v8b_291cfcda_09
aws-java-sdk-ssm:1.12.481-392.v8b_291cfcda_09
bootstrap5-api:5.3.0-1
bouncycastle-api:2.28
branch-api:2.1109.vdf225489a_16d
build-failure-analyzer:2.4.1
build-timeout:1.31
build-user-vars-plugin:1.9
build-with-parameters:76.v9382db_f78962
caffeine-api:3.1.6-115.vb_8b_b_328e59d8
categorized-view:1.12
checks-api:2.0.0
cloudbees-folder:6.815.v0dd5a_cb_40e0e
cobertura:1.17
code-coverage-api:4.7.0
collapsing-console-sections:1.8.0
command-launcher:100.v2f6722292ee8
commons-lang3-api:3.12.0-36.vd97de6465d5b_
commons-text-api:1.10.0-36.vc008c8fcda_7b_
conditional-buildstep:1.4.2
copyartifact:705.v5295cffec284
credentials:1254.vb_96f366e7b_a_d
credentials-binding:604.vb_64480b_c56ca_
dashboard-view:2.487.vcf0ff9008a_c0
data-tables-api:1.13.4-2
display-upstream-changes:0.3.2
display-url-api:2.3.7
docker-build-publish:1.4.0
docker-commons:419.v8e3cd84ef49c
docker-workflow:563.vd5d2e5c4007f
durable-task:507.v050055d0cb_dd
echarts-api:5.4.0-5
embeddable-build-status:385.vc95f94e91fb_3
extended-read-permission:53.v6499940139e5
extensible-choice-parameter:1.8.0
external-monitor-job:203.v683c09d993b_9
extra-columns:1.25
font-awesome-api:6.4.0-1
forensics-api:2.3.0
git:5.1.0
git-client:4.4.0
git-parameter:0.9.18
git-server:99.va_0826a_b_cdfa_d
google-login:1.7
instance-identity:173.va_37c494ec4e5
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.15.2-350.v0c2f3f8fc595
jakarta-activation-api:2.0.1-3
jakarta-mail-api:2.0.1-3
javadoc:233.vdc1a_ec702cff
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.8-1
jdk-tool:66.vd8fa_64ee91b_d
jersey2-api:2.39.1-2
jira:3.10
jnr-posix-api:3.1.17-1
job-import-plugin:3.6
jobConfigHistory:1212.vd4470d08ff12
jobcacher:387.v41e893017398
jquery:1.12.4-1
jquery3-api:3.7.0-1
jsch:0.2.8-65.v052c39de79b_2
junit:1207.va_09d5100410f
junit-attachments:167.vf1d139e316b_3
leastload:3.0.0
locale:314.v22ce953dfe9e
lockable-resources:1166.v7ef984f72fe2
mailer:457.v3f72cb_e015e5
matrix-auth:3.1.8
matrix-project:789.v57a_725b_63c79
maven-plugin:3.22
metrics:4.2.18-439.v86a_20b_a_8318b_
mina-sshd-api-common:2.10.0-69.v28e3e36d18eb_
mina-sshd-api-core:2.10.0-69.v28e3e36d18eb_
monitoring:1.94.1
next-executions:179.vb_e011d7e3b_64
nodelabelparameter:1.12.0
notification:1.15
pam-auth:1.10
parameterized-trigger:2.45
pipeline-build-step:496.v2449a_9a_221f2
pipeline-github-lib:42.v0739460cda_c4
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-groovy-lib:656.va_a_ceeb_6ffb_f7
pipeline-input-step:468.va_5db_051498a_4
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2141.v5402e818a_779
pipeline-model-definition:2.2141.v5402e818a_779
pipeline-model-extensions:2.2141.v5402e818a_779
pipeline-rest-api:2.32
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2141.v5402e818a_779
pipeline-stage-view:2.32
plain-credentials:143.v1b_df8b_d3b_e48
plugin-util-api:3.3.0
prism-api:1.29.0-7
rebuild:320.v5a_0933a_e7d61
resource-disposer:0.22
role-strategy:633.v836e5b_3e80a_5
run-condition:1.5
s3:0.12.3445.vda_704535b_5a_d
scm-api:672.v64378a_b_20c60
script-security:1251.vfe552ed55f8d
sectioned-view:1.25
snakeyaml-api:1.33-95.va_b_a_e3e47b_fa_4
ssh-credentials:305.v8f4381501156
sshd:3.303.vefc7119b_ec23
strict-crumb-issuer:2.1.1
structs:324.va_f5d6774f3a_d
swarm:3.40
template-project:1.5.2
thinBackup:1.17
throttle-concurrents:2.13
timestamper:1.25
token-macro:359.vb_cde11682e0c
trilead-api:2.84.v72119de229b_7
validating-string-parameter:2.8
variant:59.vf075fe829ccb
view-job-filters:364.v48a_33389553d
workflow-aggregator:596.v8c21c963d92d
workflow-api:1215.v2b_ee3e1b_dd39
workflow-basic-steps:1017.vb_45b_302f0cea_
workflow-cps:3691.v28b_14c465a_b_b_
workflow-durable-task-step:1247.v7f9dfea_b_4fd0
workflow-job:1308.v58d48a_763b_31
workflow-multibranch:756.v891d88f2cd46
workflow-scm-step:415.v434365564324
workflow-step-api:639.v6eca_cd8c04a_a_
workflow-support:839.v35e2736cfd5c
ws-cleanup:0.45
ws-ws-replacement:1.0.1
```
What Operating System are you using (both controller, and any agents involved in the problem)?
Controller: kubernetes pod running upstream Jenkins image with Debian 11. Agents: iron servers running Ubuntu 22.04.
Reproduction steps
Expected Results
The job should end ok and the caches should be set normally, regardless of the brief disconnection in the middle of the (long) test run.
Actual Results
Each time there is a brief disconnection (we have been monitoring this for a good while and the correlation is 1:1), the job fails with the information above, even though the tests themselves have ended ok.
Anything else?
We are still trying some runs without the plugin, or keeping that json file saved in some other way, to be 100% sure that the problem happens only with the plugin (aka, we aren't 100% sure yet). But we decided to report it already, because the stack trace really seems to point to it.
We have tried both S3 and in-controller storage (to rule out other factors) and the same behaviour happens regardless of the storage configured.