jenkinsci / opentelemetry-plugin

Monitor and observe Jenkins with OpenTelemetry.
https://plugins.jenkins.io/opentelemetry/
Apache License 2.0

NFS "Device or resource busy" errors when `otel.logs.mirror_to_disk` is configured #677

Closed: chriscarpenter12 closed this issue 1 year ago

chriscarpenter12 commented 1 year ago

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.401.1
OS: Linux - 4.18.0-305.93.1.el8_4.x86_64
Java: 11.0.19 - Red Hat, Inc. (OpenJDK 64-Bit Server VM)
---
Office-365-Connector:4.20.0
ace-editor:1.1
ant:487.vd79d090d4ea_e
antisamy-markup-formatter:2.7
apache-httpcomponents-client-4-api:4.5.14-150.v7a_b_9d17134a_5
authentication-tokens:1.53.v1c90fd9191a_b_
basic-branch-build-strategies:71.vc1421f89888e
bitbucket:223.vd12f2bca5430
blueocean:1.27.4
blueocean-autofavorite:1.2.5
blueocean-bitbucket-pipeline:1.27.4
blueocean-commons:1.27.4
blueocean-config:1.27.4
blueocean-core-js:1.27.4
blueocean-dashboard:1.27.4
blueocean-display-url:2.4.2
blueocean-events:1.27.4
blueocean-git-pipeline:1.27.4
blueocean-github-pipeline:1.27.4
blueocean-i18n:1.27.4
blueocean-jwt:1.27.4
blueocean-personalization:1.27.4
blueocean-pipeline-api-impl:1.27.4
blueocean-pipeline-editor:1.27.4
blueocean-pipeline-scm-api:1.27.4
blueocean-rest:1.27.4
blueocean-rest-impl:1.27.4
blueocean-web:1.27.4
bootstrap5-api:5.3.0-1
bouncycastle-api:2.28
branch-api:2.1105.v472604208c55
caffeine-api:3.1.6-115.vb_8b_b_328e59d8
checks-api:2.0.0
cloudbees-bitbucket-branch-source:820.v30b_e8c1e36f3
cloudbees-disk-usage-simple:182.v62ca_0c992a_f3
cloudbees-folder:6.815.v0dd5a_cb_40e0e
command-launcher:100.v2f6722292ee8
commons-httpclient3-api:3.1-3
commons-lang3-api:3.12.0-36.vd97de6465d5b_
commons-text-api:1.10.0-36.vc008c8fcda_7b_
conditional-buildstep:1.4.2
config-file-provider:938.ve2b_8a_591c596
configuration-as-code:1647.ve39ca_b_829b_42
configuration-as-code-groovy:1.1
copyartifact:705.v5295cffec284
credentials:1254.vb_96f366e7b_a_d
credentials-binding:604.vb_64480b_c56ca_
cucumber-reports:5.7.5
data-tables-api:1.13.4-1
dependency-check-jenkins-plugin:5.4.0
display-url-api:2.3.7
docker-commons:419.v8e3cd84ef49c
durable-task:507.v050055d0cb_dd
echarts-api:5.4.0-5
email-ext:2.99
favorite:2.4.2
font-awesome-api:6.4.0-1
generic-webhook-trigger:1.86.4
git:5.1.0
git-changelog:3.32
git-client:4.3.0
git-server:99.va_0826a_b_cdfa_d
github:1.37.1
github-api:1.314-431.v78d72a_3fe4c3
github-branch-source:1728.v859147241f49
google-oauth-plugin:1.0.8
groovy:453.vcdb_a_c5c99890
handy-uri-templates-2-api:2.1.8-22.v77d5b_75e6953
htmlpublisher:1.31
http_request:1.18
ignore-committer-strategy:1.0.4
instance-identity:142.v04572ca_5b_265
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.15.2-350.v0c2f3f8fc595
jakarta-activation-api:2.0.1-3
jakarta-mail-api:2.0.1-3
javadoc:233.vdc1a_ec702cff
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.8-1
jdk-tool:66.vd8fa_64ee91b_d
jenkins-design-language:1.27.4
jersey2-api:2.39.1-2
jira:3.10
jjwt-api:0.11.5-77.v646c772fddb_0
job-dsl:1.84
jquery3-api:3.7.0-1
jsch:0.2.8-65.v052c39de79b_2
junit:1207.va_09d5100410f
kubernetes:3937.vd7b_82db_e347b_
kubernetes-client-api:6.4.1-215.v2ed17097a_8e9
kubernetes-credentials:0.10.0
lockable-resources:1156.v5e9f897ece02
mailer:457.v3f72cb_e015e5
mapdb-api:1.0.9-28.vf251ce40855d
mask-passwords:150.vf80d33113e80
matrix-auth:3.1.8
matrix-project:789.v57a_725b_63c79
maven-plugin:3.22
mercurial:1260.vdfb_723cdcc81
metrics:4.2.18-438.v0ede325a_4c68
mina-sshd-api-common:2.10.0-69.v28e3e36d18eb_
mina-sshd-api-core:2.10.0-69.v28e3e36d18eb_
momentjs:1.1.1
multibranch-action-triggers:1.8.6
multibranch-build-strategy-extension:1.0.10
oauth-credentials:0.645.ve666a_c332668
okhttp-api:4.11.0-145.vcb_8de402ef81
openshift-client:1.1.0.413.v3023d27e8434
openshift-login:1.1.0.227.v27e08dfb_1a_20
openshift-sync:1.1.0.790.v2051fca_5ed8d
opentelemetry:2.14.0
pam-auth:1.10
parameterized-trigger:2.46
pipeline-build-step:491.v1fec530da_858
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-groovy-lib:656.va_a_ceeb_6ffb_f7
pipeline-input-step:468.va_5db_051498a_4
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2133.ve46a_6113dfc3
pipeline-model-definition:2.2133.ve46a_6113dfc3
pipeline-model-extensions:2.2133.ve46a_6113dfc3
pipeline-rest-api:2.32
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2133.ve46a_6113dfc3
pipeline-stage-view:2.32
pipeline-utility-steps:2.16.0
plain-credentials:143.v1b_df8b_d3b_e48
plugin-util-api:3.3.0
prometheus:2.2.3
pubsub-light:1.17
remote-file:1.23
run-condition:1.5
scm-api:676.v886669a_199a_a_
script-security:1251.vfe552ed55f8d
snakeyaml-api:1.33-95.va_b_a_e3e47b_fa_4
sonar:2.15
sse-gateway:1.26
ssh-credentials:305.v8f4381501156
ssh-slaves:2.877.v365f5eb_a_b_eec
ssh-steps:2.0.65.vd26b_5b_9b_de4d
sshd:3.303.vefc7119b_ec23
structs:324.va_f5d6774f3a_d
subversion:2.17.2
token-macro:359.vb_cde11682e0c
trilead-api:2.84.v72119de229b_7
uno-choice:2.6.5
variant:59.vf075fe829ccb
workflow-aggregator:581.v0c46fa_697ffd
workflow-api:1213.v646def1087f9
workflow-basic-steps:1017.vb_45b_302f0cea_
workflow-cps:3673.v5b_dd74276262
workflow-cps-global-lib:609.vd95673f149b_b
workflow-durable-task-step:1247.v7f9dfea_b_4fd0
workflow-job:1308.v58d48a_763b_31
workflow-multibranch:746.v05814d19c001
workflow-remote-loader:1.6
workflow-scm-step:408.v7d5b_135a_b_d49
workflow-step-api:639.v6eca_cd8c04a_a_
workflow-support:839.v35e2736cfd5c
```

After setting `otel.logs.mirror_to_disk=true`, we started seeing "Device or resource busy" errors in Jenkins. After turning it off and restarting Jenkins, the errors seem to have cleared up.

Suppressed: java.nio.file.FileSystemException: /var/lib/jenkins/jobs/partner/jobs/builds/branches/PR-803/builds/.6/.nfs000000007042c9260000454a: Device or resource busy

I found a related CloudBees article that mentions propagating the closing of loggers, and I think this may be related.


https://issues.jenkins.io/browse/JENKINS-28409

What operating system are you using (both the controller and any agents involved in the problem)?

RHEL 8 container

Reproduction steps

Configure `otel.logs.mirror_to_disk`, set up the Jenkins job directory on an NFS mount, then delete builds.

Expected Results

No error when builds are deleted

Actual Results

n/a

Anything else?

No response

chriscarpenter12 commented 1 year ago

Purely speculative, but the cause may be in `OtelLogOutputStream.java`, which should close its parent stream, like the issue reported in the Jenkins bug (JENKINS-28409).
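Some context on why an unclosed stream produces this particular error: when a file on an NFS mount is deleted while a process still holds it open, the NFS client renames it to a hidden `.nfsXXXX` file instead of removing it, and deleting that file (and therefore the build directory) fails with "Device or resource busy" until the handle is closed. The sketch below illustrates the general idea suggested here, a wrapping log stream that propagates `close()` to the stream it wraps. It is not the plugin's code: `MirroringOutputStream` and `TrackingOutputStream` are hypothetical names, and the in-memory `StringBuilder` only stands in for an OpenTelemetry log exporter.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical stand-in for the on-disk build log (in Jenkins this would be a
 * file stream on the NFS mount). It records whether close() was called.
 */
class TrackingOutputStream extends ByteArrayOutputStream {
    boolean closed = false;

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

/**
 * Hypothetical wrapper that sends every byte to a secondary sink (standing in
 * for an OpenTelemetry log exporter) while mirroring it to the wrapped "disk"
 * stream. Extending FilterOutputStream means close() flushes and then closes
 * the wrapped stream, so the underlying file handle is released.
 */
class MirroringOutputStream extends FilterOutputStream {
    private final StringBuilder exported = new StringBuilder();

    MirroringOutputStream(OutputStream disk) {
        super(disk);
    }

    @Override
    public void write(int b) throws IOException {
        // Assumes single-byte (ASCII) text for simplicity.
        exported.append((char) (b & 0xFF)); // stand-in for emitting a log record
        out.write(b);                       // mirror the same byte to disk
    }

    String exported() {
        return exported.toString();
    }

    @Override
    public void close() throws IOException {
        // Release exporter-side resources here (hypothetical), then call
        // super.close() so the wrapped disk stream is closed as well.
        super.close();
    }
}
```

Because `FilterOutputStream.close()` flushes and then closes the wrapped stream, an override of `close()` that forgot to call `super.close()` would leave the disk stream open, which is the kind of leak that surfaces on NFS as the `.nfsXXXX` "Device or resource busy" error above.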

cyrille-leclerc commented 1 year ago

@AndriiChuzhynov does this ring a bell for you?

cyrille-leclerc commented 1 year ago

I think I understand this one. I'll push a PR soon.

cyrille-leclerc commented 1 year ago

fix PR submitted

cyrille-leclerc commented 1 year ago

Fixed by:

cyrille-leclerc commented 1 year ago

Can you please test the fix:

chriscarpenter12 commented 1 year ago

> Can you please test fix:

Is there an HPI available to download for this? I didn't see one here.

cyrille-leclerc commented 1 year ago

please find:

chriscarpenter12 commented 1 year ago

After a few tests of removing branches from Jenkins, I don't see any "Device or resource busy" errors with mirroring enabled. I do still see this issue, but I think it's unrelated. Maybe mirroring to disk is required to visualize logs in both Elastic and Jenkins. It's unfortunate that you lose the link to Elastic when mirroring to disk, though.