jenkinsci / opentelemetry-plugin

Monitor and observe Jenkins with OpenTelemetry.
https://plugins.jenkins.io/opentelemetry/
Apache License 2.0

Metrics export stopped after changing the configuration of the Jenkins OpenTelemetry Plugin #424

Closed timja closed 2 years ago

timja commented 2 years ago

WORKAROUND: ❗ ❗ RESTART JENKINS CONTROLLER ❗ ❗

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.345
OS: Linux - 5.4.0-1077-azure
---
ace-editor:1.1
ansicolor:1.0.1
antisamy-markup-formatter:2.7
apache-httpcomponents-client-4-api:4.5.13-1.0
authentication-tokens:1.4
azure-ad:195.v8555a0bf0d22
azure-artifact-manager:97.v074e1332e88d
azure-credentials:216.ve0b_4a_485ffc2
azure-keyvault:131.v867845ef6ae9
azure-sdk:106.v552de1e64d56
azure-vm-agents:810.v0a97a847315a
basic-branch-build-strategies:1.3.2
blueocean:1.25.3
blueocean-autofavorite:1.2.5
blueocean-bitbucket-pipeline:1.25.3
blueocean-commons:1.25.3
blueocean-config:1.25.3
blueocean-core-js:1.25.3
blueocean-dashboard:1.25.3
blueocean-display-url:2.4.1
blueocean-events:1.25.3
blueocean-git-pipeline:1.25.3
blueocean-github-pipeline:1.25.3
blueocean-i18n:1.25.3
blueocean-jwt:1.25.3
blueocean-personalization:1.25.3
blueocean-pipeline-api-impl:1.25.3
blueocean-pipeline-editor:1.25.3
blueocean-pipeline-scm-api:1.25.3
blueocean-rest:1.25.3
blueocean-rest-impl:1.25.3
blueocean-web:1.25.3
bootstrap4-api:4.6.0-4
bootstrap5-api:5.1.3-6
bouncycastle-api:2.26
branch-api:2.1046.v0ca_37783ecc5
build-monitor-plugin:1.13+build.202204241251
caffeine-api:2.9.3-65.v6a_47d0f4d1fe
checks-api:1.7.3
cloud-stats:0.27
cloudbees-bitbucket-branch-source:765.v5a_2d6a_23c01d
cloudbees-disk-usage-simple:0.10
cloudbees-folder:6.714.v79e858ef76a_2
command-launcher:1.6
configuration-as-code:1429.v09b_044a_c93de
copyartifact:1.46.4
credentials:1126.ve05618c41e62
credentials-binding:523.vd859a_4b_122e6
dark-theme:171.v2540e8184da_0
display-url-api:2.3.6
docker-commons:1.19
docker-workflow:1.28
durable-task:496.va67c6f9eefa7
echarts-api:5.3.2-1
extended-read-permission:3.2
favorite:2.4.1
font-awesome-api:6.0.0-1
gatling:1.3.0
git:4.11.1
git-client:3.11.0
git-server:1.10
github:1.34.3
github-api:1.303-400.v35c2d8258028
github-branch-source:1598.v91207e9f9b_4a_
github-checks:1.0.18
github-scm-trait-notification-context:1.1
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-22.v77d5b_75e6953
htmlpublisher:1.30
http_request:1.15
jackson2-api:2.13.2.20220328-273.v11d70a_b_a_1a_52
jacoco:3.3.1
javadoc:217.v905b_86277a_2a_
javax-activation-api:1.2.0-3
javax-mail-api:1.6.2-6
jaxb:2.3.0.1
jdk-tool:1.5
jenkins-design-language:1.25.3
jjwt-api:0.11.2-71.v2722b_b_06a_2a_f
job-dsl:1.79
jquery3-api:3.6.0-3
jsch:0.1.55.2
junit:1.60
kubernetes:3580.v78271e5631dc
kubernetes-client-api:5.12.1-187.v577c3e368fb_6
kubernetes-credentials:0.9.0
lockable-resources:2.14
mailer:414.vcc4c33714601
matrix-auth:3.1.1
matrix-project:758.v7a_ea_491852f3
maven-plugin:3.18
metrics:4.1.6.2
momentjs:1.1.1
monitoring:1.91.0
okhttp-api:4.9.3-105.vb96869f8ac3a
opentelemetry:2.5.0
pipeline-build-step:2.18
pipeline-github-lib:36.v4c01db_ca_ed16
pipeline-graph-analysis:195.v5812d95a_a_2f9
pipeline-graph-view:51.v5a693b766483
pipeline-input-step:448.v37cea_9a_10a_70
pipeline-milestone-step:101.vd572fef9d926
pipeline-model-api:2.2077.vc78ec45162f1
pipeline-model-definition:2.2077.vc78ec45162f1
pipeline-model-extensions:2.2077.vc78ec45162f1
pipeline-rest-api:2.24
pipeline-stage-step:293.v200037eefcd5
pipeline-stage-tags-metadata:2.2077.vc78ec45162f1
pipeline-stage-view:2.24
pipeline-utility-steps:2.12.0
plain-credentials:1.8
plugin-util-api:2.16.0
popper-api:1.16.1-3
popper2-api:2.11.5-1
prometheus:2.0.11
pubsub-light:1.16
run-condition:1.5
saml:2.296.v0016349946db_
sauce-ondemand:1.199
scm-api:608.vfa_f971c5a_a_e9
script-security:1158.v7c1b_73a_69a_08
slack:608.v19e3b_44b_b_9ff
snakeyaml-api:1.30.1
sonar:2.14
sse-gateway:1.25
ssh-credentials:277.v95c2fec1c047
sshd:3.228.v4c9f9e652c86
structs:318.va_f3ccb_729b_71
support-core:1162.vb_b_e5198c6b_22
theme-manager:1.2
timestamper:1.17
token-macro:293.v283932a_0a_b_49
trilead-api:1.57.v6e90e07157e1
variant:1.4
windows-azure-storage:373.v582b31a65906
workflow-aggregator:2.7
workflow-api:1144.v61c3180fa_03f
workflow-basic-steps:948.v2c72a_091b_b_68
workflow-cps:2689.v434009a_31b_f1
workflow-cps-global-lib:570.v21311f4951f8
workflow-durable-task-step:1130.v8fd69d0b_8857
workflow-job:1180.v04c4e75dce43
workflow-multibranch:712.vc169a_1387405
workflow-scm-step:399.v9b_8f4da_65061
workflow-step-api:625.vd896b_f445a_f8
workflow-support:819.v37d707a_71d9b_
```

What Operating System are you using (both controller, and any agents involved in the problem)?

jenkinsci/docker Ubuntu 20 agent

Reproduction steps

Expected Results

Metrics to be exported

Actual Results

Metrics have stopped being exported. I confirmed this by raising the OTel Collector log level (https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#logs) and not seeing the metrics arrive.

Anything else?

Traces are still arriving just fine

I haven't restarted it yet and would be interested in debugging it if you have any suggestions.

cyrille-leclerc commented 2 years ago

Sorry for the inconvenience.

Can you please collect the "Noteworthy active configuration properties" shown in the Jenkins OpenTelemetry advanced config under "Manage Jenkins" (navigate to the "advanced" section of the Jenkins OTel Plugin)?

timja commented 2 years ago

otel.exporter.otlp.endpoint=http://opentelemetry-collector.monitoring:4317
otel.instrumentation.jenkins.web.enabled=false
otel.metrics.exporter=otlp
otel.traces.exporter=otlp
container.id=933c754a7ad79782d60745f0f55cb852317c3e08d4231fb88395eb37fa68476a
host.arch=amd64
host.name=jenkins-0
jenkins.opentelemetry.plugin.version=2.5.0
jenkins.url=https://build.platform.hmcts.net/
jenkins.version=2.345
os.description=Linux 5.4.0-1077-azure
os.type=linux
process.runtime.description=Eclipse Adoptium OpenJDK 64-Bit Server VM 11.0.14.1+1
process.runtime.name=OpenJDK Runtime Environment
process.runtime.version=11.0.14.1+1
service.name=jenkins
service.namespace=jenkins
service.version=2.345
telemetry.sdk.language=java
telemetry.sdk.name=opentelemetry
telemetry.sdk.version=1.13.0

cyrille-leclerc commented 2 years ago

Thanks. The exporter setting otel.metrics.exporter=otlp is defined as expected.
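For anyone repeating this check on their own property dump, here is a minimal Java sketch using only the standard library. `OtelConfigCheck` is a hypothetical helper, not part of the plugin, and treating a missing value or `none` as "metrics export disabled" is an assumption of this sketch rather than something stated in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper (not part of the plugin): inspects a pasted property dump. */
class OtelConfigCheck {

    /**
     * Returns true when otel.metrics.exporter is set to something other than "none".
     * Assumption of this sketch: a missing or empty value counts as "not enabled".
     */
    static boolean metricsExportEnabled(String propertiesText) {
        Properties props = new Properties();
        try {
            // Properties.load understands the key=value lines shown in the comment above.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String exporter = props.getProperty("otel.metrics.exporter", "").trim();
        return !exporter.isEmpty() && !exporter.equals("none");
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
            "otel.exporter.otlp.endpoint=http://opentelemetry-collector.monitoring:4317",
            "otel.metrics.exporter=otlp",
            "otel.traces.exporter=otlp");
        System.out.println("metrics export enabled: " + metricsExportEnabled(dump));
    }
}
```

A positive result only says the setting is present; as the rest of this thread shows, metrics can still stop flowing while the setting looks correct.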

Do you see messages in the Jenkins logs that look like:

2022-05-03 14:24:08.062+0000 [id=35]    INFO    i.j.p.o.OpenTelemetrySdkProvider#initialize: OpenTelemetry SDK initialized: SDK [config: otel.traces.exporter=otlp, otel.metrics.exporter=otlp, otel.logs.exporter=otlp, otel.exporter.otlp.endpoint=http://localhost:4317, resource: service.name=jenkins, service.namespace=jenkins, service.version=2.332.2]

I'm wondering whether the OTel SDK could have been reconfigured after these "4 days".
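If the Jenkins log is available as a file, a scan along these lines could list every time the SDK was (re)initialized. This is a minimal sketch: `SdkInitLogScanner` is a hypothetical helper, not part of the plugin, and the regular expression assumes log lines shaped like the example above.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the plugin): finds SDK (re)initialization log lines. */
class SdkInitLogScanner {

    // Captures the leading timestamp of lines logged by OpenTelemetrySdkProvider#initialize.
    // Assumes the timestamp layout of the example line, e.g. "2022-05-03 14:24:08.062+0000".
    private static final Pattern INIT_LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}[+-]\\d{4})\\s.*"
        + "OpenTelemetrySdkProvider#initialize: OpenTelemetry SDK initialized");

    /** Returns the timestamps of all SDK initialization messages, in log order. */
    static List<String> initializationTimestamps(List<String> logLines) {
        List<String> timestamps = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = INIT_LINE.matcher(line);
            if (m.find()) {
                timestamps.add(m.group(1));
            }
        }
        return timestamps;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SdkInitLogScanner <path-to-jenkins-log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String ts : initializationTimestamps(lines)) {
            System.out.println("SDK initialized at " + ts);
        }
    }
}
```

More than one timestamp since the last controller start would suggest that the SDK was reconfigured at runtime, which can then be compared with the time metrics stopped arriving.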

timja commented 2 years ago

Ooh, interesting.

That's the exact time it stopped working:

2022-05-02 12:47:12.171+0000 [id=864846]    INFO    i.j.p.o.OpenTelemetrySdkProvider#initialize: OpenTelemetry SDK initialized: SDK [config: otel.traces.exporter=otlp, otel.metrics.exporter=otlp, otel.exporter.otlp.endpoint=http://opentelemetry-collector.monitoring:4317, otel.instrumentation.jenkins.web.enabled=false, resource: service.name=jenkins, service.namespace=jenkins, service.version=2.345]

No log message appeared after I saved the extra ignore attribute you suggested to me.

cyrille-leclerc commented 2 years ago

Interesting, so the metrics exporter doesn't seem to support the reconfiguration sequence we are using. I'll investigate.

timja commented 2 years ago

A colleague confirmed he made a manual change to system config around that time by saving the configuration page.

(Unrelated to OpenTelemetry; he was just changing environment variables in the global node properties.)

cyrille-leclerc commented 2 years ago

I understand the bug now. It needs a code change to handle reconfiguration of metrics.
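The thread does not spell out the root cause, so the following is only an illustration of one common way a reconfiguration can silence metrics: instruments registered against the old SDK instance are lost when that instance is closed and replaced, unless they are registered again. All classes here (`ToyMeterProvider`, `ReconfigurableMetrics`, `ReconfigurationDemo`) are hypothetical stand-ins written for this sketch; they are neither the OpenTelemetry API nor the plugin's actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a metrics SDK instance; NOT the OpenTelemetry API. */
class ToyMeterProvider {
    private final List<LongSupplier> gauges = new ArrayList<>();
    private boolean closed;

    void registerGauge(LongSupplier callback) {
        gauges.add(callback);
    }

    /** Number of data points that would be exported right now (zero once closed). */
    int collect() {
        return closed ? 0 : gauges.size();
    }

    void close() {
        closed = true;
    }
}

/** Hypothetical holder that swaps the provider on reconfiguration. */
class ReconfigurableMetrics {
    private final List<Consumer<ToyMeterProvider>> registrations = new ArrayList<>();
    private ToyMeterProvider current = new ToyMeterProvider();

    /** Applies the registration now and remembers it for future providers. */
    void register(Consumer<ToyMeterProvider> registration) {
        registrations.add(registration);
        registration.accept(current);
    }

    /** Closes the old provider and creates a new one, optionally replaying registrations. */
    void reconfigure(boolean replayRegistrations) {
        current.close();
        current = new ToyMeterProvider();
        if (replayRegistrations) {
            for (Consumer<ToyMeterProvider> registration : registrations) {
                registration.accept(current);
            }
        }
    }

    int collect() {
        return current.collect();
    }
}

class ReconfigurationDemo {
    public static void main(String[] args) {
        ReconfigurableMetrics metrics = new ReconfigurableMetrics();
        metrics.register(provider -> provider.registerGauge(() -> 42L));
        System.out.println("before reconfiguration: " + metrics.collect());

        metrics.reconfigure(false); // instruments stay bound to the closed provider
        System.out.println("reconfigured without replay: " + metrics.collect());

        metrics.reconfigure(true); // instruments re-registered on the new provider
        System.out.println("reconfigured with replay: " + metrics.collect());
    }
}
```

In this toy model, restarting the process corresponds to registering everything afresh against a brand-new provider, which is consistent with the restart workaround noted at the top of the issue.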

timja commented 2 years ago

For completeness: I restarted this morning and metrics are working again.

cyrille-leclerc commented 2 years ago

Can you please test https://github.com/jenkinsci/opentelemetry-plugin/releases/tag/opentelemetry-2.6.0-rc1? The plugin HPI file is attached to the release notes.