jenkinsci / opentelemetry-plugin

Monitor and observe Jenkins with OpenTelemetry.
https://plugins.jenkins.io/opentelemetry/
Apache License 2.0

OpenTelemetry Jenkins metrics `ci.pipeline.*` are missing #930

Closed mrh666 closed 1 week ago

mrh666 commented 2 weeks ago

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.462.1
OS: Linux - 6.1.85+
Java: 17.0.12 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
---
Parameterized-Remote-Trigger:3.2.0
active-directory:2.36
analysis-model-api:12.4.0
ansicolor:1.0.4
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
apache-httpcomponents-client-5-api:5.3.1-110.v77252fb_d4da_5
asm-api:9.7-33.v4d23ef79fcc8
authentication-tokens:1.119.v50285141b_7e1
badge:1.13
basic-branch-build-strategies:81.v05e333931c7d
blackduck-detect:9.0.0
blueocean:1.27.14
blueocean-bitbucket-pipeline:1.27.14
blueocean-commons:1.27.14
blueocean-config:1.27.14
blueocean-core-js:1.27.14
blueocean-dashboard:1.27.14
blueocean-display-url:2.4.3
blueocean-events:1.27.14
blueocean-git-pipeline:1.27.14
blueocean-github-pipeline:1.27.14
blueocean-i18n:1.27.14
blueocean-jwt:1.27.14
blueocean-personalization:1.27.14
blueocean-pipeline-api-impl:1.27.14
blueocean-pipeline-editor:1.27.14
blueocean-pipeline-scm-api:1.27.14
blueocean-rest:1.27.14
blueocean-rest-impl:1.27.14
blueocean-web:1.27.14
bootstrap5-api:5.3.3-1
bouncycastle-api:2.30.1.78.1-248.ve27176eb_46cb_
branch-api:2.1178.v969d9eb_c728e
built-on-column:1.4
caffeine-api:3.1.8-133.v17b_1ff2e0599
checkmarx:2024.2.3
checks-api:2.2.0
cloudbees-bitbucket-branch-source:888.v8e6d479a_1730
cloudbees-folder:6.942.vb_43318a_156b_2
cobertura:1.17
code-coverage-api:4.99.0
command-launcher:107.v773860566e2e
commons-compress-api:1.26.1-2
commons-lang3-api:3.16.0-82.ve2b_07d659d95
commons-text-api:1.12.0-129.v99a_50df237f7
conditional-buildstep:1.4.3
config-file-provider:973.vb_a_80ecb_9a_4d0
configuration-as-code:1836.vccda_4a_122a_a_e
configuration-as-code-groovy:1.1
copyartifact:749.vfb_dca_a_9b_6549
coverage:1.16.1
credentials:1371.vfee6b_095f0a_3
credentials-binding:681.vf91669a_32e45
customizable-header:124.vff1b_7602cc5a_
dashboard-view:2.517.v776a_b_811a_b_4e
data-tables-api:2.1.4-1
dependency-check-jenkins-plugin:5.5.1
disk-usage:1.2
display-url-api:2.204.vf6fddd8a_8b_e9
docker-commons:443.v921729d5611d
docker-workflow:580.vc0c340686b_54
durable-task:568.v8fb_5c57e8417
echarts-api:5.5.1-1
eddsa-api:0.3.0-4.v84c6f0f4969e
email-ext:1814.v404722f34263
embeddable-build-status:487.va_0ef04c898a_2
envinject:2.908.v66a_774b_31d93
envinject-api:1.199.v3ce31253ed13
extended-read-permission:53.v6499940139e5
favorite:2.221.v19ca_666b_62f5
flatpickr-api:4.6.13-5.v534d8025a_a_59
font-awesome-api:6.6.0-1
forensics-api:2.5.0
git:5.3.0
git-client:5.0.0
git-server:126.v0d945d8d2b_39
github:1.40.0
github-api:1.321-468.v6a_9f5f2d5a_7e
github-branch-source:1797.v86fdb_4d57d43
github-oauth:597.ve0c3480fcb_d0
global-build-stats:307.v03dce5a_f8943
global-slack-notifier:1.5
groovy:457.v99900cb_85593
groovy-postbuild:228.vcdb_cf7265066
gson-api:2.11.0-41.v019fcf6125dc
handy-uri-templates-2-api:2.1.8-30.v7e777411b_148
htmlpublisher:1.36
http_request:1.19
influxdb:3.6.1
instance-identity:185.v303dc7c645f9
ionicons-api:74.v93d5eb_813d5f
jackson2-api:2.17.0-379.v02de8ec9f64c
jacoco:3.3.6
jakarta-activation-api:2.1.3-1
jakarta-mail-api:2.1.3-1
javadoc:243.vb_b_503b_b_45537
javax-activation-api:1.2.0-7
javax-mail-api:1.6.2-10
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jenkins-design-language:1.27.14
jenkins-multijob-plugin:627.v7c23cef20a_6a
jjwt-api:0.11.5-112.ve82dfb_224b_a_d
job-dsl:1.87
joda-time-api:2.12.7-29.v5a_b_e3a_82269a_
jquery3-api:3.7.1-2
jsch:0.2.16-86.v42e010d9484b_
json-api:20240303-41.v94e11e6de726
json-path-api:2.9.0-58.v62e3e85b_a_655
junit:1291.v60776881903c
kubernetes:4280.vd919fa_528c7e
kubernetes-cli:1.12.1
kubernetes-client-api:6.10.0-240.v57880ce8b_0b_2
kubernetes-credentials:189.v90a_488b_d1d65
kubernetes-credentials-provider:1.262.v2670ef7ea_0c5
kubernetes-pipeline-devops-steps:1.6
lockable-resources:1255.vf48745da_35d0
mailer:472.vf7c289a_4b_420
mask-passwords:173.v6a_077a_291eb_5
matrix-auth:3.2.2
matrix-project:832.va_66e270d2946
maven-plugin:3.23
metrics:4.2.21-451.vd51df8df52ec
mina-sshd-api-common:2.13.2-125.v200281b_61d59
mina-sshd-api-core:2.13.2-125.v200281b_61d59
multi-branch-project-plugin:0.7
okhttp-api:4.11.0-172.vda_da_1feeb_c6e
opentelemetry:3.1320.v2eededb_d909e
opentelemetry-api:1.40.0-24.v83ee9a_c6e8d9
parameterized-trigger:806.vf6fff3e28c3e
performance:962.v95a_4913d332e
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-github:2.8-159.09e4403bc62f
pipeline-github-lib:61.v629f2cc41d83
pipeline-githubnotify-step:49.vf37bf92d2bc8
pipeline-graph-analysis:216.vfd8b_ece330ca_
pipeline-groovy-lib:730.ve57b_34648c63
pipeline-input-step:495.ve9c153f6067b_
pipeline-maven:1421.v610fa_b_e2d60e
pipeline-maven-api:1421.v610fa_b_e2d60e
pipeline-milestone-step:119.vdfdc43fc3b_9a_
pipeline-model-api:2.2205.vc9522a_9d5711
pipeline-model-definition:2.2205.vc9522a_9d5711
pipeline-model-extensions:2.2205.vc9522a_9d5711
pipeline-multibranch-defaults:2.1
pipeline-stage-step:312.v8cd10304c27a_
pipeline-stage-tags-metadata:2.2205.vc9522a_9d5711
pipeline-utility-steps:2.17.0
plain-credentials:183.va_de8f1dd5a_2b_
plugin-util-api:4.1.0
prism-api:1.29.0-16
pubsub-light:1.18
rebuild:332.va_1ee476d8f6d
resource-disposer:0.23
robot:3.5.2
run-condition:1.7
saferestart:0.7
saml:4.464.vea_cb_75d7f5e0
scm-api:696.v778d637b_a_762
script-security:1354.va_70a_fe478c7f
sidebar-link:2.4.1
slack:734.v7f9ec8b_66975
snakeyaml-api:2.2-121.v5a_68b_9300b_d4
sonar:2.17.2
sse-gateway:1.27
ssh-agent:376.v8933585c69d3
ssh-credentials:343.v884f71d78167
ssh-slaves:2.973.v0fa_8c0dea_f9f
sshd:3.330.vc866a_8389b_58
structs:338.v848422169819
timestamper:1.27
token-macro:400.v35420b_922dcb_
trilead-api:2.147.vb_73cc728a_32e
variant:60.v7290fc0eb_b_cd
warnings-ng:11.4.1
webhook-step:342.v620877effe14
workflow-aggregator:600.vb_57cdd26fdd7
workflow-api:1336.vee415d95c521
workflow-basic-steps:1058.vcb_fc1e3a_21a_9
workflow-cps:3943.v3519a_3260660
workflow-cps-global-lib:612.v55f2f80781ef
workflow-cps-global-lib-http:2.48.0
workflow-durable-task-step:1364.v2fd76fb_6fd41
workflow-job:1436.vfa_244484591f
workflow-multibranch:795.ve0cb_1f45ca_9a_
workflow-scm-step:427.v4ca_6512e7df1
workflow-step-api:678.v3ee58b_469476
workflow-support:920.v59f71ce16f04
ws-cleanup:0.46
```

What Operating System are you using (both controller, and any agents involved in the problem)?

OS: Linux - 6.1.85+

Reproduction steps

Run the OpenTelemetry plugin v3.1320.v2eededb_d909e with Jenkins 2.462.1. The plugin successfully reports jenkins.* metrics and all traces, but no ci.pipeline.* metrics are delivered (as per https://github.com/jenkinsci/opentelemetry-plugin/blob/main/docs/monitoring-metrics.md#jenkins-health-metrics). Delivery is set up to write directly to Dynatrace.

Expected Results

To have ci.pipeline.* metrics delivered also

Actual Results

Only jenkins.* metrics are delivered

Anything else?

No response

Are you interested in contributing a fix?

No response

cyrille-leclerc commented 2 weeks ago

Please bump to the latest "OpenTelemetry Plugin" (v3.1368.vb_f1dcb_e6595c) and "OpenTelemetry API Plugin" (v1.40.0-32.v65c59076e638).

You particularly need the fix

mrh666 commented 2 weeks ago

Hi @cyrille-leclerc, just installed latest OpenTelemetry API Plugin Version 1.40.0-32.v65c59076e638 and OpenTelemetry Plugin Version 3.1368.vb_f1dcb_e6595c

But unfortunately no ci.pipeline.* metrics were delivered.

It does, though, do an amazing job of delivering traces, including all attributes:

Sep 03, 2024 9:45:00 PM FINE io.jenkins.plugins.opentelemetry.job.action.AbstractMonitoringAction

Purge span='BUILD telemetry test pipe', spanId=dcda442350d60197, traceId=779808b5b564d59612bc232ee31bb37c: SpanAndScopes{span=SdkSpan{traceId=779808b5b564d59612bc232ee31bb37c, spanId=dcda442350d60197, parentSpanContext=ImmutableSpanContext{traceId=779808b5b564d59612bc232ee31bb37c, spanId=76a0da1487a820d0, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=true, valid=true}, name=BUILD telemetry test pipe, kind=SERVER, attributes=AttributesMap{data={ci.pipeline.name=telemetry test pipe, ci.pipeline.run.cause=[UserIdCause:MY_USER], ci.pipeline.run.committers=[], ci.pipeline.run.number=36, ci.pipeline.run.durationMillis=5763, type=job, ci.pipeline.id=telemetry test pipe, ci.pipeline.run.completed=true, ci.pipeline.run.url=https://JENKINS_URL/job/telemetry%20test%20pipe/36/, ci.pipeline.type=workflow, ci.pipeline.run.result=SUCCESS}, capacity=128, totalAddedValues=11}, status=ImmutableStatusData{statusCode=OK, description=SUCCESS}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1725389094623138763, endEpochNanos=1725389100406601700}, scopes=0, scopeStartThreadName='Executor #-1 for Built-In Node : executing telemetry test pipe #36'}

But I really miss those metrics. Should I change any setting after the update? I have configured a log recorder in Jenkins with logger io.jenkins.plugins.opentelemetry at level ALL, and I do not see any errors or exceptions.

cyrille-leclerc commented 2 weeks ago

Sorry for the inconvenience. Can you confirm you no longer see any of the ci.pipeline.run.* metrics?

I see a regression with ci.pipeline.run.active missing but the other ones are produced on my machine.

Extract from the OTel Collector Prometheus Exporter showing the ci.pipeline.run.* metrics that are produced:

# HELP ci_pipeline_run_aborted_total Job aborted
# TYPE ci_pipeline_run_aborted_total counter
ci_pipeline_run_aborted_total{instance="be802d0a442bb237465405d618d15f47",job="jenkins/jenkins",service_version="2.462.1"} 1
# HELP ci_pipeline_run_completed_total Job completed
# TYPE ci_pipeline_run_completed_total counter
ci_pipeline_run_completed_total{instance="be802d0a442bb237465405d618d15f47",job="jenkins/jenkins",service_version="2.462.1"} 7
# HELP ci_pipeline_run_failed_total Job failed
# TYPE ci_pipeline_run_failed_total counter
ci_pipeline_run_failed_total{instance="be802d0a442bb237465405d618d15f47",job="jenkins/jenkins",service_version="2.462.1"} 2
# HELP ci_pipeline_run_launched_total Job launched
# TYPE ci_pipeline_run_launched_total counter
ci_pipeline_run_launched_total{instance="be802d0a442bb237465405d618d15f47",job="jenkins/jenkins",service_version="2.462.1"} 8
# HELP ci_pipeline_run_started_total Job started
# TYPE ci_pipeline_run_started_total counter
ci_pipeline_run_started_total{instance="be802d0a442bb237465405d618d15f47",job="jenkins/jenkins",service_version="2.462.1"} 8
# HELP ci_pipeline_run_success_total Job succeed
# TYPE ci_pipeline_run_success_total counter
ci_pipeline_run_success_total{instance="be802d0a442bb237465405d618d15f47",job="jenkins/jenkins",service_version="2.462.1"} 5
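
A quick way to check which `ci_pipeline_run_*` counters appear in a scrape like the one above is to parse the text output. The following is only a minimal Python sketch, not part of the plugin: `parse_samples` is a hypothetical helper, the sample text is a shortened copy of the extract above, and the regular expression is a simplification that ignores optional timestamps and escaped characters in label values.

```python
import re
from typing import Dict

# Shortened copy of the scrape shown above (two of the six counters).
SAMPLE = """\
# HELP ci_pipeline_run_failed_total Job failed
# TYPE ci_pipeline_run_failed_total counter
ci_pipeline_run_failed_total{instance="be802d0a442bb237465405d618d15f47",job="jenkins/jenkins",service_version="2.462.1"} 2
# HELP ci_pipeline_run_success_total Job succeed
# TYPE ci_pipeline_run_success_total counter
ci_pipeline_run_success_total{instance="be802d0a442bb237465405d618d15f47",job="jenkins/jenkins",service_version="2.462.1"} 5
"""

# Simplified sample line: metric name, optional {labels}, then a value.
SAMPLE_LINE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{.*\})?\s+(\S+)$")


def parse_samples(text: str, prefix: str = "ci_pipeline_run_") -> Dict[str, float]:
    """Return {metric_name: value} for sample lines whose name starts with prefix."""
    samples: Dict[str, float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match and match.group(1).startswith(prefix):
            samples[match.group(1)] = float(match.group(2))
    return samples


found = parse_samples(SAMPLE)
for name in sorted(found):
    print(name, found[name])
```

An empty result for the `ci_pipeline_run_` prefix would suggest the counters are not being produced at all, whereas a populated result points at a problem further down the delivery path.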

mrh666 commented 2 weeks ago

@cyrille-leclerc, yes, confirmed. There are no ci.pipeline.run.* metrics.

cyrille-leclerc commented 2 weeks ago

I would like to verify which OTel metrics are produced by the Jenkins Controller. One solution is to enable a Prometheus exporter in the Jenkins Controller; another is to enable a Prometheus exporter on the OpenTelemetry Collector that receives the telemetry emitted by Jenkins, if you have set up such an OpenTelemetry Collector.

I'll describe how to enable the Prometheus exporter on the Jenkins Controller:

mrh666 commented 2 weeks ago

@cyrille-leclerc, the metrics are available! But you will probably want to see this answer from DT (Dynatrace):

We have found the root cause why you can't see the ci.pipeline.* metrics.

We enabled debug logging in our test collector and discovered that the ci.pipeline.* metrics have a type that is not accepted by Dynatrace.

Metric #4
Descriptor:
     -> Name: ci.pipeline.run.success
     -> Description: Job succeed
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
StartTimestamp: 2024-09-05 06:56:25.0705977 +0000 UTC
Timestamp: 2024-09-05 07:53:25.0807196 +0000 UTC
Value: 12

or

Metric #17
Descriptor:
     -> Name: ci.pipeline.run.started
     -> Description: Job started
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
StartTimestamp: 2024-09-05 06:56:25.0705977 +0000 UTC
Timestamp: 2024-09-05 07:53:25.0807196 +0000 UTC
Value: 12

You can see that it is

     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative

Which we don't accept in Dynatrace.

You need to switch to Delta aggregation temporality. This should be possible by setting otel.exporter.otlp.metrics.temporality.preference=DELTA in the OpenTelemetry SDK configuration.
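
As a rough illustration of the two temporalities named in the answer above: a cumulative sum reports the running total since the start timestamp at every export, while a delta sum reports only the change since the previous export. The Python sketch below is not the plugin's or the SDK's code. `cumulative_to_delta` and the sample readings are hypothetical, chosen only so that the final total matches the `Value: 12` in the debug output.

```python
from typing import List


def cumulative_to_delta(cumulative: List[int]) -> List[int]:
    """Convert successive cumulative counter readings into per-export deltas."""
    deltas: List[int] = []
    previous = 0  # assumption: the counter was 0 at the start timestamp
    for value in cumulative:
        if value < previous:
            # The counter went backwards, e.g. after a restart:
            # treat the new reading as the whole delta.
            deltas.append(value)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas


# Hypothetical cumulative readings of ci.pipeline.run.success over three exports.
readings = [3, 7, 12]
deltas = cumulative_to_delta(readings)
print(deltas)       # per-export increments
print(sum(deltas))  # adds back up to the last cumulative reading
```

Both forms carry the same information when no export is lost; they differ in which side (sender or backend) has to keep track of the previous value.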

mrh666 commented 2 weeks ago

After I applied otel.exporter.otlp.metrics.temporality.preference=DELTA, all the metrics were delivered to DT! This should definitely be mentioned in the README.

One question: this metric, ci.pipeline.run.started, is actually a timestamp, but in DT I see ci.pipeline.run.started.count. Is it possible to have the timestamp values of ci.pipeline.run.started and ci.pipeline.run.completed in DT (in order to have a build duration metric)?
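
For context on the duration question: the BUILD span logged earlier in this thread already carries `startEpochNanos` and `endEpochNanos`, so a duration can be derived from the trace data rather than from the counters. This is only a sketch under that assumption. `span_duration_millis` is a hypothetical helper, not a plugin or Dynatrace API, and the two values are copied from the span log above.

```python
def span_duration_millis(start_epoch_nanos: int, end_epoch_nanos: int) -> int:
    """Return the whole number of milliseconds between two epoch-nanosecond timestamps."""
    if end_epoch_nanos < start_epoch_nanos:
        raise ValueError("span ends before it starts")
    return (end_epoch_nanos - start_epoch_nanos) // 1_000_000


# Values copied from the 'BUILD telemetry test pipe' span logged earlier.
start_epoch_nanos = 1725389094623138763
end_epoch_nanos = 1725389100406601700

duration_ms = span_duration_millis(start_epoch_nanos, end_epoch_nanos)
print(duration_ms)  # 5783
```

The same span also reports the attribute `ci.pipeline.run.durationMillis=5763`, so the two measurements are close but not identical.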

cyrille-leclerc commented 1 week ago

Thanks, I'll resolve this issue as the root cause and solution were identified:

Can you please create a follow-up ticket, an enhancement request, for this question of the "timestamp value", which by the way is a concept I'm not familiar with? FYI, we are going to fix a bug in the "unit" definition for a number of metrics; it may fix your problems in Dynatrace.