jenkinsci / opentelemetry-plugin

Monitor and observe Jenkins with OpenTelemetry.
https://plugins.jenkins.io/opentelemetry/
Apache License 2.0

Some metrics are not populated in Elastic APM #437

Closed v1v closed 11 months ago

v1v commented 2 years ago

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.332.3
OS: Linux - 5.4.170+
---
ace-editor:1.1
ansicolor:1.0.1
antisamy-markup-formatter:2.7
apache-httpcomponents-client-4-api:4.5.13-1.0
artifact-manager-s3:633.v4813787e78a_9
authentication-tokens:1.4
authorize-project:1.4.0
aws-credentials:191.vcb_f183ce58b_9
aws-global-configuration:1.7
aws-java-sdk:1.12.215-339.vdc07efc5320c
aws-java-sdk-cloudformation:1.12.215-339.vdc07efc5320c
aws-java-sdk-codebuild:1.12.215-339.vdc07efc5320c
aws-java-sdk-ec2:1.12.215-339.vdc07efc5320c
aws-java-sdk-ecr:1.12.215-339.vdc07efc5320c
aws-java-sdk-ecs:1.12.215-339.vdc07efc5320c
aws-java-sdk-elasticbeanstalk:1.12.215-339.vdc07efc5320c
aws-java-sdk-iam:1.12.215-339.vdc07efc5320c
aws-java-sdk-logs:1.12.215-339.vdc07efc5320c
aws-java-sdk-minimal:1.12.215-339.vdc07efc5320c
aws-java-sdk-ssm:1.12.215-339.vdc07efc5320c
blueocean:1.25.3
blueocean-autofavorite:1.2.5
blueocean-bitbucket-pipeline:1.25.3
blueocean-commons:1.25.3
blueocean-config:1.25.3
blueocean-core-js:1.25.3
blueocean-dashboard:1.25.3
blueocean-display-url:2.4.1
blueocean-events:1.25.3
blueocean-git-pipeline:1.25.3
blueocean-github-pipeline:1.25.3
blueocean-i18n:1.25.3
blueocean-jwt:1.25.3
blueocean-personalization:1.25.3
blueocean-pipeline-api-impl:1.25.3
blueocean-pipeline-editor:1.25.3
blueocean-pipeline-scm-api:1.25.3
blueocean-rest:1.25.3
blueocean-rest-impl:1.25.3
blueocean-web:1.25.3
bootstrap4-api:4.6.0-5
bootstrap5-api:5.1.3-6
bouncycastle-api:2.26
branch-api:2.1046.v0ca_37783ecc5
caffeine-api:2.9.3-65.v6a_47d0f4d1fe
checks-api:1.7.4
cloudbees-bitbucket-branch-source:765.v5a_2d6a_23c01d
cloudbees-disk-usage-simple:0.10
cloudbees-folder:6.714.v79e858ef76a_2
command-launcher:81.v9c2cb_cb_db_392
configuration-as-code:1429.v09b_044a_c93de
copyartifact:1.46.4
credentials:1087.1089.v2f1b_9a_b_040e4
credentials-binding:523.vd859a_4b_122e6
disable-github-multibranch-status:1.2
display-url-api:2.3.6
docker-commons:1.19
docker-workflow:1.28
durable-task:496.va67c6f9eefa7
echarts-api:5.3.2-1
extended-read-permission:3.2
favorite:2.4.1
font-awesome-api:6.0.0-1
git:4.11.1
git-client:3.11.0
git-server:1.11
github:1.34.3
github-api:1.303-400.v35c2d8258028
github-branch-source:1628.vb_2f51293cb_78
google-compute-engine:4.3.9
google-metadata-plugin:0.3.1
google-oauth-plugin:1.0.6
google-storage-plugin:1.5.6
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-22.v77d5b_75e6953
hashicorp-vault-plugin:336.v182c0fbaaeb7
htmlpublisher:1.30
inline-pipeline:1.0.1
jackson2-api:2.13.2.20220328-281.v9ecc7a_5e834f
javax-activation-api:1.2.0-3
javax-mail-api:1.6.2-6
jaxb:2.3.6-1
jdk-tool:1.5
jenkins-design-language:1.25.3
jjwt-api:0.11.2-71.v2722b_b_06a_2a_f
job-dsl:1.79
jquery3-api:3.6.0-3
jsch:0.1.55.2
junit:1.63
junit-otel-reporter:0.1.0-SNAPSHOT (private-9453dcd5-mdelapenya)
kubernetes:3580.v78271e5631dc
kubernetes-client-api:5.12.1-187.v577c3e368fb_6
kubernetes-credentials:0.9.0
lockable-resources:2.15
mailer:414.vcc4c33714601
mask-passwords:3.1
matrix-auth:3.1.2
matrix-project:771.v574584b_39e60
metrics:4.1.6.2
mock-security-realm:1.6
momentjs:1.1.1
monitoring:1.91.0
oauth-credentials:0.5
okhttp-api:4.9.3-105.vb96869f8ac3a
opentelemetry:2.6.0
pipeline-build-step:2.18
pipeline-github:2.8-138.d766e30bb08b
pipeline-githubnotify-step:49.vf37bf92d2bc8
pipeline-graph-analysis:195.v5812d95a_a_2f9
pipeline-input-step:448.v37cea_9a_10a_70
pipeline-milestone-step:101.vd572fef9d926
pipeline-model-api:2.2081.v3919681ffc1e
pipeline-model-definition:2.2081.v3919681ffc1e
pipeline-model-extensions:2.2081.v3919681ffc1e
pipeline-rest-api:2.24
pipeline-stage-step:293.v200037eefcd5
pipeline-stage-tags-metadata:2.2081.v3919681ffc1e
pipeline-stage-view:2.24
pipeline-utility-steps:2.12.1
plain-credentials:1.8
plot:2.1.10
plugin-util-api:2.16.0
popper-api:1.16.1-3
popper2-api:2.11.5-1
pubsub-light:1.16
role-strategy:3.2.0
scm-api:608.vfa_f971c5a_a_e9
script-security:1158.v7c1b_73a_69a_08
slack:608.v19e3b_44b_b_9ff
snakeyaml-api:1.30.1
sse-gateway:1.25
ssh-agent:295.v9ca_a_1c7cc3a_a_
ssh-credentials:277.v95c2fec1c047
ssh-slaves:1.814.vc82988f54b_10
sshd:3.228.v4c9f9e652c86
structs:318.va_f3ccb_729b_71
timestamper:1.17
token-macro:293.v283932a_0a_b_49
trilead-api:1.57.v6e90e07157e1
variant:1.4
workflow-aggregator:2.7
workflow-api:1153.vb_912c0e47fb_a_
workflow-basic-steps:948.v2c72a_091b_b_68
workflow-cps:2689.v434009a_31b_f1
workflow-cps-global-lib:581.ve633085a_8a_87
workflow-durable-task-step:1139.v252a_e12e8463
workflow-job:1180.v04c4e75dce43
workflow-multibranch:712.vc169a_1387405
workflow-scm-step:400.v6b_89a_1317c9a_
workflow-step-api:625.vd896b_f445a_f8
workflow-support:819.v37d707a_71d9b_
```

What Operating System are you using (both controller, and any agents involved in the problem)?

Kubernetes

Reproduction steps

  1. With OTEL plugin 2.5.1, metrics were reported
  2. Updated all the plugins to the latest version
  3. After the update, some metrics are no longer reported, e.g. jenkins.scm.events or the queue metrics

Expected Results

Metrics are reported

Actual Results

No metrics for jenkins.queue

image

But metrics for the GitHub API rate limit are reported

image

Anything else?

https://github.com/jenkinsci/opentelemetry-plugin/issues/424 was reported in the past

v1v commented 2 years ago

 With the 2.3.0 version

The first log entry regarding the OpenTelemetry SDK:

2022-05-17 20:18:04.195+0000 [id=41]    INFO    i.j.p.o.OpenTelemetrySdkProvider#initialize: OpenTelemetry SDK initialized: SDK [config: otel.traces.exporter=none, resource: service.name=jenkins, service.namespace=jenkins, service.version=2.332.3]

After saving the global settings, a new entry appears in the logs and metrics start to be sent:

2022-05-17 20:20:27.695+0000 [id=16]    INFO    i.j.p.o.OpenTelemetrySdkProvider#initialize: OpenTelemetry SDK initialized: SDK [config: otel.traces.exporter=otlp, otel.metrics.exporter=otlp, otel.logs.exporter=otlp, otel.exporter.otlp.endpoint=http://otel-collector:4317, resource: service.name=jenkins, service.namespace=jenkins, service.version=2.332.3]

With the 2.6.0 version

2022-05-17 20:27:11.478+0000 [id=44]    INFO    i.j.p.o.OpenTelemetrySdkProvider#initialize: OpenTelemetry SDK initialized: SDK [config: otel.traces.exporter=none, resource: service.name=jenkins, service.namespace=jenkins, service.version=2.332.3]

And if I save the global settings without changes:

2022-05-17 20:28:56.236+0000 [id=20]    INFO    i.j.p.o.OpenTelemetrySdkProvider#initialize: OpenTelemetry SDK initialized: SDK [config: otel.traces.exporter=otlp, otel.metrics.exporter=otlp, otel.logs.exporter=otlp, otel.exporter.otlp.endpoint=http://otel-collector:4317, resource: service.name=jenkins, service.namespace=jenkins, service.version=2.332.3]
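
The difference between the startup entry and the entry after saving can be checked mechanically. Below is a minimal Python sketch, assuming the log line layout shown above; `parse_sdk_config` and `metrics_exporter_enabled` are hypothetical helpers, not part of the plugin, and treating a missing `otel.metrics.exporter` key as "none" is an assumption.

```python
import re

def parse_sdk_config(log_line: str) -> dict:
    """Extract the key=value pairs listed after 'config:' in an
    'OpenTelemetry SDK initialized' log line (layout assumed from the logs above)."""
    match = re.search(r"SDK \[config: (.*?), resource: ", log_line)
    if match is None:
        return {}
    pairs = (item.split("=", 1) for item in match.group(1).split(", "))
    return {key.strip(): value.strip() for key, value in pairs}

def metrics_exporter_enabled(log_line: str) -> bool:
    """True when the line reports a metrics exporter other than 'none'.
    Assumption: a missing key is treated the same as 'none'."""
    config = parse_sdk_config(log_line)
    return config.get("otel.metrics.exporter", "none") != "none"

# Log messages copied from the entries above (timestamp and logger name omitted).
startup = ("OpenTelemetry SDK initialized: SDK [config: otel.traces.exporter=none, "
           "resource: service.name=jenkins, service.namespace=jenkins, "
           "service.version=2.332.3]")
after_save = ("OpenTelemetry SDK initialized: SDK [config: otel.traces.exporter=otlp, "
              "otel.metrics.exporter=otlp, otel.logs.exporter=otlp, "
              "otel.exporter.otlp.endpoint=http://otel-collector:4317, "
              "resource: service.name=jenkins, service.namespace=jenkins, "
              "service.version=2.332.3]")

print(metrics_exporter_enabled(startup))
print(metrics_exporter_enabled(after_save))
```

Run against the Jenkins log, a check like this would flag controllers that started without an OTLP metrics exporter until the global settings were saved again.
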
cyrille-leclerc commented 2 years ago

Can you please test with https://github.com/jenkinsci/opentelemetry-plugin/releases/tag/opentelemetry-2.7.1-rc1

# HELP jenkins_queue_blocked Number of blocked tasks in the queue. Note that waiting for an executor to be available is not a reason to be counted as blocked
# TYPE jenkins_queue_blocked gauge
jenkins_queue_blocked{host_arch="x86_64",host_name="MacBook-Pro.localdomain",jenkins_opentelemetry_plugin_version="2.7.1-rc2-SNAPSHOT (private-f059e595-cyrilleleclerc)",jenkins_url="http://localhost:8080/jenkins/",jenkins_version="2.289.3",job="jenkins/jenkins",os_description="Mac OS X 11.4",os_type="darwin",process_runtime_description="AdoptOpenJDK OpenJDK 64-Bit Server VM 11.0.11+9",process_runtime_name="OpenJDK Runtime Environment",process_runtime_version="11.0.11+9",service_name="jenkins",service_namespace="jenkins",service_version="2.289.3",telemetry_sdk_language="java",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.14.0"} 0
# HELP jenkins_queue_buildable Number of tasks in the queue with the status 'buildable' or 'pending'
# TYPE jenkins_queue_buildable gauge
jenkins_queue_buildable{host_arch="x86_64",host_name="MacBook-Pro.localdomain",jenkins_opentelemetry_plugin_version="2.7.1-rc2-SNAPSHOT (private-f059e595-cyrilleleclerc)",jenkins_url="http://localhost:8080/jenkins/",jenkins_version="2.289.3",job="jenkins/jenkins",os_description="Mac OS X 11.4",os_type="darwin",process_runtime_description="AdoptOpenJDK OpenJDK 64-Bit Server VM 11.0.11+9",process_runtime_name="OpenJDK Runtime Environment",process_runtime_version="11.0.11+9",service_name="jenkins",service_namespace="jenkins",service_version="2.289.3",telemetry_sdk_language="java",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.14.0"} 1
# HELP jenkins_queue_left Total count of tasks that have been processed
# TYPE jenkins_queue_left counter
jenkins_queue_left{host_arch="x86_64",host_name="MacBook-Pro.localdomain",jenkins_opentelemetry_plugin_version="2.7.1-rc2-SNAPSHOT (private-f059e595-cyrilleleclerc)",jenkins_url="http://localhost:8080/jenkins/",jenkins_version="2.289.3",job="jenkins/jenkins",os_description="Mac OS X 11.4",os_type="darwin",process_runtime_description="AdoptOpenJDK OpenJDK 64-Bit Server VM 11.0.11+9",process_runtime_name="OpenJDK Runtime Environment",process_runtime_version="11.0.11+9",service_name="jenkins",service_namespace="jenkins",service_version="2.289.3",telemetry_sdk_language="java",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.14.0"} 1
# HELP jenkins_queue_time_spent_millis Total time spent in queue by the tasks that have been processed
# TYPE jenkins_queue_time_spent_millis counter
jenkins_queue_time_spent_millis{host_arch="x86_64",host_name="MacBook-Pro.localdomain",jenkins_opentelemetry_plugin_version="2.7.1-rc2-SNAPSHOT (private-f059e595-cyrilleleclerc)",jenkins_url="http://localhost:8080/jenkins/",jenkins_version="2.289.3",job="jenkins/jenkins",os_description="Mac OS X 11.4",os_type="darwin",process_runtime_description="AdoptOpenJDK OpenJDK 64-Bit Server VM 11.0.11+9",process_runtime_name="OpenJDK Runtime Environment",process_runtime_version="11.0.11+9",service_name="jenkins",service_namespace="jenkins",service_version="2.289.3",telemetry_sdk_language="java",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.14.0"} 316
# HELP jenkins_queue_waiting Number of tasks in the queue with the status 'waiting', 'buildable' or 'pending'
# TYPE jenkins_queue_waiting gauge
jenkins_queue_waiting{host_arch="x86_64",host_name="MacBook-Pro.localdomain",jenkins_opentelemetry_plugin_version="2.7.1-rc2-SNAPSHOT (private-f059e595-cyrilleleclerc)",jenkins_url="http://localhost:8080/jenkins/",jenkins_version="2.289.3",job="jenkins/jenkins",os_description="Mac OS X 11.4",os_type="darwin",process_runtime_description="AdoptOpenJDK OpenJDK 64-Bit Server VM 11.0.11+9",process_runtime_name="OpenJDK Runtime Environment",process_runtime_version="11.0.11+9",service_name="jenkins",service_namespace="jenkins",service_version="2.289.3",telemetry_sdk_language="java",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.14.0"} 1
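
To confirm that all five queue metrics show up in a scrape like the one above, something along these lines could be used. This is a rough Python sketch, not an official client: `parse_samples` is a hypothetical helper, it ignores labels, and it assumes sample lines carry no trailing timestamp. The labels in the example are shortened for readability.

```python
import re

# One sample line of the Prometheus text exposition format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)

def parse_samples(text: str) -> dict:
    """Map metric name -> float value, skipping '# HELP' / '# TYPE' comments.
    Labels are ignored, so several series of one metric overwrite each other."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            samples[match.group("name")] = float(match.group("value"))
    return samples

# Shortened excerpt of the output above (most labels dropped).
scrape = """\
# TYPE jenkins_queue_buildable gauge
jenkins_queue_buildable{service_name="jenkins",os_description="Mac OS X 11.4"} 1
# TYPE jenkins_queue_time_spent_millis counter
jenkins_queue_time_spent_millis{service_name="jenkins"} 316
"""

expected = {
    "jenkins_queue_blocked",
    "jenkins_queue_buildable",
    "jenkins_queue_left",
    "jenkins_queue_time_spent_millis",
    "jenkins_queue_waiting",
}
found = parse_samples(scrape)
missing = sorted(expected - set(found))
print(found)
print(missing)
```
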
v1v commented 2 years ago

I think those issues are related to the Elastic Stack. I used the demo project from this repo and managed to see the metrics in Prometheus

image

To do so, I disabled Zipkin and used https://github.com/jenkinsci/opentelemetry-plugin/pull/446

Expand to view the diff

```diff
diff --git a/demos/config/otel-collector-config.yaml b/demos/config/otel-collector-config.yaml
index 3907a6a..82695a5 100644
--- a/demos/config/otel-collector-config.yaml
+++ b/demos/config/otel-collector-config.yaml
@@ -13,10 +13,6 @@ exporters:
       label1: value1
   logging:
 
-  zipkin:
-    endpoint: "http://zipkin-all-in-one:9411/api/v2/spans"
-    format: proto
-
   jaeger:
     endpoint: jaeger-all-in-one:14250
     tls:
@@ -43,7 +39,7 @@ service:
     traces:
       receivers: [otlp]
       processors: [batch]
-      exporters: [logging, zipkin, jaeger, otlp/elastic]
+      exporters: [logging, jaeger, otlp/elastic]
     metrics:
       receivers: [otlp]
       processors: [batch]
diff --git a/demos/config/plugins.txt b/demos/config/plugins.txt
index 25bf15d..cb8ab71 100644
--- a/demos/config/plugins.txt
+++ b/demos/config/plugins.txt
@@ -6,7 +6,7 @@ git
 filesystem_scm
 jdk-tool
 job-dsl
-opentelemetry
+opentelemetry::https://github.com/jenkinsci/opentelemetry-plugin/releases/download/opentelemetry-2.7.1-rc1/opentelemetry-2.7.1-rc1.hpi
 pipeline-model-definition
 swarm
 workflow-aggregator
diff --git a/demos/docker-compose.yml b/demos/docker-compose.yml
index 84f1446..9f61d12 100644
--- a/demos/docker-compose.yml
+++ b/demos/docker-compose.yml
@@ -64,13 +64,6 @@ services:
     networks:
       - jenkins
 
-  zipkin-all-in-one:
-    image: openzipkin/zipkin:latest
-    ports:
-      - "9411:9411"
-    networks:
-      - jenkins
-
   prometheus:
     image: prom/prometheus:latest
     volumes:
@@ -94,7 +87,6 @@ services:
       - "55670:55679" # zpages extension
     depends_on:
       - jaeger-all-in-one
-      - zipkin-all-in-one
       - fleet-server
       - prometheus
     networks:
```

Interestingly, I can see other metrics in the Elastic Stack, such as those of the Java Collector

v1v commented 2 years ago

I enabled some logging in the OpenTelemetry collector, and I can see some entries that are reflected in the Elastic Stack:

Descriptor:
     -> Name: system.memory.usage
     -> Description: System memory usage
     -> Unit: bytes
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> state: STRING(free)
StartTimestamp: 2022-05-26 15:46:17.485345 +0000 UTC
Timestamp: 2022-05-26 16:12:16.382071 +0000 UTC
Value: 10379776000
NumberDataPoints #1
Data point attributes:
     -> state: STRING(used)
StartTimestamp: 2022-05-26 15:46:17.485345 +0000 UTC
Timestamp: 2022-05-26 16:12:16.382071 +0000 UTC
Value: 11669168128
Metric #18
Expand to view the metrics in Elastic

image

While the metric below is not found in Elastic:

Descriptor:
     -> Name: ci.pipeline.run.completed
     -> Description: Job completed
     -> Unit: 1
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE
NumberDataPoints #0
StartTimestamp: 2022-05-26 15:46:17.485345 +0000 UTC
Timestamp: 2022-05-26 16:12:16.382071 +0000 UTC
Value: 3
Metric #17
Expand to view the metrics in Elastic

image

A similar case, possibly related to the one I mentioned above: I can see jenkins.agents.total in Prometheus, but I cannot see it in Elastic:

Descriptor:
     -> Name: jenkins.agents.total
     -> Description: Number of agents
     -> Unit: 1
     -> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 2022-05-26 15:46:17.485345 +0000 UTC
Timestamp: 2022-05-26 16:16:16.209923 +0000 UTC
Value: 2
Metric #4
Expand to view the metrics in Prometheus and Elastic

image image
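
The comparison above (what the collector logs versus what appears in Elastic) can be scripted. The following is a minimal Python sketch that assumes the `-> Name:` / `-> DataType:` layout of the collector output shown in this comment; `metric_descriptors` is a hypothetical helper, and the `seen_in_elastic` set is filled in by hand from the observations reported here.

```python
import re

def metric_descriptors(collector_log: str) -> dict:
    """Map metric name -> data type from the collector's logging output,
    assuming the '-> Name:' / '-> DataType:' layout shown above."""
    descriptors = {}
    current = None
    for line in collector_log.splitlines():
        name = re.match(r"\s*-> Name: (\S+)", line)
        dtype = re.match(r"\s*-> DataType: (\S+)", line)
        if name:
            current = name.group(1)
        elif dtype and current is not None:
            descriptors[current] = dtype.group(1)
            current = None
    return descriptors

# Descriptor blocks copied from the collector output above (data points omitted).
collector_log = """\
Descriptor:
     -> Name: system.memory.usage
     -> Description: System memory usage
     -> Unit: bytes
     -> DataType: Gauge
Descriptor:
     -> Name: ci.pipeline.run.completed
     -> Description: Job completed
     -> Unit: 1
     -> DataType: Sum
Descriptor:
     -> Name: jenkins.agents.total
     -> Description: Number of agents
     -> Unit: 1
     -> DataType: Gauge
"""

# Filled in by hand from what was visible in Elastic in this comment.
seen_in_elastic = {"system.memory.usage"}

received = metric_descriptors(collector_log)
missing = {name: kind for name, kind in received.items() if name not in seen_in_elastic}
print(missing)
```

In this sample both a `Sum` and a `Gauge` are missing while another `Gauge` is present, which suggests the data type alone does not explain the gap.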

cyrille-leclerc commented 2 years ago

@v1v is this bug fixed?

kuisathaverat commented 1 year ago

no, it is still there

Screenshot 2022-09-29 at 09 58 27 Screenshot 2022-09-29 at 09 59 08
chriscarpenter12 commented 1 year ago

Came here because I had the same issue as #536 trying to get info about the agents.

The field "jenkins.agents.total" associated with this object no longer exists in the index pattern. Please use another field.

image

chriscarpenter12 commented 1 year ago

After updating the plugin to the latest version of 2.13.0 I'm now seeing agents being reported correctly.

image

But I've noticed the Queue Left just keeps incrementing with nothing in my build queue in Jenkins

image

image

image
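
For context, the exposition output earlier in this thread declares `jenkins_queue_left` as a counter ("Total count of tasks that have been processed"), so a value that only grows may be expected even when the queue is empty. A small Python sketch of turning cumulative samples into per-interval increases follows; `counter_increments` is a hypothetical helper and the sample values are made up for illustration.

```python
def counter_increments(samples: list) -> list:
    """Per-interval increase of a cumulative counter.
    A drop is treated as a counter reset (e.g. a controller restart),
    so the new value is taken as the increase for that interval."""
    increments = []
    for previous, current in zip(samples, samples[1:]):
        increments.append(current - previous if current >= previous else current)
    return increments

# Made-up cumulative readings of a counter such as jenkins_queue_left.
cumulative = [1, 4, 4, 9, 2]
increments = counter_increments(cumulative)
print(increments)
```

Charting the increments (or a rate) rather than the raw cumulative value shows how many tasks left the queue in each interval.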

abalyan2395 commented 1 year ago

Facing the exact same issue. I am using Jenkins version 2.346.3 and plugin version 2.11.0.

kuisathaverat commented 11 months ago

The issue is resolved in 2.13.0.