jenkinsci / prometheus-plugin

Jenkins Prometheus Plugin
https://plugins.jenkins.io/prometheus/
Apache License 2.0

Telegraf error reading prometheus-plugin output. #250

Closed: MrDecisive closed this issue 3 years ago

MrDecisive commented 3 years ago

Hi,

We are seeing the following error in Telegraf when it tries to read the output of the jenkinsci/prometheus-plugin:

telegraf[7915]: 2021-04-20T13:52:30Z E! [inputs.prometheus] Error in plugin: error reading metrics for http://<host>/prometheus/: reading text format failed: text format parsing error in line 1394: second TYPE line for metric name "jenkins_node_builds", or TYPE reported after samples

Yesterday it looked as if the error was linked to the recently released prometheus:2.0.10, which was installed on the affected Jenkins instance. However, I subsequently upgraded another (test) instance to the same version, prometheus:2.0.10, and the problem was not reported there.

I checked the output of the Prometheus plugin at <jenkins-url>/<prometheus-metric-endpoint>. On the working instance there is no "jenkins_node_builds" metric section at all. On the failing instance there are multiple "jenkins_node_builds" sections, but the first one is missing its # HELP and # TYPE lines. This fits Telegraf's report: it first sees a "jenkins_node_builds" section with no # HELP or # TYPE, and later finds the # HELP and # TYPE lines for the next section, so from its point of view the header lines come after the samples they should be attached to. (I'm only guessing here.)
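This guess can be checked with a small script that applies the rule behind the error message: a metric family may have at most one # TYPE line, and it must come before any samples of that family, because samples seen first make the family implicitly untyped. The sketch below is a simplified, hypothetical checker (`find_type_errors` is not part of Telegraf or the plugin); it models only this rule plus the `_count`/`_sum`/`_bucket` suffixes of summaries and histograms, and the input is a shortened excerpt modelled on the output quoted in this issue.

```python
import re

# A sample line starts with the metric name, e.g. jenkins_node_builds{node="master",...} 1.0
_SAMPLE_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*")

# Suffixed sample names that belong to the base family when it is a summary/histogram.
_SUFFIX_TYPES = {
    "_count": ("summary", "histogram"),
    "_sum": ("summary", "histogram"),
    "_bucket": ("histogram",),
}


def _family_of(name, types):
    """Return the family a sample called `name` is attributed to."""
    if name in types:
        return name
    for suffix, allowed in _SUFFIX_TYPES.items():
        if name.endswith(suffix) and types.get(name[: -len(suffix)]) in allowed:
            return name[: -len(suffix)]
    return name


def find_type_errors(text):
    """Return (line_number, family) for every TYPE line that arrives too late."""
    types = {}  # family name -> declared type, or "untyped" if samples came first
    errors = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line[1:].split()
            if len(parts) >= 3 and parts[0] == "TYPE":
                name, kind = parts[1], parts[2]
                if name in types:
                    # Either a second TYPE line, or samples of this family were already seen.
                    errors.append((lineno, name))
                else:
                    types[name] = kind
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            # Samples without a preceding TYPE line make their family implicitly untyped.
            types.setdefault(_family_of(match.group(0), types), "untyped")
    return errors


# Hypothetical, shortened excerpt: renamed per-node samples under a stale header,
# followed later by the real jenkins_node_builds header.
broken = """\
# HELP jenkins_node_docker_agent_11_builds Generated from Dropwizard metric import
# TYPE jenkins_node_docker_agent_11_builds summary
jenkins_node_builds{node="docker_agent_11",quantile="0.5",} 0.0
jenkins_node_builds_count{node="docker_agent_11"} 0.0
# HELP jenkins_node_builds Generated from Dropwizard metric import
# TYPE jenkins_node_builds summary
jenkins_node_builds{node="master",quantile="0.5",} 22.496242709
jenkins_node_builds_count{node="master"} 234.0
"""

print(find_type_errors(broken))  # [(6, 'jenkins_node_builds')]
```

On this excerpt the checker flags the real `# TYPE jenkins_node_builds summary` line, because the renamed docker_agent_11 samples have already created an untyped `jenkins_node_builds` family by the time that header appears.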

I've included the version reports for both the working and the broken instance, because I don't know where the "jenkins_node_builds" section originates; the problem might lie with whichever plugin outputs that metric.

Anyway, hope this helps.

Cheers,
Phill

Version report

Broken Version

Result
Jenkins: 2.249.1
OS: Linux - 4.15.0-1113-azure
---
blueocean-pipeline-editor:1.23.2
github-branch-source:2.9.0
resource-disposer:0.15
blueocean-rest-impl:1.23.2
scm-api:2.6.4
blueocean-pipeline-scm-api:1.23.2
docker-workflow:1.24
jdk-tool:1.0
command-launcher:1.2
variant:1.3
blueocean-dashboard:1.23.2
workflow-step-api:2.23
blueocean-personalization:1.23.2
git-client:3.4.2
token-macro:2.13
blueocean-git-pipeline:1.23.2
pipeline-stage-step:2.5
blueocean-core-js:1.23.2
structs:1.22
htmlpublisher:1.23
mercurial:2.10
junit:1.49
jquery:1.12.4-1
bouncycastle-api:2.16.0
thinBackup:1.10
ssh-credentials:1.18.1
analysis-model-api:9.8.0
build-monitor-plugin:1.12+build.201809061734
matrix-auth:2.6.3
workflow-basic-steps:2.21
favorite:2.3.2
workflow-durable-task-step:2.36
snakeyaml-api:1.27.0
github:1.31.0
pipeline-utility-steps:2.7.1
blueocean-bitbucket-pipeline:1.23.2
workflow-cps:2.90
font-awesome-api:5.15.2-1
active-directory:2.23
blueocean-jwt:1.23.2
blueocean-web:1.23.2
docker-java-api:3.1.5.2
influxdb:3.0
ws-cleanup:0.39
checks-api:1.5.0
jenkins-design-language:1.23.2
pipeline-model-definition:1.7.2
bitbucket:1.1.27
credentials-binding:1.23
pipeline-model-api:1.7.2
blueocean-config:1.23.2
plugin-util-api:1.7.1
lockable-resources:2.10
cloudbees-bitbucket-branch-source:2.9.2
handlebars:3.0.8
prometheus:2.0.10
pipeline-rest-api:2.19
jaxb:2.3.0.1
azure-credentials:4.0.6
ssh-agent:1.22
ssh-slaves:1.31.5
jackson2-api:2.12.3
blueocean-i18n:1.23.2
credentials:2.3.18
mailer:1.34
pipeline-build-step:2.13
blueocean:1.23.2
cppcheck:1.25
blueocean-events:1.23.2
matrix-project:1.18
rebuild:1.32
git-server:1.9
build-timeout:1.20
cloud-stats:0.27
cloudbees-folder:6.14
workflow-support:3.8
jira:3.1.3
warnings-ng:8.9.2
blueocean-commons:1.23.2
jsch:0.1.55.2
build-pipeline-plugin:1.5.8
display-url-api:2.3.4
azure-acs:1.0.4
popper-api:1.16.1-1
workflow-multibranch:2.22
azure-commons:1.0.5
metrics:4.0.2.7
antisamy-markup-formatter:2.1
plain-credentials:1.7
conditional-buildstep:1.4.1
sse-gateway:1.23
apache-httpcomponents-client-4-api:4.5.13-1.0
git:4.4.2
blueocean-rest:1.23.2
pipeline-stage-tags-metadata:1.7.2
durable-task:1.35
pipeline-milestone-step:1.3.1
extended-read-permission:3.2
jquery-detached:1.2.1
docker-commons:1.17
forensics-api:0.10.1
jira-steps:1.6.0
kubernetes-cd:2.3.1
pipeline-graph-analysis:1.10
blueocean-autofavorite:1.2.4
github-api:1.116
trilead-api:1.0.13
blueocean-github-pipeline:1.23.2
email-ext:2.82
workflow-api:2.42
azure-app-service:1.0.2
maven-plugin:3.8
momentjs:1.1.1
blueocean-pipeline-api-impl:1.23.2
pipeline-model-extensions:1.7.2
timestamper:1.12
pubsub-light:1.13
copyartifact:1.46
script-security:1.76
parameterized-trigger:2.39
jquery3-api:3.5.1-2
workflow-cps-global-lib:2.17
data-tables-api:1.10.23-2
azure-vm-agents:1.5.1
ace-editor:1.1
ant:1.8
windows-slaves:1.0
external-monitor-job:1.4
ldap:1.11
pam-auth:1.5.1
violations:0.7.11
blueocean-jira:1.23.2
bootstrap4-api:4.6.0-1
okhttp-api:3.14.9
plot:2.1.9
authentication-tokens:1.4
windows-azure-storage:1.1.7
run-condition:1.5
pipeline-stage-view:2.19
workflow-scm-step:2.11
pipeline-input-step:2.12
javadoc:1.6
blueocean-display-url:2.4.0
echarts-api:4.9.0-3
handy-uri-templates-2-api:2.1.8-1.0
branch-api:2.6.2
workflow-job:2.40

Working Version

Result
Jenkins: 2.277.1
OS: Linux - 4.15.0-1108-azure
---
jackson2-api:2.12.3
cloudbees-folder:6.15
configuration-as-code:1.46
pipeline-stage-tags-metadata:1.7.2
ws-cleanup:0.38
resource-disposer:0.14
antisamy-markup-formatter:2.1
pipeline-github-lib:1.0
ant:1.11
mailer:1.32.1
ssh-agent:1.20
git-client:3.5.1
email-ext:2.79
workflow-cps:2.86
workflow-api:2.42
display-url-api:2.3.4
variant:1.3
workflow-cps-global-lib:2.17
github-api:1.116
credentials-binding:1.24
echarts-api:4.9.0-2
pipeline-input-step:2.12
extended-read-permission:3.2
jquery3-api:3.5.1-2
prometheus:2.0.10
github:1.32.0
workflow-durable-task-step:2.37
workflow-basic-steps:2.22
junit:1.44
snakeyaml-api:1.27.0
pipeline-stage-step:2.5
ace-editor:1.1
pipeline-model-definition:1.7.2
matrix-auth:2.6.5
command-launcher:1.5
structs:1.22
timestamper:1.11.8
trilead-api:1.0.12
git:4.4.5
pipeline-model-api:1.7.2
pipeline-stage-view:2.19
jdk-tool:1.4
pam-auth:1.6
bootstrap4-api:4.5.3-1
jacoco:3.1.0
azure-keyvault:2.1
workflow-aggregator:2.6
token-macro:2.12
ssh-credentials:1.18.1
workflow-support:3.8
bouncycastle-api:2.18
cloudbees-disk-usage-simple:0.10
workflow-step-api:2.23
lockable-resources:2.10
credentials:2.3.14
handlebars:1.1.1
script-security:1.76
pipeline-rest-api:2.19
momentjs:1.1.1
scm-api:2.6.4
workflow-scm-step:2.11
popper-api:1.16.0-7
durable-task:1.35
workflow-job:2.40
pipeline-milestone-step:1.3.1
azure-credentials:4.0.3
cobertura:1.16
workflow-multibranch:2.22
okhttp-api:3.14.9
matrix-project:1.18
gradle:1.36
plugin-util-api:1.4.0
code-coverage-api:1.2.0
git-server:1.9
branch-api:2.6.2
ldap:1.26
ssh-slaves:1.31.2
pipeline-graph-analysis:1.10
jquery-detached:1.2.1
github-branch-source:2.9.1
build-timeout:1.20
metrics:4.0.2.7
plain-credentials:1.7
jsch:0.1.55.2
font-awesome-api:5.15.1-1
checks-api:1.1.1
pipeline-model-extensions:1.7.2
pipeline-build-step:2.13
statistics-gatherer:2.0.3
apache-httpcomponents-client-4-api:4.5.10-2.0
Ubuntu 16.04.7 LTS (GNU/Linux 4.15.0-1113-azure x86_64)

Reproduction steps

Step 1 - Go to <jenkins-url>/<prometheus-metric-endpoint>
Step 2 - Search for the 'jenkins_node_builds' metric

Results

Expected result:

# HELP jenkins_node_builds Generated from Dropwizard metric import (metric=jenkins.node.builds, type=com.codahale.metrics.Timer)
# TYPE jenkins_node_builds summary
jenkins_node_builds{node="master",quantile="0.5",} 22.496242709
jenkins_node_builds{node="master",quantile="0.75",} 56.998456999000005
jenkins_node_builds{node="master",quantile="0.95",} 56.998456999000005
jenkins_node_builds{node="master",quantile="0.98",} 56.998456999000005
jenkins_node_builds{node="master",quantile="0.99",} 56.998456999000005
jenkins_node_builds{node="master",quantile="0.999",} 56.998456999000005
jenkins_node_builds_count{node="master"} 234.0

Actual result:

# TYPE vm_memory_pools_PS_Survivor_Space_committed_window_1h summary
vm_memory_pools_PS_Survivor_Space_committed_window_1h{quantile="0.5",} 7.2351744E7
vm_memory_pools_PS_Survivor_Space_committed_window_1h{quantile="0.75",} 1.24780544E8
vm_memory_pools_PS_Survivor_Space_committed_window_1h{quantile="0.95",} 1.74063616E8
vm_memory_pools_PS_Survivor_Space_committed_window_1h{quantile="0.98",} 1.91889408E8
vm_memory_pools_PS_Survivor_Space_committed_window_1h{quantile="0.99",} 1.92937984E8
vm_memory_pools_PS_Survivor_Space_committed_window_1h{quantile="0.999",} 1.92937984E8
vm_memory_pools_PS_Survivor_Space_committed_window_1h_count 5388.0
# HELP jenkins_node_docker_agent_11_builds Generated from Dropwizard metric import (metric=jenkins.node.docker-agent-11.builds, type=com.codahale.metrics.Timer)
# TYPE jenkins_node_docker_agent_11_builds summary
jenkins_node_builds{node="docker_agent_11",quantile="0.5",} 0.0
jenkins_node_builds{node="docker_agent_11",quantile="0.75",} 0.0
jenkins_node_builds{node="docker_agent_11",quantile="0.95",} 0.0
jenkins_node_builds{node="docker_agent_11",quantile="0.98",} 0.0
jenkins_node_builds{node="docker_agent_11",quantile="0.99",} 0.0
jenkins_node_builds{node="docker_agent_11",quantile="0.999",} 0.0
jenkins_node_builds_count{node="docker_agent_11"} 0.0
# HELP jenkins_plugins_failed Generated from Dropwizard metric import (metric=jenkins.plugins.failed, type=jenkins.metrics.impl.JenkinsMetricProviderImpl$22)
# TYPE jenkins_plugins_failed gauge
jenkins_plugins_failed 0.0
# HELP vm_count Generated from Dropwizard metric import (metric=vm.count, type=com.codahale.metrics.jvm.ThreadStatesGaugeSet$$Lambda$201/1006609823)
# TYPE vm_count gauge
vm_count 238.0

Waschndolos commented 3 years ago

We ran into the same issue. Our Telegraf agent now doesn't collect any metrics from Jenkins. Does anyone know if there's a workaround for this?

fatmcgav commented 3 years ago

Another :+1: for this issue. We use one-shot workers, so these per-node metrics are of no use to us.

It would be great to have an option to disable them.

Waschndolos commented 3 years ago

I had a quick look but don't have time to dig in. My guess is that the fix belongs somewhere in org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics#collectMetrics. DropwizardExports returns a separate metric for each node, like jenkins.node..builds (e.g. jenkins.node.docker-agent-11.builds), and a # HELP and # TYPE line is generated for each one. org.jenkinsci.plugins.prometheus.util.MetricsFormatter#formatMetrics then renames them, so we end up with multiple "jenkins_node_builds" sample groups labelled {node=xxx}, while the original # HELP and # TYPE lines stay in the output under their old names (e.g. jenkins_node_docker_agent_11_builds).
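If that analysis is right, the output could be repaired after renaming by collecting every sample under its post-rename family name and emitting exactly one # HELP / # TYPE pair per family. The following is a hypothetical Python sketch of that regrouping step only; the plugin itself is Java, and `regroup_families` is not an existing function there. It assumes each block's header lines are immediately followed by that block's samples, as in the output quoted above, and it drops any other comment lines.

```python
import re

# The metric name at the start of a sample line.
_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*")
_SUFFIXES = ("_count", "_sum", "_bucket")


def regroup_families(text):
    """Merge sample blocks that share a (renamed) family under one HELP/TYPE pair."""
    families = {}  # family -> {"help", "type", "samples"}; dicts keep insertion order
    help_text = type_name = None  # header of the block currently being read
    saw_sample = False
    for line in text.splitlines():
        if line.startswith(("# HELP ", "# TYPE ")):
            if saw_sample:  # a new block starts: forget the previous block's header
                help_text = type_name = None
                saw_sample = False
            parts = line.split(" ", 3)  # ["#", keyword, name, rest]
            rest = parts[3].strip() if len(parts) > 3 else ""
            if parts[1] == "HELP":
                help_text = rest
            else:
                type_name = rest or "untyped"
            continue
        match = _NAME.match(line)
        if match is None:
            continue  # blank line or other comment: dropped in this sketch
        name = family = match.group(0)
        if type_name in ("summary", "histogram"):
            # _count/_sum/_bucket samples belong to the base family.
            for suffix in _SUFFIXES:
                if name.endswith(suffix):
                    family = name[: -len(suffix)]
                    break
        entry = families.setdefault(
            family, {"help": help_text, "type": type_name, "samples": []}
        )
        entry["samples"].append(line)
        saw_sample = True
    out = []
    for family, entry in families.items():
        # Headers are re-emitted under the family name the samples actually use.
        if entry["help"] is not None:
            out.append(f"# HELP {family} {entry['help']}")
        if entry["type"] is not None:
            out.append(f"# TYPE {family} {entry['type']}")
        out.extend(entry["samples"])
    return "\n".join(out) + "\n"


# Hypothetical, shortened excerpt modelled on the output quoted in this issue.
broken = """\
# HELP jenkins_node_docker_agent_11_builds Generated from Dropwizard metric import
# TYPE jenkins_node_docker_agent_11_builds summary
jenkins_node_builds{node="docker_agent_11",quantile="0.5",} 0.0
jenkins_node_builds_count{node="docker_agent_11"} 0.0
# HELP jenkins_node_builds Generated from Dropwizard metric import
# TYPE jenkins_node_builds summary
jenkins_node_builds{node="master",quantile="0.5",} 22.496242709
jenkins_node_builds_count{node="master"} 234.0
"""

print(regroup_families(broken))
```

The merged family keeps the help text of the first block it encounters, which in the real output names one specific node's Dropwizard metric; a proper fix in the plugin would presumably pick a generic description instead.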

Waschndolos commented 3 years ago

Is anybody working on this? I only tried to point to where the issue is, but I don't know how this code is supposed to work. It would be nice to have a solution soon, because our Jenkins monitoring has been broken since then.

Sukiyakijango commented 3 years ago

Still no fix or workaround for this issue :( It's keeping us blind to issues in our Jenkins.

markjacksonfishing commented 3 years ago

Please use the latest version.

Waschndolos commented 3 years ago

Correct me if I'm wrong, but prometheus:2.0.10 is the latest version, right? That's the version we're all using.