jenkinsci / prometheus-plugin

Jenkins Prometheus Plugin
https://plugins.jenkins.io/prometheus/
Apache License 2.0

Memory leak via BuildCompletionListener #643

Closed dkeitzel closed 5 months ago

dkeitzel commented 5 months ago

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.426.3
OS: Linux - 4.12.14-122.156-default
Java: 17.0.10 - Oracle Corporation (OpenJDK 64-Bit Server VM)
---
Parameterized-Remote-Trigger:3.2.0
analysis-model-api:11.15.0
ansicolor:1.0.4
ant:497.v94e7d9fffa_b_9
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
apache-httpcomponents-client-5-api:5.3.1-1.0
appsecstarter.plugin:1.0-alpha-14
authentication-tokens:1.53.v1c90fd9191a_b_
aws-credentials:218.v1b_e9466ec5da_
aws-java-sdk:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-cloudformation:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-codebuild:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-ec2:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-ecr:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-ecs:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-efs:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-elasticbeanstalk:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-iam:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-kinesis:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-logs:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-minimal:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-secretsmanager:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-sns:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-sqs:1.12.633-430.vf9a_e567a_244f
aws-java-sdk-ssm:1.12.633-430.vf9a_e567a_244f
basic-branch-build-strategies:81.v05e333931c7d
blueocean-autofavorite:1.2.5
blueocean-bitbucket-pipeline:1.27.10
blueocean-commons:1.27.10
blueocean-config:1.27.10
blueocean-core-js:1.27.10
blueocean-dashboard:1.27.10
blueocean-display-url:2.4.2
blueocean-events:1.27.10
blueocean-git-pipeline:1.27.10
blueocean-i18n:1.27.10
blueocean-jwt:1.27.10
blueocean-personalization:1.27.10
blueocean-pipeline-api-impl:1.27.10
blueocean-pipeline-editor:1.27.10
blueocean-pipeline-scm-api:1.27.10
blueocean-rest:1.27.10
blueocean-rest-impl:1.27.10
blueocean-web:1.27.10
bootstrap4-api:4.6.0-6
bootstrap5-api:5.3.2-3
bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9
branch-api:2.1144.v1425d1c3d5a_7
build-discarder:139.v05696a_7fe240
build-environment:1.7
build-monitor-plugin:1.14-826.vb_a_c11536174d
build-name-setter:2.4.1
build-symlink:1.1
build-timeout:1.32
build-token-root:151.va_e52fe3215fc
build-user-vars-plugin:1.9
caffeine-api:3.1.8-133.v17b_1ff2e0599
checkmarx:2023.4.3
checks-api:2.0.2
cloud-stats:336.v788e4055508b_
cloudbees-bitbucket-branch-source:866.vdea_7dcd3008e
cloudbees-folder:6.858.v898218f3609d
clover:4.14.2.596.vb_4d6475e990b_
cobertura:1.17
code-coverage-api:4.99.0
command-launcher:107.v773860566e2e
commons-lang3-api:3.13.0-62.v7d18e55f51e2
commons-text-api:1.11.0-95.v22a_d30ee5d36
config-file-provider:968.ve1ca_eb_913f8c
copyartifact:722.v0662a_9b_e22a_c
coverage:1.10.0
credentials:1319.v7eb_51b_3a_c97b_
credentials-binding:657.v2b_19db_7d6e6d
crowd2:4.1.0
cucumber-reports:5.8.1
custom-tools-plugin:0.8
cvs:2.19.1
data-tables-api:1.13.8-2
display-url-api:2.200.vb_9327d658781
docker-commons:439.va_3cb_0a_6a_fb_29
docker-java-api:3.3.4-86.v39b_a_5ede342c
docker-workflow:572.v950f58993843
dtkit-api:3.0.2
durable-task:547.vd1ea_007d100c
echarts-api:5.4.3-2
email-ext:2.104
embeddable-build-status:467.v4a_954796e45d
extended-choice-parameter:376.v2e02857547b_a_
extended-read-permission:53.v6499940139e5
external-monitor-job:215.v2e88e894db_f8
favorite:2.208.v91d65b_7792a_c
flaky-test-handler:1.2.3
font-awesome-api:6.5.1-2
forensics-api:2.3.0
gatling:1.3.0
generic-webhook-trigger:2.0.0
git:5.2.1
git-client:4.6.0
git-parameter:0.9.19
git-server:114.v068a_c7cc2574
github:1.38.0
github-api:1.318-461.v7a_c09c9fa_d63
github-branch-source:1772.va_69eda_d018d4
golang:1.4
gradle:2.9
gson-api:2.10.1-15.v0d99f670e0a_7
h2-api:11.1.4.199-12.v9f4244395f7a_
handy-uri-templates-2-api:2.1.8-30.v7e777411b_148
htmlpublisher:1.32
http_request:1.18
ignore-committer-strategy:1.0.4
image-tag-parameter:2.0
instance-identity:185.v303dc7c645f9
ionicons-api:56.v1b_1c8c49374e
jackson2-api:2.16.1-373.ve709c6871598
jacoco:3.3.5
jakarta-activation-api:2.0.1-3
jakarta-mail-api:2.0.1-3
javadoc:243.vb_b_503b_b_45537
javax-activation-api:1.2.0-6
javax-mail-api:1.6.2-9
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jenkins-design-language:1.27.10
jjwt-api:0.11.5-77.v646c772fddb_0
jnr-posix-api:3.1.18-1
jobConfigHistory:1229.v3039470161a_d
joda-time-api:2.12.6-21.vca_fd74418fb_7
jquery:1.12.4-1
jquery3-api:3.7.1-1
jsch:0.2.16-86.v42e010d9484b_
json-path-api:2.9.0-33.v2527142f2e1d
junit:1259.v65ffcef24a_88
junit-attachments:205.vc0677977deb_0
kubernetes:4186.v1d804571d5d4-spu
kubernetes-client-api:6.10.0-240.v57880ce8b_0b_2
kubernetes-credentials:0.11
ldap:711.vb_d1a_491714dc
lighthouse-report:1.3.0
lockable-resources:1232.v512d6c434eb_d
mailer:463.vedf8358e006b_
mapdb-api:1.0.9-28.vf251ce40855d
mask-passwords:173.v6a_077a_291eb_5
matrix-auth:3.2.1
matrix-project:822.824.v14451b_c0fd42
maven-plugin:3.23
mercurial:1260.vdfb_723cdcc81
metrics:4.2.21-449.v6960d7c54c69
mina-sshd-api-common:2.12.0-90.v9f7fb_9fa_3d3b_
mina-sshd-api-core:2.12.0-90.v9f7fb_9fa_3d3b_
monitoring:1.95.0
multibranch-build-strategy-extension:48.v3dc306525d0c
nexus-jenkins-plugin:3.18.0-03
nodejs:1.6.1
nodelabelparameter:1.12.0
okhttp-api:4.11.0-172.vda_da_1feeb_c6e
pam-auth:1.10
parameterized-scheduler:255.v73827fcdf618
performance-signature-dynatracesaas:3.2.2
performance-signature-ui:3.2.2
pipeline-aws:1.43
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-groovy-lib:704.vc58b_8890a_384
pipeline-input-step:477.v339683a_8d55e
pipeline-maven:1376.v18876d10ce9c
pipeline-maven-api:1376.v18876d10ce9c
pipeline-maven-database:1376.v18876d10ce9c
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2175.v76a_fff0a_2618
pipeline-model-definition:2.2175.v76a_fff0a_2618
pipeline-model-extensions:2.2175.v76a_fff0a_2618
pipeline-rest-api:2.34
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2175.v76a_fff0a_2618
pipeline-stage-view:2.34
pipeline-utility-steps:2.16.1
plain-credentials:143.v1b_df8b_d3b_e48
plugin-util-api:3.8.0
popper-api:1.16.1-3
postgresql-api:42.6.0-31.vb_7e76dc13969
preSCMbuildstep:71.v1f2990a_37e27
prism-api:1.29.0-10
prometheus:2.5.1
pubsub-light:1.18
rebuild:330.v645b_7df10e2a_
rocketchatnotifier:1.5.2
role-strategy:689.v731678c3e0eb_
scm-api:683.vb_16722fb_b_80b_
script-security:1321.va_73c0795b_923
sidebar-link:2.4.1
simple-theme-plugin:176.v39740c03a_a_f5
snakeyaml-api:2.2-111.vc6598e30cc65
sonar:2.17.1
sse-gateway:1.26
ssh-agent:346.vda_a_c4f2c8e50
ssh-credentials:308.ve4497b_ccd8f4
ssh-slaves:2.948.vb_8050d697fec
ssh-steps:2.0.68.va_d21a_12a_6476
sshd:3.322.v159e91f6a_550
stashNotifier:1.464.va_9203f84a_417
structs:337.v1b_04ea_4df7c8
subversion:2.17.3
swarm:3.44
timestamper:1.26
token-macro:400.v35420b_922dcb_
trilead-api:2.133.vfb_8a_7b_9c5dd1
uno-choice:2.8.1
variant:60.v7290fc0eb_b_cd
warnings-ng:10.7.0
workflow-aggregator:596.v8c21c963d92d
workflow-api:1291.v51fd2a_625da_7
workflow-basic-steps:1042.ve7b_140c4a_e0c
workflow-cps:3853.vb_a_490d892963
workflow-durable-task-step:1327.ve57634fb_09ce
workflow-job:1385.vb_58b_86ea_fff1
workflow-multibranch:773.vc4fe1378f1d5
workflow-scm-step:415.v434365564324
workflow-step-api:657.v03b_e8115821b_
workflow-support:865.v43e78cc44e0d
xunit:3.1.3
zap-evaluate-pipeline:1.0
```

What Operating System are you using (both controller, and any agents involved in the problem)?

Controller: SUSE Linux Enterprise Server 12 SP5

Agents: SUSE Linux Enterprise Server 15 SP5

Reproduction steps

Plugin configuration (note that I even disabled job metrics):

org.jenkinsci.plugins.prometheus.config.PrometheusConfiguration.xml

```xml
prometheus default jenkins_job false 60 false false false false false false false false false false false false false ^kubernetes_cloud_.* default_jenkins_job_usage_bytes
```

Expected Results

The list of runs in runStack of BuildCompletionListener should not grow indefinitely.

Actual Results

The list of Runs in runStack of BuildCompletionListener grows indefinitely. It seems that runStack.clear() is never called. On our instance the runStack has more than 10k entries after a day of operation. Checked via:

```groovy
import hudson.model.Run
import org.jenkinsci.plugins.prometheus.collectors.builds.BuildCompletionListener
import org.jenkinsci.plugins.prometheus.collectors.builds.BuildCompletionListener.CloseableIterator

BuildCompletionListener listener = BuildCompletionListener.getInstance()
CloseableIterator<Run<?,?>> iterator = null
try {
    iterator = listener.iterator()
    // Groovy's size() on an Iterator walks it and counts the remaining elements.
    println iterator.size()
} finally {
    // Always close the iterator, even if counting fails.
    if (iterator != null) {
        iterator.close()
    }
}
```

Some visuals of our heap:

[screenshot]

Tracked it down by analyzing a heap dump:

We hold thousands of CpsFlowExecution objects:

[screenshot]

The path to the GC root shows the BuildCompletionListener.

[screenshot]

Size of the list runStack when the maximum heap size is almost exhausted:

[screenshot]

Anything else?

You can plug the leak by unregistering BuildCompletionListener:

```groovy
import org.jenkinsci.plugins.prometheus.collectors.builds.BuildCompletionListener

BuildCompletionListener listener = BuildCompletionListener.getInstance()
listener.unregister()
```

Are you interested in contributing a fix?

I'm not sure about the cause of this issue. So nothing to contribute yet.

dkeitzel commented 5 months ago

I think I found the issue. When all build metrics are ignored, the close() operation of BuildCompletionListener is never executed. As a result, the runs in BuildCompletionListener pile up and are never cleared.

The best way to tackle this would be to unregister the BuildCompletionListener if all build metrics are disabled, since the listener has no use when there are no metrics to collect in the first place. A less desirable approach would be to make sure the list of runs gets cleared even if all build metrics are ignored.
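
To make the suspected mechanism concrete, here is a minimal, self-contained Java sketch of the pattern. It is a simplified model under stated assumptions, not the plugin's actual code: `LeakSketch`, `CompletedRunBuffer`, `drain()` and `simulate()` are hypothetical names, plain strings stand in for `Run` objects, and `drain()` stands in for a consumer that reads the buffered runs and then clears them.

```java
import java.util.ArrayList;
import java.util.List;

class LeakSketch {

    /** Hypothetical stand-in for a listener that buffers completed builds. */
    static class CompletedRunBuffer {
        private final List<String> runStack = new ArrayList<>();
        private boolean registered = true;

        /** Called once per finished build; buffers only while registered. */
        synchronized void onCompleted(String runId) {
            if (registered) {
                runStack.add(runId);
            }
        }

        /** Consumer side: hand out the buffered runs, then clear the buffer. */
        synchronized List<String> drain() {
            List<String> snapshot = new ArrayList<>(runStack);
            runStack.clear();
            return snapshot;
        }

        /** Stop buffering and release anything already held. */
        synchronized void unregister() {
            registered = false;
            runStack.clear();
        }

        synchronized int size() {
            return runStack.size();
        }
    }

    /** Simulates finished builds and returns how many runs are still buffered. */
    static int simulate(boolean buildMetricsEnabled, boolean unregisterWhenDisabled, int builds) {
        CompletedRunBuffer buffer = new CompletedRunBuffer();
        if (!buildMetricsEnabled && unregisterWhenDisabled) {
            buffer.unregister();
        }
        for (int i = 1; i <= builds; i++) {
            buffer.onCompleted("job#" + i);
            // Assumption: only an enabled build-metrics collector ever drains the buffer.
            if (buildMetricsEnabled && i % 10 == 0) {
                buffer.drain();
            }
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println(simulate(true, false, 1000));  // prints 0
        System.out.println(simulate(false, false, 1000)); // prints 1000
        System.out.println(simulate(false, true, 1000));  // prints 0
    }
}
```

In this model the second call reproduces the unbounded growth: nothing consumes the buffer, so every completed run stays reachable. The third call corresponds to the preferred option, unregistering the listener when no build metric is enabled.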

dkeitzel commented 5 months ago

Wouldn't it be best to not register any Collector that is disabled?
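
As a rough illustration of that idea, the sketch below filters a configuration map so that only enabled collectors would be handed to a registry. Everything here is hypothetical: `CollectorRegistrationSketch`, `enabledCollectors()` and the metric names are made up for the example and do not reflect the plugin's real classes or configuration keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CollectorRegistrationSketch {

    /** Returns the names of enabled collectors, preserving configuration order. */
    static List<String> enabledCollectors(Map<String, Boolean> configured) {
        List<String> toRegister = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : configured.entrySet()) {
            // Treat a missing (null) flag as disabled.
            if (Boolean.TRUE.equals(entry.getValue())) {
                toRegister.add(entry.getKey());
            }
        }
        return toRegister;
    }

    public static void main(String[] args) {
        // Hypothetical metric names, used only for illustration.
        Map<String, Boolean> configured = new LinkedHashMap<>();
        configured.put("build_duration", false);
        configured.put("build_result", false);
        configured.put("executor_usage", true);

        System.out.println(enabledCollectors(configured)); // prints [executor_usage]
    }
}
```

With this approach a disabled collector never exists at runtime, so it cannot hold on to any state.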

Waschndolos commented 5 months ago

That came in through a PR from the community. But there's also another issue with that, I'll need to check.

Edit: One more thing worth mentioning: thank you for the great preparation!

Waschndolos commented 5 months ago

@dkeitzel: I've created a fix in https://github.com/jenkinsci/prometheus-plugin/pull/645. It seems to work this way.

cybe commented 5 months ago

Thank you @Waschndolos, I'll take a look. @dkeitzel is a different account of mine.

cybe commented 5 months ago

Thank you! 😊

Waschndolos commented 5 months ago

I'll check the version in our company's Jenkins on Monday and will probably release on Monday. Thank you for the great preparation.