jenkinsci / prometheus-plugin

Jenkins Prometheus Plugin
https://plugins.jenkins.io/prometheus/
Apache License 2.0
181 stars 151 forks source link

Prometheus Plugin Update prevents Jenkins from startup #647

Closed Alphayeeeet closed 5 months ago

Alphayeeeet commented 5 months ago

Jenkins and plugins versions report

Jenkins: 2.440.2 OS: Linux - 4.18.0-513.18.1.el8_9.x86_64 Java: 11.0.22 - Red Hat, Inc. (OpenJDK 64-Bit Server VM)

analysis-model-api:12.1.0 ansicolor:1.0.4 ant:497.v94e7d9fffa_b_9 antisamy-markup-formatter:162.v0e6ec0fcfcf6 apache-httpcomponents-client-4-api:4.5.14-208.v438351942757 asm-api:9.6-3.v2e1fa_b_338cd7 authentication-tokens:1.53.v1c90fd9191ab badge:1.9.1 basic-branch-build-strategies:81.v05e333931c7d bootstrap5-api:5.3.3-1 bouncycastle-api:2.30.1.77-225.v26ea_c9455fd9 branch-api:2.1152.v6f101e97dd77 build-timeout:1.32 build-user-vars-plugin:1.9 caffeine-api:3.1.8-133.v17b_1ff2e0599 checks-api:2.0.2 cloudbees-bitbucket-branch-source:880.vcf4056c5a_71f cloudbees-folder:6.901.vb_4c7a_da_75da_3 cobertura:1.17 code-coverage-api:4.99.0 command-launcher:107.v773860566e2e commons-lang3-api:3.13.0-62.v7d18e55f51e2 commons-text-api:1.11.0-95.v22a_d30ee5d36 conditional-buildstep:1.4.3 config-file-provider:968.ve1ca_eb_913f8c content-replace:1.8.2 copyartifact:722.v0662a_9b_e22a_c coverage:1.13.0 credentials:1337.v60b_d7b_c7b_c9f credentials-binding:657.v2b_19db_7d6e6d custom-tools-plugin:0.8 data-tables-api:2.0.2-1 dependency-track:4.3.1 display-url-api:2.200.vb_9327d658781 docker-commons:439.va_3cb_0a_6a_fb_29 docker-workflow:572.v950f58993843 durable-task:550.v0930093c4b_a_6 echarts-api:5.5.0-1 email-ext:2.105 exclusive-execution:0.8 extended-choice-parameter:381.v360a_25ea_017c extended-read-permission:53.v6499940139e5 extra-tool-installers:139.v723fee51b_7f2 favorite:2.208.v91d65b_7792a_c file-operations:214.v2e7dc7f25757 file-parameters:316.va_83a_1221db_a_7 font-awesome-api:6.5.1-3 forensics-api:2.4.0 generic-webhook-trigger:2.1.0 git:5.2.1 git-client:4.7.0 git-parameter:0.9.19 git-server:114.v068a_c7cc2574 gitlab-plugin:1.8.0 gradle:2.10 groovy:457.v99900cb_85593 groovy-postbuild:228.vcdb_cf7265066 gson-api:2.10.1-15.v0d99f670e0a_7 handy-uri-templates-2-api:2.1.8-30.v7e777411b_148 hidden-parameter:237.v4b_df26c7a_f0e htmlpublisher:1.33 http_request:1.18 instance-identity:185.v303dc7c645f9 ionicons-api:56.v1b_1c8c49374e jackson2-api:2.17.0-379.v02de8ec9f64c jakarta-activation-api:2.1.3-1 jakarta-mail-api:2.1.3-1 javadoc:243.vb_b_503b_b_45537 javax-activation-api:1.2.0-6 javax-mail-api:1.6.2-9 jaxb:2.3.9-1 jdk-tool:73.vddf737284550 jersey2-api:2.42-147.va_28a_44603b_d5 jjwt-api:0.11.5-77.v646c772fddb_0 job-dsl:1.87 joda-time-api:2.12.7-29.v5a_b_e3a82269a jquery:1.12.4-1 jquery3-api:3.7.1-2 jsch:0.2.16-86.v42e010d9484b_ json-api:20240303-41.v94e11e6de726 json-path-api:2.9.0-58.v62e3e85b_a_655 junit:1259.v65ffcef24a_88 ldap:719.vcb_d039b_77d0d mailer:472.vf7c289a_4b_420 mapdb-api:1.0.9-28.vf251ce40855d matrix-auth:3.2.2 matrix-project:822.824.v14451b_c0fd42 maven-plugin:3.23 metrics:4.2.21-449.v6960d7c54c69 mina-sshd-api-common:2.12.0-90.v9f7fb_9fa3d3b mina-sshd-api-core:2.12.0-90.v9f7fb_9fa3d3b msbuild:1.30 nodejs:1.6.1 okhttp-api:4.11.0-172.vda_da_1feeb_c6e openshift-client:1.1.0.424.v829cb_ccf8798 pam-auth:1.10 pipeline-build-step:540.vb_e8849e1a_b_d8 pipeline-graph-analysis:216.vfd8bece330ca pipeline-groovy-lib:704.vc58b_8890a_384 pipeline-input-step:491.vb_07d21da_1afb pipeline-milestone-step:111.v449306f708b_7 pipeline-model-api:2.2184.v0b_358b_953e69 pipeline-model-definition:2.2184.v0b_358b_953e69 pipeline-model-extensions:2.2184.v0b_358b_953e69 pipeline-rest-api:2.34 pipeline-stage-step:305.ve96d0205c1c6 pipeline-stage-tags-metadata:2.2184.v0b_358b_953e69 pipeline-stage-view:2.34 pipeline-utility-steps:2.16.2 plain-credentials:179.vc5cb_98f6db_38 plugin-util-api:4.1.0 powershell:2.1 prism-api:1.29.0-13 prometheus:2.5.1 --> Upgrade to 2.5.2 not possible needed to rollback to backup publish-over:0.22 publish-over-ssh:1.25 pubsub-light:1.18 purge-build-queue-plugin:88.v23b_97b_f2c7a_d purge-job-history:1.6 rebuild:330.v645b7df10e2a resource-disposer:0.23 role-strategy:713.vb_3837801b_8cc run-condition:1.7 scm-api:689.v237b_6d3a_ef7f script-security:1326.vdb_c154de8669 slave-setup:1.16 snakeyaml-api:2.2-111.vc6598e30cc65 sonar:2.17.2 sse-gateway:1.26 ssh-agent:346.vda_a_c4f2c8e50 ssh-credentials:326.v7fcb_aef6194b ssh-slaves:2.948.vb_8050d697fec sshd:3.322.v159e91f6a_550 strict-crumb-issuer:2.1.1 structs:337.v1b_04ea_4df7c8 throttle-concurrents:2.14 timestamper:1.26 token-macro:400.v35420b922dcb trilead-api:2.142.v748523a_76693 uno-choice:2.8.1 variant:60.v7290fc0eb_b_cd warnings-ng:11.2.2 workflow-aggregator:596.v8c21c963d92d workflow-api:1291.v51fd2a_625da_7 workflow-basic-steps:1049.v257a_e6b_30fb_d workflow-cps:3889.v937e0b_3412d3 workflow-durable-task-step:1331.vc8c2fed35334 workflow-job:1400.v7fd111b_ec82f workflow-multibranch:773.vc4fe1378f1d5 workflow-scm-step:427.v4ca_6512e7df1 workflow-step-api:657.v03be8115821b workflow-support:881.v7663695646cf ws-cleanup:0.45

What Operating System are you using (both controller, and any agents involved in the problem)?

Controller Node is Running RHEL 8

Reproduction steps

Updating Prometheus Plugin to 2.5.2 Jenkins version is 2.440.2

Expected Results

Jenkins should startup normally.

Actual Results

Jenkins was stuck in startup process and throws errors in log.

Anything else?

Log Messages attached:

2024-03-26 14:08:26.920+0000 [id=43] WARNING o.j.p.prometheus.JobCollector#lambda$collect$0: Caught error when processing job [JOB] error: java.lang.IllegalStateException: Jenkins.instance is missing. Read the documentation of Jenkins.getInstanceOrNull to see what you are doing wrong. at jenkins.model.Jenkins.get(Jenkins.java:819) at org.jenkinsci.plugins.prometheus.config.PrometheusConfiguration.get(PrometheusConfiguration.java:79) at org.jenkinsci.plugins.prometheus.JobCollector.appendJobMetrics(JobCollector.java:215) at org.jenkinsci.plugins.prometheus.JobCollector.lambda$collect$0(JobCollector.java:162) at org.jenkinsci.plugins.prometheus.util.Jobs.forEachJob(Jobs.java:19) at org.jenkinsci.plugins.prometheus.JobCollector.collect(JobCollector.java:155) at io.prometheus.client.CollectorRegistry.collectorNames(CollectorRegistry.java:113) at io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:50) at org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics.registerCollectors(DefaultPrometheusMetrics.java:46) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:109) at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:185) at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305) at jenkins.model.Jenkins$5.runTask(Jenkins.java:1170) at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:221) at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:120) at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) 2024-03-26 14:08:26.910+0000 [id=23] SEVERE h.i.i.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler#uncaughtException: A thread (Jenkins initialization thread/23) died unexpectedly due to an uncaught exception. This may leave your server corrupted and usually indicates a software bug. java.lang.ClassNotFoundException: hudson.util.HudsonFailedToLoad at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527) at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:511) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527) Caused: java.lang.NoClassDefFoundError: hudson/util/HudsonFailedToLoad at hudson.WebAppMain$3.run(WebAppMain.java:276)

Are you interested in contributing a fix?

No response

Waschndolos commented 5 months ago

Sounds stange. I didn't have any issues in my systems (multiple platforms). I'll check if I can provide a quick fix

Waschndolos commented 5 months ago

I've just updated 2 of our production Jenkins instances to 2.5.2 in the company without any issues. Are you sure your Jenkins instance is healthy?

jonesbusy commented 5 months ago

Hi,

I also have issue with new version.

Caused by: java.lang.IllegalArgumentException: Failed to register Collector of type JenkinsStatusCollector: default_jenkins_version_info is already in use by another Collector of type JenkinsStatusCollector

Full stacktrace is

2024-03-27T05:54:02.816Z [ERROR] [Thread=pool-199-thread-3] - jenkins.InitReactorRunner # Failed DefaultPrometheusMetrics.registerCollectors 
java.lang.Error: java.lang.reflect.InvocationTargetException
        at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:115)
        at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:185)
        at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305)
        at jenkins.model.Jenkins$5.runTask(Jenkins.java:1170)
        at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:221)
        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:120)
        at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.reflect.InvocationTargetException: null
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:118)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:109)
        ... 9 common frames omitted
Caused by: java.lang.IllegalArgumentException: Failed to register Collector of type JenkinsStatusCollector: default_jenkins_version_info is already in use by another Collector of type JenkinsStatusCollector
        at io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:57)
        at org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics.registerCollectors(DefaultPrometheusMetrics.java:47)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        ... 11 common frames omitted

Not sure if it's same issue or not.

But started to appear with 2.5.2.

Instance is healthy and latest LTS 2.440.2. All plugins are up-to-date

Waschndolos commented 5 months ago

But have you been able to update and does the 2.5.2 version produce results?

That's kind of strange. I have multiple Jenkins instances. Windows, Docker... also colleagues tried that and nobody had any issues yet. Are we using the @Initializer(after = InitMilestone.JOB_LOADED) incorrect?

Waschndolos commented 5 months ago

Since I cannot reproduce the issue I've implemented a "shot in the dark" in PR #648. @jonesbusy @Alphayeeeet Could anybody of you test the .hpi file which comes out of https://ci.jenkins.io/job/Plugins/job/prometheus-plugin/job/PR-648/ ?

jonesbusy commented 5 months ago

Sure thanks, I will test the incrementals tomorrow.

I would be curious if the jenkins bom would be able to discover such issue

nilbek commented 5 months ago

Yesterday it happened to me too. I manually remove recently updated plugins one by one and after deleting prometheus, it could start normally.

Jenkins 2.440.2 Instance is healthy and latest LTS 2.440.2. All plugins are up-to-date. Platform: Linux rpm Controller Node is Running RHEL 8

More details: Actually I was not upgrading "Prometheus Plugin" to "2.5.2". I done that earlier. When "Prometheus Plugin, 2.5.2" already installed, I was trying to upgrade other plugins. After upgrading them and restart, it hangs and timeouts start of the jenkins.

Maybe helpful logs:

...
Mar 27 10:20:39 jenkins[1557315]: 2024-03-27 07:20:39.442+0000 [id=427311]        INFO        hudson.util.Retrier#start: Attempt #1 to do the action check updates server
...
Mar 27 10:20:58 jenkins[1557315]: 2024-03-27 07:20:58.263+0000 [id=427311]        INFO        hudson.util.Retrier#start: Performed the action check updates server successfully at the attempt #1
...
Mar 27 10:23:41 jenkins[1557315]: 2024-03-27 07:23:41.732+0000 [id=427311]        INFO        hudson.PluginManager#install: Starting installation of a batch of 3 plugins plus their dependencies
Mar 27 10:23:41 jenkins[1557315]: 2024-03-27 07:23:41.734+0000 [id=439073]        INFO        h.model.UpdateCenter$DownloadJob#run: Starting the installation of github-branch-source on behalf of USER
Mar 27 10:23:41 jenkins[1557315]: 2024-03-27 07:23:41.734+0000 [id=439073]        INFO        h.m.UpdateCenter$UpdateCenterConfiguration#download: Downloading github-branch-source
Mar 27 10:23:43 jenkins[1557315]: 2024-03-27 07:23:43.071+0000 [id=439073]        INFO        h.model.UpdateCenter$DownloadJob#run: Starting the installation of junit on behalf of USER
Mar 27 10:23:43 jenkins[1557315]: 2024-03-27 07:23:43.071+0000 [id=439073]        INFO        h.m.UpdateCenter$UpdateCenterConfiguration#download: Downloading junit
Mar 27 10:23:44 jenkins[1557315]: 2024-03-27 07:23:44.648+0000 [id=439073]        INFO        h.model.UpdateCenter$DownloadJob#run: Starting the installation of mina-sshd-api-core on behalf of USER
Mar 27 10:23:44 jenkins[1557315]: 2024-03-27 07:23:44.648+0000 [id=439073]        INFO        h.m.UpdateCenter$UpdateCenterConfiguration#download: Downloading mina-sshd-api-core
Mar 27 10:23:54 jenkins[1557315]: 2024-03-27 07:23:54.463+0000 [id=417202]        INFO        hudson.model.UpdateCenter#doSafeRestart: Scheduling Jenkins reboot
Mar 27 10:23:54 jenkins[1557315]: 2024-03-27 07:23:54.482+0000 [id=439102]        INFO        hudson.lifecycle.Lifecycle#onStatusUpdate: Restart in 10 seconds
Mar 27 10:24:04 jenkins[1557315]: 2024-03-27 07:24:04.486+0000 [id=439102]        INFO        hudson.lifecycle.Lifecycle#onStop: Stopping Jenkins as requested by USER
Mar 27 10:24:04 jenkins[1557315]: 2024-03-27 07:24:04.488+0000 [id=439102]        INFO        hudson.lifecycle.Lifecycle#onStatusUpdate: Stopping Jenkins
Mar 27 10:24:04 jenkins[1557315]: 2024-03-27 07:24:04.527+0000 [id=439102]        INFO        jenkins.model.Jenkins$16#onAttained: Started termination
Mar 27 10:24:04 jenkins[1557315]: 2024-03-27 07:24:04.589+0000 [id=439102]        INFO        c.zaxxer.hikari.HikariDataSource#close: HikariPool-1 - Shutdown initiated...
Mar 27 10:24:04 jenkins[1557315]: 2024-03-27 07:24:04.591+0000 [id=439102]        INFO        c.zaxxer.hikari.HikariDataSource#close: HikariPool-1 - Shutdown completed.
Mar 27 10:24:04 jenkins[1557315]: 2024-03-27 07:24:04.607+0000 [id=439102]        INFO        h.p.b.global.Lifecycle#shutdown: Shutdown complete - Global TimeOut ScheduledExecutorService had 0 tasks pending
Mar 27 10:24:04 jenkins[1557315]: 2024-03-27 07:24:04.611+0000 [id=439102]        INFO        jenkins.model.Jenkins$16#onAttained: Completed termination
Mar 27 10:24:04 jenkins[1557315]: 2024-03-27 07:24:04.611+0000 [id=439102]        INFO        jenkins.model.Jenkins#_cleanUpDisconnectComputers: Starting node disconnection
...
Mar 27 10:24:04 jenkins[1557315]: 2024-03-27 07:24:04.715+0000 [id=439102]        INFO        jenkins.model.Jenkins#_cleanUpShutdownPluginManager: Stopping plugin manager
...
Mar 27 10:24:05 jenkins[1557315]: 2024-03-27 07:24:05.020+0000 [id=439102]        INFO        jenkins.model.Jenkins#_cleanUpPersistQueue: Persisting build queue
Mar 27 10:24:05 jenkins[1557315]: 2024-03-27 07:24:05.024+0000 [id=439102]        INFO        jenkins.model.Jenkins#_cleanUpAwaitDisconnects: Waiting for node disconnection completion
...
Mar 27 10:24:05 jenkins[1557315]: 2024-03-27 07:24:05.033+0000 [id=439102]        INFO        hudson.lifecycle.Lifecycle#onStatusUpdate: Jenkins stopped
Mar 27 10:24:05 jenkins[1557315]: 2024-03-27 07:24:05.036+0000 [id=44]        INFO        winstone.Logger#logInternal: JVM is terminating. Shutting down Jetty
Mar 27 10:24:05 jenkins[1557315]: 2024-03-27 07:24:05.038+0000 [id=44]        INFO        org.eclipse.jetty.server.Server#doStop: Stopped Server@61009542{STOPPING}[10.0.20,sto=0]
Mar 27 10:24:05 jenkins[1557315]: 2024-03-27 07:24:05.043+0000 [id=439049]        WARNING        hudson.model.RunMap#retrieve: could not load /var/lib/jenkins/jobs/dev/jobs/JOB/builds/356
Mar 27 10:24:05 jenkins[1557315]: java.lang.IllegalStateException: Expected 1 instance of jenkins.model.TransientActionFactory$Cache but got 0
Mar 27 10:24:05 jenkins[1557315]:         at hudson.ExtensionList.lookupSingleton(ExtensionList.java:457)
Mar 27 10:24:05 jenkins[1557315]:         at jenkins.model.TransientActionFactory.factoriesFor(TransientActionFactory.java:127)
Mar 27 10:24:05 jenkins[1557315]:         at hudson.model.Actionable.getAllActions(Actionable.java:100)
Mar 27 10:24:05 jenkins[1557315]:         at hudson.model.Run.onLoad(Run.java:386)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:566)
Mar 27 10:24:05 jenkins[1557315]:         at hudson.model.RunMap.retrieve(RunMap.java:233)
Mar 27 10:24:05 jenkins[1557315]:         at hudson.model.RunMap.retrieve(RunMap.java:61)
Mar 27 10:24:05 jenkins[1557315]:         at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:663)
Mar 27 10:24:05 jenkins[1557315]:         at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:645)
Mar 27 10:24:05 jenkins[1557315]:         at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:543)
Mar 27 10:24:05 jenkins[1557315]:         at jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:502)
Mar 27 10:24:05 jenkins[1557315]:         at jenkins.model.lazy.LazyBuildMixIn$RunMixIn.getPreviousBuild(LazyBuildMixIn.java:406)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.workflow.job.WorkflowRun.getPreviousBuild(WorkflowRun.java:272)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.workflow.job.WorkflowRun.getPreviousBuild(WorkflowRun.java:136)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.prometheus.JobCollector.appendJobMetrics(JobCollector.java:270)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.prometheus.JobCollector.lambda$collect$0(JobCollector.java:184)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.prometheus.util.Jobs.forEachJob(Jobs.java:19)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.prometheus.JobCollector.collect(JobCollector.java:177)
Mar 27 10:24:05 jenkins[1557315]:         at io.prometheus.client.Collector.collect(Collector.java:45)
Mar 27 10:24:05 jenkins[1557315]:         at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:204)
Mar 27 10:24:05 jenkins[1557315]:         at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:219)
Mar 27 10:24:05 jenkins[1557315]:         at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:152)
Mar 27 10:24:05 jenkins[1557315]:         at java.base/java.util.Enumeration$1.next(Enumeration.java:123)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.prometheus.config.disabledmetrics.FilteredMetricEnumeration.filterList(FilteredMetricEnumeration.java:18)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.prometheus.config.disabledmetrics.FilteredMetricEnumeration.<init>(FilteredMetricEnumeration.java:12)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics.collectMetrics(DefaultPrometheusMetrics.java:53)
Mar 27 10:24:05 jenkins[1557315]:         at org.jenkinsci.plugins.prometheus.service.PrometheusAsyncWorker.execute(PrometheusAsyncWorker.java:41)
Mar 27 10:24:05 jenkins[1557315]:         at hudson.model.AsyncPeriodicWork.lambda$doRun$0(AsyncPeriodicWork.java:102)
Mar 27 10:24:05 jenkins[1557315]:         at java.base/java.lang.Thread.run(Thread.java:829)
...
(same log for different jobs)
...
Mar 27 10:24:05 jenkins[1557315]: 2024-03-27 07:24:05.147+0000 [id=44]        INFO        o.e.j.s.handler.ContextHandler#doStop: Stopped w.@1e886a5b{Jenkins v2.440.2,/,null,STOPPED}{/var/cache/jenkins/war}
Mar 27 10:24:05 jenkins[1557315]: 2024-03-27 07:24:05.156+0000 [id=44]        INFO        winstone.Logger#logInternal: Jetty shutdown successfully
Mar 27 10:24:10 systemd[1]: jenkins.service: Main process exited, code=exited, status=5/NOTINSTALLED
Mar 27 10:24:10 systemd[1]: jenkins.service: Failed with result 'exit-code'.
Mar 27 10:24:11 systemd[1]: jenkins.service: Service RestartSec=100ms expired, scheduling restart.
Mar 27 10:24:11 systemd[1]: jenkins.service: Scheduled restart job, restart counter is at 3.
Mar 27 10:24:11 systemd[1]: Stopped Jenkins Continuous Integration Server.
Mar 27 10:24:11 systemd[1]: Starting Jenkins Continuous Integration Server...
...
(Only high severity logs hereafter)
...
Mar 27 10:25:41 systemd[1]: jenkins.service: start operation timed out. Terminating.
Mar 27 10:25:46 systemd[1]: jenkins.service: Failed with result 'timeout'.
Mar 27 10:25:46 systemd[1]: Failed to start Jenkins Continuous Integration Server.
Mar 27 10:27:17 systemd[1]: jenkins.service: start operation timed out. Terminating.
Mar 27 10:27:23 systemd[1]: jenkins.service: Failed with result 'timeout'.
Mar 27 10:27:23 systemd[1]: Failed to start Jenkins Continuous Integration Server.
Mar 27 10:28:54 systemd[1]: jenkins.service: start operation timed out. Terminating.
Mar 27 10:28:59 systemd[1]: jenkins.service: Failed with result 'timeout'.
Mar 27 10:28:59 systemd[1]: Failed to start Jenkins Continuous Integration Server.
...
nilbek commented 5 months ago

After investigating logs, i remembered more details and edited my comment. I hope it will help.

jonesbusy commented 5 months ago

Yes my feeling is also similar that is related with restart or reload. When starting Jenkins inside a container (lets say from a fresh start) it works. But when restarting a controller or upgrading plugin and then restart it doesn't work. I think it has do to with init phase, plugin reload or even class path ?

It could explain why test are not discovering such issue.

Perhaps it can be reproduced using RealJenkinsRule 🤔

Waschndolos commented 5 months ago

I think the root cause was the introduction of https://github.com/jenkinsci/prometheus-plugin/pull/637 that @Initializer annotation. Maybe I just revert that - although it does make sense to me to use the @Initializer annotation but it seems to produce a weired behavior

jonesbusy commented 5 months ago

I never used Initializer on any of my plugin so I'm so sure about the behavior. Perhaps worth to ask on the dev list.

Thanks in any case!

Waschndolos commented 5 months ago

Guess I'll revert things. Takes some time - currently low on time. Hopefully done at the weekend

Waschndolos commented 5 months ago

@jonesbusy I've created a new PR at #650. I've tested it explicitly on a Windows hosted machine with update and let Jenkins restart over the plugins page. Didn't find any issues. Could you maybe check if it also works on your instances?

Waschndolos commented 5 months ago

@Alphayeeeet @jonesbusy @nilbek . I've reverted the commit which might have caused the issue and published 2.5.3. Hopefully this release works for you all. Sorry for the trouble!

jonesbusy commented 5 months ago

Hi,

Thanks, no problem it can happen. I will test the 2.5.3 next week after holidays.

Best regards,

dominik-horb-umg commented 5 months ago

We were seeing the same issue described here, updating to 2.5.3 fixed the problem.

jonesbusy commented 5 months ago

I also just tested the 2.5.3 this morning. This is also fixed on my side. Thanks again for the fix

jonesbusy commented 1 month ago

This issue is back after upgrading from 773.v3b_62d8178eec to 778.ve1c932a_ff24f

[05:40:14.113+02:00] - Caused by: java.lang.reflect.InvocationTargetException
[05:40:14.113+02:00] -  at java.base/java.lang.reflect.Method.invoke(Method.java:580)
[05:40:14.113+02:00] -  at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:109)
[05:40:14.113+02:00] -  ... 9 more
[05:40:14.113+02:00] - Caused by: java.lang.IllegalArgumentException: Failed to register Collector of type JenkinsStatusCollector: default_jenkins_version_info is already in use by another Collector of type JenkinsStatusCollector
[05:40:14.113+02:00] -  at io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:57)
[05:40:14.113+02:00] -  at org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics.registerCollector(DefaultPrometheusMetrics.java:54)
[05:40:14.113+02:00] -  at org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics.registerCollectors(DefaultPrometheusMetrics.java:62)
[05:40:14.113+02:00] -  ... 11 more
[05:40:14.113+02:00] - 

I will open a new issue to track it

jonesbusy commented 1 month ago

New issue https://github.com/jenkinsci/prometheus-plugin/issues/683