timja / jenkins-gh-issues-poc-06-18

0 stars 0 forks source link

[JENKINS-63169] Jenkins hangs after Listing all plugins #1120

Open timja opened 4 years ago

timja commented 4 years ago

After restarting Jenkins (cold-load of the system), Jenkins came up once. We took it down cleanly, then restarted and it hung after the message "Listed all plugins". We waited 12 hours for Jenkins to come up without success or additional messages. It usually starts in under 15 seconds from this point.

Thread dump is attached.


Originally reported by rsbeckerca, imported from: Jenkins hangs after Listing all plugins
  • status: Open
  • priority: Blocker
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 4 years ago

rsbeckerca:

We were able to bring up jenkins by removing the jobs directory. By hypothesis, there may be something corrupt associated with one of the jobs.

timja commented 4 years ago

oleg_nenashev:

Hi. Sorry for the late response. So here is the sequence of events:

  1. There is a onInitMilestoneAttained() handler in the Jenkins core (second thread). It runs after all plugins are prepared. https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/jenkins/model/Jenkins.java#L1154-L1165 
  2. The hook runs the extension loading in hudson.ClassicPluginStrategy.findComponent()
  3. Extension loading iterates all Extension annotations using the sezpoz library. The library loads classes by finally invoking the native java.lang.Class.forName0() method. There is no evidence of the method being blocked, but it definitely is NOT expected to run for 12 hours
  4. Project Inheritance Plugin defines the `onJenkinsStart()` startup hook which runs after all jobs are loaded. https://github.com/i-m-c/jenkins-inheritance-plugin/blob/master/src/main/java/hudson/plugins/project_inheritance/projects/creation/CreationItemListener.java#L94-L95
  5. This hook initializes the job recreation, and it loads configurations. It is a legitimate behavior

Cause analysis:

rsbeckerca my recommendation would be to start from updating to a recent Java version. If it does not help, please collect the startup logs and a time series of stacktractes (e.g. every 5 seconds)

 

 

"CreationItemListener.onJenkinsStart" #26 daemon prio=5 os_prio=24 tid=0x0000000403243000 nid=0x1c waiting for monitor entry [0x00000000656fd000] java.lang.Thread.State: BLOCKED (on object monitor) at hudson.ExtensionList.ensureLoaded(ExtensionList.java:315)
 - waiting to lock <0x00000007267b5890> (a hudson.ExtensionList$Lock) at hudson.ExtensionList.iterator(ExtensionList.java:172)
...
     at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1173)
     at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1053)
 5 >>> at hudson.XmlFile.read(XmlFile.java:149)
     at hudson.plugins.project_inheritance.projects.creation.ProjectCreationEngine.(ProjectCreationEngine.java:380)
     at hudson.plugins.project_inheritance.projects.creation.ProjectCreationEngine.(ProjectCreationEngine.java:317)
 4 >>> at hudson.plugins.project_inheritance.projects.creation.CreationItemListener.onJenkinsStart(CreationItemListener.java:119)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
 at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
 at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
 at jenkins.model.Jenkins$5.runTask(Jenkins.java:1131)
 at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
 at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
 at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 

 

"pool-6-thread-1" #25 daemon prio=5 os_prio=24 tid=0x0000000403251400 nid=0x1b in Object.wait() [0x00000000657fe000]
 java.lang.Thread.State: RUNNABLE
3 >>> at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:365)
 at hudson.ExtensionFinder$Sezpoz.scout(ExtensionFinder.java:733)
2 >>>  at hudson.ClassicPluginStrategy.findComponents(ClassicPluginStrategy.java:340)
 at hudson.ExtensionList.load(ExtensionList.java:381)
 at hudson.ExtensionList.ensureLoaded(ExtensionList.java:317)
 - locked <0x00000007267b5890> (a hudson.ExtensionList$Lock)
 at hudson.ExtensionList.getComponents(ExtensionList.java:183)
1 >>> at jenkins.model.Jenkins$6.onInitMilestoneAttained(Jenkins.java:1161)
 at jenkins.InitReactorRunner$1.onAttained(InitReactorRunner.java:85)
 at org.jvnet.hudson.reactor.ReactorListener$Aggregator.lambda$onAttained$3(ReactorListener.java:102)
 at org.jvnet.hudson.reactor.ReactorListener$Aggregator$$Lambda$48/312649756.run(Unknown Source)
 at org.jvnet.hudson.reactor.ReactorListener$Aggregator.run(ReactorListener.java:109)
 at org.jvnet.hudson.reactor.ReactorListener$Aggregator.onAttained(ReactorListener.java:102)
 at org.jvnet.hudson.reactor.Reactor$1.run(Reactor.java:177)
 at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
 at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

 

 

timja commented 4 years ago

rsbeckerca:

We found two conditions that were occurring at startup.

  1. JGit was throwing Arithmetic Overflow conditions because the file system does not support nanosecond resolution. The UNDEFINED (Long.MAX_VALUE) value is unchecked, and any addition with it causes the overflow. This was separately reported to the JGit team.
  2. The CACERTS were out of date for the jenkins update site. So none of the connections done at startup were working. These went without logging exceptions but fixing the CACERTS allowed Jenkins to start up correctly.

As far as upgrading the JVM, this is not something within our team's control. The vendor (HPE) is not releasing new JVM versions on the Itanium platform although we have one more possible upgrade. I will let the team know about this restriction.