[JENKINS-34573] Running out of Heap Memory (2048m) #3089

Open timja opened 8 years ago

timja commented 8 years ago

- Setup with fewer than 100 nodes.
- Things were going fine, then the whole setup started freezing; the initial errors indicated we were running out of PermGen space (I upped that to 512m).
- Then the same symptoms: the setup would freeze, and errors showed we were running out of heap memory, so I increased that to 2048m on the master, but I can still see memory usage growing significantly (I run GC manually to avoid the error, which clears out more than a GB, but I suspect a memory leak somewhere).

- I can't include a full heap dump due to some restrictions; however, here is a list of the top 20 objects by retained size:

| Class Name | Retained Size |
| --- | --- |
| org.eclipse.jetty.webapp.WebAppClassLoader#1 | 9,885,754 |
| java.util.Vector#16 | 9,476,255 |
| java.lang.Object[]#10612 | 9,476,219 |
| org.apache.commons.jexl.util.introspection.Introspector#1 | 6,934,833 |
| class org.apache.commons.jexl.util.introspection.UberspectImpl | 6,934,833 |
| java.util.HashMap#7545 | 6,907,893 |
| java.util.HashMap$Entry[]#74405 | 6,907,825 |
| org.kohsuke.stapler.WebApp#1 | 6,267,722 |
| java.util.HashMap#4741 | 6,265,930 |
| java.util.HashMap$Entry[]#72991 | 6,265,862 |
| class hudson.model.Run | 5,730,085 |
| hudson.util.XStream2#4 | 5,730,037 |
| com.thoughtworks.xstream.core.DefaultConverterLookup#3 | 5,087,721 |
| com.thoughtworks.xstream.converters.SingleValueConverterWrapper#51 | 5,050,913 |
| com.thoughtworks.xstream.converters.basic.StringConverter#2 | 5,050,889 |
| java.util.Collections$SynchronizedMap#39 | 5,050,861 |
| com.thoughtworks.xstream.core.util.WeakCache#2 | 5,050,805 |
| java.util.WeakHashMap#389 | 5,050,765 |
| java.util.WeakHashMap$Entry[]#5781 | 4,915,608 |
| hudson.model.Hudson#1 | 3,763,073 |
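For context on the memory settings discussed above, here is a minimal sketch (not part of the original report) of how heap and non-heap (PermGen on Java 7, Metaspace on Java 8+) usage can be checked from a JVM; the class name and output format are illustrative only:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Illustration only (not from the report): print heap and non-heap usage so
// growth like the one described above can be tracked over time.
public class MemoryCheck {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        System.out.printf("Heap:     used=%,d max=%,d%n", heap.getUsed(), heap.getMax());
        System.out.printf("Non-heap: used=%,d max=%,d%n", nonHeap.getUsed(), nonHeap.getMax());

        // A manual GC, as described in the report, is only a request to the JVM.
        System.gc();
        System.out.printf("Heap after GC request: used=%,d%n",
                memory.getHeapMemoryUsage().getUsed());
    }
}
```

Note that the retained sizes in the table above are per-object figures from the heap dump, whereas this prints totals per memory pool.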


Originally reported by raccary, imported from: Running out of Heap Memory (2048m)
  • status: Open
  • priority: Blocker
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 8 years ago

alexandrbykov:

I have the same issue with Jenkins 2.1 after updating from 1.656.

timja commented 8 years ago

danielbeck:

Please attach the output of the /threadDump URL (while Jenkins is well on its way to OOM) to this issue.

Please install the Support Core Plugin and attach a support bundle to this bug report.

Please also install the Monitoring Plugin and provide screenshots of the graphs before things break.
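(For anyone following along, here is a rough sketch of grabbing the /threadDump output non-interactively; JENKINS_URL, USER and API_TOKEN are placeholders, not values from this issue.)

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: fetch the /threadDump page while memory is climbing, so the
// output can be saved and attached to the issue.
public class ThreadDumpFetcher {
    public static void main(String[] args) throws Exception {
        String auth = Base64.getEncoder()
                .encodeToString("USER:API_TOKEN".getBytes(StandardCharsets.UTF_8));
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://JENKINS_URL/threadDump").openConnection();
        conn.setRequestProperty("Authorization", "Basic " + auth);
        try (InputStream in = conn.getInputStream()) {
            in.transferTo(System.out); // requires Java 9+; attach this output to the issue
        }
    }
}
```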

timja commented 8 years ago

alexandrbykov:

I collected a dump + a report from the Support Core Plugin + a report from the Monitoring Plugin:
https://drive.google.com/folderview?id=0B9JKekl6nypsRDhoMmJXWjlEQW8&usp=sharing

timja commented 8 years ago

danielbeck:

alexandrbykov Thanks, looking. Unfortunately the plugins haven't had time to collect interesting-looking data yet, and the heap stats also look rather healthy. It will be more informative once the first OOM error occurs.

Please note that the support bundle contains the Jenkins admin password, as well as the HTTP keystore password, since they're command-line arguments to Jenkins that get logged, so I recommend changing those.

timja commented 8 years ago

raccary:

Hey danielbeck,
I just want to make sure that we're not handling two different issues here. Despite the common symptoms, it might be a totally different issue. I have a generated heap dump (.hprof); would that be of equivalent value to providing a threadDump? (It's ~1.8 GB.)

I have already added an attachment which shows the memory profile (memory.png). I will be enabling the support plugin and performing the action items you asked for.
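(As a side note, a heap dump like the .hprof mentioned above can also be produced programmatically; a rough sketch using the HotSpot diagnostic MXBean follows. The output path is a placeholder, and `jmap -dump:live,format=b,file=<path> <pid>` against the Jenkins process is the more common route.)

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Rough sketch: write an .hprof heap dump like the one referenced above.
// The output path below is a placeholder.
public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        diag.dumpHeap("/tmp/jenkins-heap.hprof", true); // true = dump live objects only
    }
}
```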

timja commented 8 years ago

raccary:

danielbeck, here's the generated heap dump: https://www.dropbox.com/s/5zly2hmbi8dvjop/heapdump.hprof?dl=0

timja commented 8 years ago

raccary:

Any ideas, danielbeck?

timja commented 8 years ago

danielbeck:

Well, I wasted a great deal of time yesterday by running out of memory while analyzing the heap dump.

Haven't forgotten about this, just no time to look into it so far.

timja commented 8 years ago

danielbeck:

stephenconnolly pointed someone on the users mailing list to JENKINS-34213:
https://groups.google.com/d/msg/jenkinsci-users/qLwBFyQ84Z4/GWP4Ve_jOAAJ

Given your ~100-node setup, it seems likely to be related.

timja commented 8 years ago

raccary:

Some more observations I had:
- This often coincides with data starting to show up in Manage Old Data.
- Running GC clears the old data somehow.
- The old data entries usually mention "jenkins.metrics.impl" and "EnQueueAction or InQueueAction".
- Issues with log rotation show up in the warnings log.

timja commented 8 years ago

danielbeck:

I think old data is retained in memory as long as the associated records (e.g. build records) are in memory. Once those are discarded from memory, the old data should follow.

Install and/or enable the Metrics Plugin to see whether that helps. It adds metadata to every build about its time in the queue:
https://github.com/jenkinsci/metrics-plugin/blob/master/src/main/java/jenkins/metrics/impl/TimeInQueueAction.java
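(For readers unfamiliar with how such metadata ends up in build records, here is a heavily simplified sketch, not the actual metrics-plugin source: an invisible action like this is serialized into each build's build.xml, and if the defining plugin is missing when the record is loaded, the entry surfaces under Manage Old Data.)

```java
import hudson.model.InvisibleAction;

// Heavily simplified illustration, not the real TimeInQueueAction.
// XStream serializes actions like this into each build record; a class that
// can no longer be resolved at load time is reported as "old data".
public class TimeInQueueActionSketch extends InvisibleAction {
    private final long queuingDurationMillis; // hypothetical field name

    public TimeInQueueActionSketch(long queuingDurationMillis) {
        this.queuingDurationMillis = queuingDurationMillis;
    }

    public long getQueuingDurationMillis() {
        return queuingDurationMillis;
    }
}
```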

timja commented 8 years ago

raccary:

Hi Daniel, I can try that.

However, did the heap dump show anything as to what's causing this memory issue?

timja commented 8 years ago

danielbeck:

Haven't had time to look further, unfortunately. Giving you these hints is quick enough to sneak in between actual work.