timja / jenkins-gh-issues-poc-06-18

[JENKINS-21414] PermGen space OOME triggered on use of Jenkins GUI #10931

Open timja opened 10 years ago

timja commented 10 years ago

Jenkins crashes frequently (at least once a day) with "java.lang.OutOfMemoryError: PermGen space". This has been seen at our site since at least Jenkins 1.505 (March 2013). The problem goes away for weeks at a time, then returns as a constant annoyance.

The problem is most often triggered by making job configuration changes via the GUI, though it has also been triggered by upgrading a plug-in via the GUI. About half the time I am not using the GUI when it happens, i.e., it appears to occur by chance, though it is possible that one of the other 5-10 users is in the GUI at the time.

Our PermGen limit is set very high, at 1536 MB, following suggestions to raise it to avoid triggering the error. It started at 256 MB and was raised to 512 MB, then 1024 MB; the progressively higher limits have had no effect. Disk space and swap space are not issues. The master machine runs some of our overnight builds but is idle or nearly so during business hours. Most of the Jenkins jobs are merely pieces of the overnight builds, which do not hit the PermGen limit.

To be clear, it is Jenkins hitting PermGen, not the individual jobs. When Jenkins hits that limit, all processing ceases and it must be "kill -9"ed and restarted. I have to watch the running log constantly.
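Rather than watching the log by hand, the error text could be searched for automatically. The following is a minimal sketch, not part of the original setup: the function name find_permgen_errors and the log path in the usage comment are hypothetical, and it only matches the exact message quoted in this report.

```python
import re

# Error text exactly as quoted in this report.
PERMGEN_OOME = re.compile(r"java\.lang\.OutOfMemoryError: PermGen space")


def find_permgen_errors(lines):
    """Return (line_number, text) pairs for lines mentioning the PermGen OOME."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PERMGEN_OOME.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical usage (the log location is an assumption):
#   with open("/path/to/jenkins.log", errors="replace") as log:
#       for number, text in find_permgen_errors(log):
#           print(f"{number}: {text}")
```

A scheduled job could run such a scan and send a notification, so the restart does not depend on someone noticing the crash.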

Summary of the Jenkins running environment:
echo $MAVEN_OPTS
-Xms512m -Xmx1536m -Djava.awt.headless=true -XX:-UseGCOverheadLimit -XX:NewSize=128m -XX:MaxPermSize=1536m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/dumps
echo $JAVA_OPTS
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/dumps
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/dumps -jar jenkins.war --httpPort=8070 &
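One thing worth checking in a setup like this is which memory options actually reach the Jenkins JVM. MAVEN_OPTS is read by the mvn launcher, and the plain java binary does not read JAVA_OPTS on its own, so only options on the java command line itself apply to Jenkins. The sketch below, with the hypothetical helper name jvm_memory_flags, extracts the relevant options from a command line so the two can be compared.

```python
import shlex


def jvm_memory_flags(command):
    """Extract -Xms, -Xmx and -XX:MaxPermSize from a java command line."""
    flags = {"Xms": None, "Xmx": None, "MaxPermSize": None}
    # Join shell line continuations before splitting into tokens.
    tokens = shlex.split(command.replace("\\\n", " "))
    for token in tokens:
        if token == "-jar":
            break  # what follows is the jar path and application arguments
        if token.startswith("-Xms"):
            flags["Xms"] = token[len("-Xms"):]
        elif token.startswith("-Xmx"):
            flags["Xmx"] = token[len("-Xmx"):]
        elif token.startswith("-XX:MaxPermSize="):
            flags["MaxPermSize"] = token.split("=", 1)[1]
    return flags
```

Applied to the launch line exactly as quoted above, this finds no -Xmx and no -XX:MaxPermSize, which would mean the JVM defaults apply to Jenkins unless the real command differs from the one shown. Applied to the MAVEN_OPTS value, it reports 512m, 1536m and 1536m.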

Our setup is moderate in size, about 100 jobs, only 20 of which are active, using three slave machines (Linux, MacOS, Ubuntu) in addition to the master. I have noticed no pattern of a particular slave or job triggering the error, nor is it load related. The GUI is usually called from a Win7 box running IE9.

I have captured some heap dumps. They are too large to attach, so I will provide links to Dropbox (see URL below for the first). Unfortunately I have not been able to capture Jenkins logs just prior to the crashes, nor have I been able to clearly establish a predictable pattern for reproducing it. What I can say is that when I use the GUI to modify the job configurations of 5-10 related jobs (to use a different Maven repository, for instance), the chances of making it through all of them without crashing Jenkins are small.

Also attached is a screenshot from the Eclipse Memory Analyzer (MAT), showing the top leak suspect. The line outlined in blue, and the line below it, show the Jenkins view ("2014-Q2") and the particular job whose config I was editing ("CompileOnly-2-14-Q2-sdk") at the time of the crash.

At the very least, I would be happy if someone can help me identify whether there is a problem in the Jenkins product, a plug-in, or the manner of its configuration or use.


Originally reported by sstrickland, imported from: PermGen space OOME triggered on use of Jenkins GUI
  • status: Open
  • priority: Major
  • resolution: Unresolved
  • imported: 2022/01/10

timja commented 10 years ago

sstrickland:

A second heap dump, of about 148 MB, is available at this URL:
https://www.dropbox.com/s/rdv1a69t2nau7ep/java_pid79986.hprof
This one was triggered by attempting to view a large (6.8 MB) FindBugs warnings output. The PermGen values were set very high, 2048 MB in the creating job, 8192 MB in the JVM. Viewing a companion FindBugs warning output of about 800 KB did not trigger the heap dump.

This can be triggered on demand, on an idle system that has just been restarted. I suspect some sort of memory leak in either the FindBugs plugin or Jenkins itself. I have further verified that I have installed all supporting plugins, and they are up to date. Jenkins itself was just upgraded today to 1.549, with the same result.

As this screen capture shows, a large amount of this memory seems to be consumed by FindBugs.
https://www.dropbox.com/s/1uc6nz2vda6ouae/HeapDump79986_memory_analysis_1.png

If there is something I am doing wrong, please let me know. If there is something wrong with Jenkins or the FindBugs plugin, I hope this shines some light on it.

timja commented 10 years ago

sstrickland:

I may have resolved this myself. Initially I set my memory limits to extremely high levels:
... -Xms8192m -Xmx8192m -XX:MaxPermSize=16384m ...
This allowed me to gain some stability.

Upon further examination, I found one plugin, the LDAP Email Plugin, that was misconfigured, was throwing errors, and was probably not being used. I removed it, consolidated redundant environment variables on the JVM call, and stopped using the "-XX:NewSize=128m" parameter. Those were the only substantive changes.

I attempted to trigger a dump by calling up the FindBugs Warnings chart, and found I could still obtain the report with much smaller memory settings. The following is now working for me, as I have gone five days without a crash for the first time in weeks.

java -Xms1024m -Xmx1024m \
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/dumps \
-XX:MaxPermSize=1024m -jar jenkins.war --httpPort=8070 &

I would feel better knowing which of the changes above resolved the problem, though I am happy enough right now not to have to watch for Jenkins failures every 10 minutes.
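To put the two configurations in this comment side by side: on the HotSpot JVMs of that era, PermGen is sized separately from the -Xmx heap, so the two limits add up. The following is a rough sketch of that arithmetic, with hypothetical helper names; it ignores thread stacks and other native memory, so it is a lower estimate of what the process may request.

```python
import re

# Multipliers for the JVM size suffixes (none, k, m, g).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def size_to_bytes(value):
    """Convert a JVM size string such as '1024m' or '2g' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value)
    if match is None:
        raise ValueError(f"unrecognised JVM size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def heap_plus_permgen_mb(xmx, max_perm_size):
    """Sum of the maximum heap and maximum PermGen sizes, in MB."""
    total = size_to_bytes(xmx) + size_to_bytes(max_perm_size)
    return total // (1024 ** 2)
```

With the values quoted above, heap_plus_permgen_mb("8192m", "16384m") gives 24576 MB for the first configuration, and heap_plus_permgen_mb("1024m", "1024m") gives 2048 MB for the one that is now working.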

timja commented 10 years ago

danielbeck:

What kind of errors were thrown by LDAP Email? Which versions of Jenkins and the plugin? This should probably be switched over to that component.

Did this issue occur again?

timja commented 10 years ago

sstrickland:

LDAP Email was not being used at all. It was not throwing errors. Jenkins, at the time, was at 1.549. This far removed, I do not recall which specific versions of each plugin were in use.

The problem essentially disappeared after my 05-Feb-2014 post.