timja / jenkins-gh-issues-poc-06-18

0 stars 0 forks source link

[JENKINS-14362] 100% CPU load during org.kohsuke.stapler.compression.CompressionFilter.reportException #2407

Closed timja closed 11 years ago

timja commented 12 years ago

Jenkins starts using 100% CPU after a few days. Using jstack I see several threads trying to write compressed output, and apparently not changing over time:

I'm suspecting but not 100% sure that these threads are in an infinite loop (livelocked). I'm struggling to see what other threads might be doing this.

This JVM was not started with debugging enabled to attach a debugger for analysis. I've enabled it now. Stack traces attached as files below.


Originally reported by gcc, imported from: 100% CPU load during org.kohsuke.stapler.compression.CompressionFilter.reportException
  • assignee: lmcazra
  • status: Resolved
  • priority: Critical
  • resolution: Fixed
  • resolved: 2013-06-19T12:41:09+00:00
  • imported: 2022/01/10
timja commented 11 years ago

justinharringa:

Sorry, I should have clarified that I didn't see any threads using Deflater.deflateBytes in the my attempt to reproduce using the Wall Display plugin. I have seen that, in conjunction with consistently pegged CPUs, in my production instance on 1.480.3 (see thread-dump-prod.txt). I was hoping to properly indicate that the Wall Display method I was using wouldn't adequately reproduce the situation since the CPU usage fell.

Just to make sure I've got it clear, my impression is that the combination of pegged CPUs, which stay pegged, as well as a threadDump with threads using Deflater.deflateBytes is what we're looking for in order to call it a valid reproduction of the problem. Is that correct?

It seems like the closest I've gotten to reproducing this is by using dave catalan's suggested steps. Unfortunately, I was only able to get the thread dump with threads using Deflater.deflateBytes but the CPU usage only spiked to 100% for a few minutes. It seems that this would indicate that I haven't reproduced the issue that could be fixed. I attached 3 separate dumps for the reproduced items (thread-dump-reproduce-1.txt appears to be before or after the deflateBytes call).

It would seem that we don't have a deterministic way to reproduce this yet.

timja commented 11 years ago

jglick:

the combination of pegged CPUs, which stay pegged, as well as a threadDump with threads using Deflater.deflateBytes is what we're looking for in order to call it a valid reproduction of the problem. Is that correct?

Correct. Or at least the other reports I have heard indicate that the threads suffering the problem do not fix themselves spontaneously; but perhaps sometimes they do. Merely having some threads running deflateBytes does not definitely indicate this bug, because you might genuinely be trying to compress a lot of data at that moment.

timja commented 11 years ago

dennys:

Does this version is recommand for production use?
http://repo.jenkins-ci.org/releases/org/jenkins-ci/main/jenkins-war/1.518.JENKINS-14362-jzlib/jenkins-war-1.518.JENKINS-14362-jzlib.war

We are still use 1.512 now, and we want to upgrade, but I check the community rating, since 1.519, lots of people rollback to previous version.
If the patch for 1.518 is ready for production use? We want to use it first to fix the 100% CPU problem for now and wait a stable version in the future.

timja commented 11 years ago

jglick:

You are recommended to just use 1.520 or later. 1.518.JENKINS-14362-jzlib was offered for experimental purposes only and should no longer be used.

timja commented 11 years ago

justinharringa:

@Jesse: On Monday, we're going to upgrade from 1.480.3 to the 1.509.2.JENKINS-14362-jzlib.war on an instance that seems to hit this every few days or so. I'll be sure to report by the end of next week if we see this issue crop up again. I'm hopeful that this will help and I suspect reporting the result will be helpful to you.

timja commented 11 years ago

justinharringa:

We haven't seen this issue pop up on our large instance since we deployed 1.509.2.JENKINS-14362-jzlib.war on August 5th. We generally saw this problem once a week and we haven't seen it occur in over 2 weeks. It's certainly not a scientific measurement but hopefully it helps.

timja commented 11 years ago

jglick:

For the record, CompressionFilter was introduced as the fix for JENKINS-13625. Unclear to me whether using org.kohsuke.stapler.jelly.DefaultScriptInvoker.compress=false would revert that fix, or merely disable a performance optimization which relies on that fix.

timja commented 11 years ago

nerdmachine:

I suspect this fix in 1.520 caused the regression #JENKINS-19473

timja commented 11 years ago

dblanchette:

Is this fixed in 1.509.3?

timja commented 11 years ago

jglick:

No, this is not in 1.509.3. If there is interest I could produce an experimental build of 1.509.3 with this patch (plus the attempted fix of JENKINS-19473) applied.

timja commented 11 years ago

justinharringa:

I'd be interested in the proposed patch of 1.509.3. We've had good luck with your patched version of 1.509.2 for this issue.

timja commented 11 years ago

jglick:

http://repo.jenkins-ci.org/releases/org/jenkins-ci/main/jenkins-war/1.509.3.JENKINS-14362-jzlib/jenkins-war-1.509.3.JENKINS-14362-jzlib.war now available (also includes purported fix of JENKINS-19473). As usual beware that this build has received no testing whatsoever.

timja commented 11 years ago

dblanchette:

Thank you very much, we will try this build.

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/FilePath.java
core/src/main/java/hudson/console/AnnotatedLargeText.java
core/src/main/java/hudson/console/ConsoleNote.java
core/src/main/java/hudson/model/Run.java
core/src/main/java/hudson/model/UsageStatistics.java
core/src/main/java/hudson/security/HudsonAuthenticationEntryPoint.java
core/src/main/java/hudson/util/CompressedFile.java
http://jenkins-ci.org/commit/jenkins/a021fe5c2422c47d977d7937f4e01483f121b777
Log:
JENKINS-14362 Switch to jzlib for GZIP streaming.

(cherry picked from commit 3701a327a75d35e5d0e5795341cc55bb123a2007)

Conflicts:
cli/pom.xml
core/pom.xml
maven-plugin/pom.xml
plugins/pom.xml
pom.xml
test/pom.xml
ui-samples-plugin/pom.xml
war/pom.xml

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
pom.xml
http://jenkins-ci.org/commit/maven-plugin/fba17cd8497d9ddac9c55fead9c9a8da61059b23
Log:
JENKINS-14362 Switch to jzlib for GZIP streaming.

Originally-Committed-As: 3701a327a75d35e5d0e5795341cc55bb123a2007

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
pom.xml
src/main/java/hudson/maven/AbstractMavenProcessFactory.java
src/main/java/hudson/maven/AggregatingClassLoader.java
src/main/java/hudson/maven/Maven3Builder.java
src/main/java/hudson/maven/Maven3ProcessFactory.java
src/main/java/hudson/maven/MavenBuildInformation.java
src/main/java/hudson/maven/MavenEmbedderRequest.java
src/main/java/hudson/maven/MavenModuleSet.java
src/main/java/hudson/maven/MavenModuleSetBuild.java
src/main/java/hudson/maven/MavenProbeAction.java
src/main/java/hudson/maven/MavenProcessFactory.java
src/main/java/hudson/maven/MavenUtil.java
src/main/java/hudson/maven/PlexusModuleContributor.java
src/main/java/hudson/maven/reporters/MavenAbstractArtifactRecord.java
src/main/java/hudson/maven/reporters/MavenFingerprinter.java
src/main/java/hudson/maven/reporters/TestMojo.java
src/main/resources/hudson/maven/MavenBuild/executedMojos_fr.properties
src/main/resources/hudson/maven/MavenModuleSetBuild/main_pl.properties
src/main/resources/hudson/maven/Messages.properties
src/main/resources/hudson/maven/Messages_es.properties
src/main/resources/hudson/maven/Messages_zh_TW.properties
src/main/resources/hudson/maven/reporters/MavenAbstractArtifactRecord/index_fr.properties
src/test/java/hudson/maven/reporters/SurefireArchiverUnitTest.java
src/test/java/hudson/maven/reporters/TestMojoTest.java
http://jenkins-ci.org/commit/maven-plugin/14863a0498e0e6c389f2779b9150e9280b363d3e
Log:
Merge branch 'master' into JENKINS-14362-jzlib

Originally-Committed-As: dedb24ef034a04fea288eaa50089e8e774714e99

timja commented 11 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/resources/hudson/maven/Messages.properties
src/main/resources/hudson/maven/Messages_de.properties
src/main/resources/hudson/maven/RedeployPublisher/config_de.properties
src/main/resources/hudson/maven/reporters/MavenAbstractArtifactRecord/index_de.properties
http://jenkins-ci.org/commit/maven-plugin/d75a8c218b6ffd24fe2ddea8953ffaf22da63bc6
Log:
Merge branch 'master' into JENKINS-14362-jzlib

Conflicts:
core/pom.xml

Originally-Committed-As: b7e2d6127f77711681113a020681508d57622ea5

timja commented 11 years ago

bbonn:

Hi There,

Just verifying. I see this bug has a 1.509.4 fixed label, but on Jenkins-CI.org, it is not listed in the change log until 1.520. Can someone please validate where the fix is? Thanks a lot.

timja commented 11 years ago

jglick:

It was first fixed in trunk in 1.520, then backported to 1.509.4. http://jenkins-ci.org/changelog-stable neglects to mention it, but that changelog is generally unreliable anyway.

timja commented 3 years ago

JIRAUSER132865:

FWIW, this appears to be the same bug as this JDK bug https://bugs.openjdk.java.net/browse/JDK-8193682

timja commented 2 years ago

[Originally depends on: JENKINS-19473]