Closed timja closed 11 years ago
Sorry, I should have clarified that I didn't see any threads using Deflater.deflateBytes in the my attempt to reproduce using the Wall Display plugin. I have seen that, in conjunction with consistently pegged CPUs, in my production instance on 1.480.3 (see thread-dump-prod.txt). I was hoping to properly indicate that the Wall Display method I was using wouldn't adequately reproduce the situation since the CPU usage fell.
Just to make sure I've got it clear, my impression is that the combination of pegged CPUs, which stay pegged, as well as a threadDump with threads using Deflater.deflateBytes is what we're looking for in order to call it a valid reproduction of the problem. Is that correct?
It seems like the closest I've gotten to reproducing this is by using dave catalan's suggested steps. Unfortunately, I was only able to get the thread dump with threads using Deflater.deflateBytes but the CPU usage only spiked to 100% for a few minutes. It seems that this would indicate that I haven't reproduced the issue that could be fixed. I attached 3 separate dumps for the reproduced items (thread-dump-reproduce-1.txt appears to be before or after the deflateBytes call).
It would seem that we don't have a deterministic way to reproduce this yet.
the combination of pegged CPUs, which stay pegged, as well as a threadDump with threads using Deflater.deflateBytes is what we're looking for in order to call it a valid reproduction of the problem. Is that correct?
Correct. Or at least the other reports I have heard indicate that the threads suffering the problem do not fix themselves spontaneously; but perhaps sometimes they do. Merely having some threads running deflateBytes does not definitely indicate this bug, because you might genuinely be trying to compress a lot of data at that moment.
Does this version is recommand for production use?
http://repo.jenkins-ci.org/releases/org/jenkins-ci/main/jenkins-war/1.518.JENKINS-14362-jzlib/jenkins-war-1.518.JENKINS-14362-jzlib.war
We are still use 1.512 now, and we want to upgrade, but I check the community rating, since 1.519, lots of people rollback to previous version.
If the patch for 1.518 is ready for production use? We want to use it first to fix the 100% CPU problem for now and wait a stable version in the future.
You are recommended to just use 1.520 or later. 1.518.JENKINS-14362-jzlib was offered for experimental purposes only and should no longer be used.
@Jesse: On Monday, we're going to upgrade from 1.480.3 to the 1.509.2.JENKINS-14362-jzlib.war on an instance that seems to hit this every few days or so. I'll be sure to report by the end of next week if we see this issue crop up again. I'm hopeful that this will help and I suspect reporting the result will be helpful to you.
We haven't seen this issue pop up on our large instance since we deployed 1.509.2.JENKINS-14362-jzlib.war on August 5th. We generally saw this problem once a week and we haven't seen it occur in over 2 weeks. It's certainly not a scientific measurement but hopefully it helps.
For the record, CompressionFilter was introduced as the fix for JENKINS-13625. Unclear to me whether using org.kohsuke.stapler.jelly.DefaultScriptInvoker.compress=false would revert that fix, or merely disable a performance optimization which relies on that fix.
I suspect this fix in 1.520 caused the regression #JENKINS-19473
Is this fixed in 1.509.3?
No, this is not in 1.509.3. If there is interest I could produce an experimental build of 1.509.3 with this patch (plus the attempted fix of JENKINS-19473) applied.
I'd be interested in the proposed patch of 1.509.3. We've had good luck with your patched version of 1.509.2 for this issue.
http://repo.jenkins-ci.org/releases/org/jenkins-ci/main/jenkins-war/1.509.3.JENKINS-14362-jzlib/jenkins-war-1.509.3.JENKINS-14362-jzlib.war now available (also includes purported fix of JENKINS-19473). As usual beware that this build has received no testing whatsoever.
Thank you very much, we will try this build.
Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/FilePath.java
core/src/main/java/hudson/console/AnnotatedLargeText.java
core/src/main/java/hudson/console/ConsoleNote.java
core/src/main/java/hudson/model/Run.java
core/src/main/java/hudson/model/UsageStatistics.java
core/src/main/java/hudson/security/HudsonAuthenticationEntryPoint.java
core/src/main/java/hudson/util/CompressedFile.java
http://jenkins-ci.org/commit/jenkins/a021fe5c2422c47d977d7937f4e01483f121b777
Log:
JENKINS-14362 Switch to jzlib for GZIP streaming.
(cherry picked from commit 3701a327a75d35e5d0e5795341cc55bb123a2007)
Conflicts:
cli/pom.xml
core/pom.xml
maven-plugin/pom.xml
plugins/pom.xml
pom.xml
test/pom.xml
ui-samples-plugin/pom.xml
war/pom.xml
Code changed in jenkins
User: Jesse Glick
Path:
pom.xml
http://jenkins-ci.org/commit/maven-plugin/fba17cd8497d9ddac9c55fead9c9a8da61059b23
Log:
JENKINS-14362 Switch to jzlib for GZIP streaming.
Originally-Committed-As: 3701a327a75d35e5d0e5795341cc55bb123a2007
Code changed in jenkins
User: Jesse Glick
Path:
pom.xml
src/main/java/hudson/maven/AbstractMavenProcessFactory.java
src/main/java/hudson/maven/AggregatingClassLoader.java
src/main/java/hudson/maven/Maven3Builder.java
src/main/java/hudson/maven/Maven3ProcessFactory.java
src/main/java/hudson/maven/MavenBuildInformation.java
src/main/java/hudson/maven/MavenEmbedderRequest.java
src/main/java/hudson/maven/MavenModuleSet.java
src/main/java/hudson/maven/MavenModuleSetBuild.java
src/main/java/hudson/maven/MavenProbeAction.java
src/main/java/hudson/maven/MavenProcessFactory.java
src/main/java/hudson/maven/MavenUtil.java
src/main/java/hudson/maven/PlexusModuleContributor.java
src/main/java/hudson/maven/reporters/MavenAbstractArtifactRecord.java
src/main/java/hudson/maven/reporters/MavenFingerprinter.java
src/main/java/hudson/maven/reporters/TestMojo.java
src/main/resources/hudson/maven/MavenBuild/executedMojos_fr.properties
src/main/resources/hudson/maven/MavenModuleSetBuild/main_pl.properties
src/main/resources/hudson/maven/Messages.properties
src/main/resources/hudson/maven/Messages_es.properties
src/main/resources/hudson/maven/Messages_zh_TW.properties
src/main/resources/hudson/maven/reporters/MavenAbstractArtifactRecord/index_fr.properties
src/test/java/hudson/maven/reporters/SurefireArchiverUnitTest.java
src/test/java/hudson/maven/reporters/TestMojoTest.java
http://jenkins-ci.org/commit/maven-plugin/14863a0498e0e6c389f2779b9150e9280b363d3e
Log:
Merge branch 'master' into JENKINS-14362-jzlib
Originally-Committed-As: dedb24ef034a04fea288eaa50089e8e774714e99
Code changed in jenkins
User: Jesse Glick
Path:
src/main/resources/hudson/maven/Messages.properties
src/main/resources/hudson/maven/Messages_de.properties
src/main/resources/hudson/maven/RedeployPublisher/config_de.properties
src/main/resources/hudson/maven/reporters/MavenAbstractArtifactRecord/index_de.properties
http://jenkins-ci.org/commit/maven-plugin/d75a8c218b6ffd24fe2ddea8953ffaf22da63bc6
Log:
Merge branch 'master' into JENKINS-14362-jzlib
Conflicts:
core/pom.xml
Originally-Committed-As: b7e2d6127f77711681113a020681508d57622ea5
Hi There,
Just verifying. I see this bug has a 1.509.4 fixed label, but on Jenkins-CI.org, it is not listed in the change log until 1.520. Can someone please validate where the fix is? Thanks a lot.
It was first fixed in trunk in 1.520, then backported to 1.509.4. http://jenkins-ci.org/changelog-stable neglects to mention it, but that changelog is generally unreliable anyway.
FWIW, this appears to be the same bug as this JDK bug https://bugs.openjdk.java.net/browse/JDK-8193682
[Originally depends on: JENKINS-19473]
Jenkins starts using 100% CPU after a few days. Using jstack I see several threads trying to write compressed output, and apparently not changing over time:
I'm suspecting but not 100% sure that these threads are in an infinite loop (livelocked). I'm struggling to see what other threads might be doing this.
This JVM was not started with debugging enabled to attach a debugger for analysis. I've enabled it now. Stack traces attached as files below.
Originally reported by gcc, imported from: 100% CPU load during org.kohsuke.stapler.compression.CompressionFilter.reportException