I added more logging to the code:
```diff
--- src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java	2023-02-21 12:05:27.519660828 +0000
+++ src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java	2023-03-07 11:04:12.222995540 +0000
@@ -39,6 +39,7 @@
     public int read(byte[] b, int off, int len) throws IOException {
         readInternal();
         int byteRead = multiplexed.read(b, off, Math.min(next, len));
+        LOGGER.info("[1]: byteRead={} next={} off={} len={} | {}", byteRead, next, off, len, name);
         next -= byteRead;
         return byteRead;
     }
@@ -49,6 +50,7 @@
         int todo = 8;
         while (todo > 0) {
             int i = multiplexed.read(header, 8-todo, todo);
+            LOGGER.info("[2]: i={} todo={} | {}", i, todo, name);
             if (i < 0) return; // EOF
             todo -= i;
         }
@@ -63,6 +65,7 @@
         int received = 0;
         while (received < size) {
             int i = multiplexed.read(payload, received, size - received);
+            LOGGER.info("[3]: i={} received={} | {}", i, received, name);
             if (i < 0) break; // EOF
             received += i;
         }
```
and got the following messages when the "100% CPU" problem occurred again:
```text
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
```
VisualVM thread dump:
```text
"Channel reader thread: java-multi-000qefydty62o" - Thread t@28548
   java.lang.Thread.State: RUNNABLE
    at java.base@11.0.18/java.io.FileOutputStream.writeBytes(Native Method)
    at java.base@11.0.18/java.io.FileOutputStream.write(Unknown Source)
    at java.base@11.0.18/java.io.BufferedOutputStream.write(Unknown Source)
    - locked <3b140a2a> (a java.io.BufferedOutputStream)
    at java.base@11.0.18/java.io.PrintStream.write(Unknown Source)
    - locked <2d5aa34c> (a java.io.PrintStream)
    at java.base@11.0.18/sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
    at java.base@11.0.18/sun.nio.cs.StreamEncoder.implFlushBuffer(Unknown Source)
    at java.base@11.0.18/sun.nio.cs.StreamEncoder.implFlush(Unknown Source)
    at java.base@11.0.18/sun.nio.cs.StreamEncoder.flush(Unknown Source)
    - locked <278b9bac> (a java.io.OutputStreamWriter)
    at java.base@11.0.18/java.io.OutputStreamWriter.flush(Unknown Source)
    at java.logging@11.0.18/java.util.logging.StreamHandler.flush(Unknown Source)
    - locked <158bbe21> (a java.util.logging.ConsoleHandler)
    at java.logging@11.0.18/java.util.logging.ConsoleHandler.publish(Unknown Source)
    at java.logging@11.0.18/java.util.logging.Logger.log(Unknown Source)
    at org.slf4j.jul.JDK14LoggerAdapter.innerNormalizedLoggingCallHandler(JDK14LoggerAdapter.java:156)
    at org.slf4j.jul.JDK14LoggerAdapter.handleNormalizedLoggingCall(JDK14LoggerAdapter.java:140)
    at org.slf4j.helpers.AbstractLogger.handleArgArrayCall(AbstractLogger.java:403)
    at org.slf4j.helpers.AbstractLogger.info(AbstractLogger.java:221)
    at io.jenkins.docker.client.DockerMultiplexedInputStream.read(DockerMultiplexedInputStream.java:42)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:102)
    at hudson.remoting.ChunkedInputStream.read(ChunkedInputStream.java:48)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:99)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
```
As we can see, the problem is the endless `byteRead=0`. Looking at the call chain in the thread dump, we can see that the infinite loop can be broken by returning `-1` to `readUntilBreak()` (https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/ChunkedInputStream.java#L97-L103):
```java
if (remaining>0) {
    // more bytes to read in the current chunk
    int read = read(buf, 0, Math.min(remaining,buf.length));
    if (read==-1)
        throw new IOException("Unexpected EOF");
    sink.write(buf,0,read);
} else {
```
So I added conditions for the `result == 0` case to `read()` and `readInternal()`:
```diff
--- src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java	2023-02-21 12:05:27.519660828 +0000
+++ src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java	2023-02-27 18:53:16.237851633 +0000
@@ -39,6 +39,8 @@
     public int read(byte[] b, int off, int len) throws IOException {
         readInternal();
         int byteRead = multiplexed.read(b, off, Math.min(next, len));
+        LOGGER.info("[1]: byteRead={} next={} off={} len={} | {}", byteRead, next, off, len, name);
+        if (byteRead <= 0) return -1;
         next -= byteRead;
         return byteRead;
     }
@@ -49,7 +51,8 @@
         int todo = 8;
         while (todo > 0) {
             int i = multiplexed.read(header, 8-todo, todo);
-            if (i < 0) return; // EOF
+            LOGGER.info("[2]: i={} todo={} | {}", i, todo, name);
+            if (i <= 0) return; // EOF
             todo -= i;
         }
         int size = ((header[4] & 0xff) << 24) + ((header[5] & 0xff) << 16) + ((header[6] & 0xff) << 8) + (header[7] & 0xff);
@@ -63,6 +66,7 @@
         int received = 0;
         while (received < size) {
             int i = multiplexed.read(payload, received, size - received);
+            LOGGER.info("[3]: i={} received={} | {}", i, received, name);
             if (i < 0) break; // EOF
             received += i;
```
And everything worked! No more "100% CPU" problems (tested for a week). The log now looks like:
```text
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=8 todo=8 | java-multi-0002cbq0w9xea (5379ff...345375)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=574 next=574 off=0 len=574 | java-multi-0002cbq0w9xea (5379ff...345375)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=8 todo=8 | java-multi-0002cbq0w9xea (5379ff...345375)
INFO i.j.docker.DockerTransientNode$1#println: Disconnected computer for node 'java-multi-0002cbq0w9xea'.
INFO i.j.docker.DockerTransientNode$1#println: Removed Node for node 'java-multi-0002cbq0w9xea'.
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8 | java-multi-0002cbq0w9xea (5379ff...345375)
INFO i.j.d.c.DockerMultiplexedInputStream#read: [1]: byteRead=0 next=0 off=0 len=1177 | java-multi-0002cbq0w9xea (5379ff...345375)
INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel java-multi-0002cbq0w9xea
java.io.IOException: Unexpected EOF
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:101)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
```
I am not a Java developer, but I propose to consider this patch (at least it works for me):
```diff
diff orig/src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java patched/src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java
--- src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java	2023-02-21 12:05:27.519660828 +0000
+++ src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java	2023-03-07 11:41:24.105315705 +0000
@@ -39,6 +39,7 @@
     public int read(byte[] b, int off, int len) throws IOException {
         readInternal();
         int byteRead = multiplexed.read(b, off, Math.min(next, len));
+        if (byteRead <= 0) return -1;
         next -= byteRead;
         return byteRead;
     }
@@ -49,7 +50,7 @@
         int todo = 8;
         while (todo > 0) {
             int i = multiplexed.read(header, 8-todo, todo);
-            if (i < 0) return; // EOF
+            if (i <= 0) return; // EOF
             todo -= i;
         }
         int size = ((header[4] & 0xff) << 24) + ((header[5] & 0xff) << 16) + ((header[6] & 0xff) << 8) + (header[7] & 0xff);
@@ -63,7 +64,7 @@
         int received = 0;
         while (received < size) {
             int i = multiplexed.read(payload, received, size - received);
-            if (i < 0) break; // EOF
+            if (i <= 0) break; // EOF
             received += i;
         }
         if (LOGGER.isInfoEnabled()) {
```
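For context, the 8-byte header that `readInternal()` parses follows Docker's `application/vnd.docker.raw-stream` framing: byte 0 is the stream type (0 = stdin, 1 = stdout, 2 = stderr), bytes 1-3 are padding, and bytes 4-7 hold the payload size as a big-endian 32-bit integer. A minimal, self-contained sketch of that parsing (the sample values are illustrative, not taken from the logs above):

```java
import java.nio.ByteBuffer;

public class FrameHeaderDemo {
    public static void main(String[] args) {
        // Illustrative header: stream type 1 (stdout), payload length 574 (0x023E).
        byte[] header = {1, 0, 0, 0, 0, 0, 0x02, 0x3E};

        int type = header[0] & 0xff;
        // Same arithmetic as the plugin code:
        int size = ((header[4] & 0xff) << 24) + ((header[5] & 0xff) << 16)
                + ((header[6] & 0xff) << 8) + (header[7] & 0xff);
        // Equivalent via ByteBuffer, whose byte order defaults to big-endian:
        int size2 = ByteBuffer.wrap(header, 4, 4).getInt();

        System.out.println(type);   // 1
        System.out.println(size);   // 574
        System.out.println(size2);  // 574
    }
}
```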
I'm no longer maintaining this plugin but, personally, I'd be somewhat wary of this change.
There's a big difference in meaning between `-1` and `0`. The former means EOF. The latter means we've either got nothing to read, or that we asked to read nothing and so nothing was read.
I suspect that changing from `< 0` to `<= 0` would simply swap one set of trouble (which you're experiencing, but few others seem to be) for another (which you aren't, but others might).
Looking at the code, I suspect that the problem stems from `readInternal` not returning a value indicating whether EOF has been reached, allowing a potential failure condition where `next` is zero but `int i = multiplexed.read(header, 8-todo, todo)` hits EOF and returns `-1`, meaning that `read(...)`'s call to `int byteRead = multiplexed.read(b, off, Math.min(next, len));` actually passes `(b, off, 0)` to `multiplexed.read(...)` ... meaning that it's valid to immediately return `0` (see the `InputStream.read` API) irrespective of whether or not the stream has reached EOF.
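That zero-length quirk is easy to demonstrate against `InputStream`'s default `read(byte[], int, int)` implementation, which checks `len == 0` before ever touching the stream (note that some subclasses, such as `ByteArrayInputStream`, do report EOF even for zero-length reads, which is exactly why the behaviour can't be relied upon either way):

```java
import java.io.IOException;
import java.io.InputStream;

public class ZeroLengthReadDemo {
    public static void main(String[] args) throws IOException {
        // A stream that is already at EOF; it inherits the default
        // InputStream.read(byte[], int, int) implementation.
        InputStream exhausted = new InputStream() {
            @Override
            public int read() {
                return -1; // always EOF
            }
        };

        byte[] buf = new byte[8];
        System.out.println(exhausted.read(buf, 0, 0)); // 0: len == 0 short-circuits before EOF is seen
        System.out.println(exhausted.read(buf, 0, 8)); // -1: EOF
    }
}
```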
IMO a better (more robust) fix would be for `readInternal()` to return `boolean` — `true` if it's OK to continue, or `false` if we hit EOF — and then have the two places which call it return `-1` if `readInternal()` returns `false`. i.e.
- The two calls to `readInternal();` both become `if (!readInternal()) return -1; // EOF reading header`
- The method declaration `private void readInternal() throws IOException {` becomes `private boolean readInternal() throws IOException {`
- Inside the `readInternal` method itself, both lines saying `if (i < 0) return; // EOF` become `if (i < 0) return false; // EOF`
- ...and we add a `return true;` to the end of the `readInternal` method.
...and, ideally, add a unit test to verify that both of the class's `read` methods return `-1` if the `InputStream` ends during the 8 bytes of header, even if `InputStream.read(..., ..., 0)` returns `0` and not `-1`.
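To illustrate what such a test would exercise without depending on the plugin's internals, here is a standalone model of the proposed logic (a hypothetical class, not the real `DockerMultiplexedInputStream`; frame-type validation and stderr handling are omitted), showing that a stream which ends mid-header now yields `-1` instead of looping:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical standalone model of the proposed read()/readInternal() logic.
class DemuxModel {
    private final InputStream multiplexed;
    private int next = 0;

    DemuxModel(InputStream multiplexed) {
        this.multiplexed = multiplexed;
    }

    int read(byte[] b, int off, int len) throws IOException {
        if (!readInternal()) return -1; // EOF reading header
        int byteRead = multiplexed.read(b, off, Math.min(next, len));
        if (byteRead > 0) next -= byteRead;
        return byteRead;
    }

    private boolean readInternal() throws IOException {
        while (next == 0) {
            byte[] header = new byte[8];
            int todo = 8;
            while (todo > 0) {
                int i = multiplexed.read(header, 8 - todo, todo);
                if (i < 0) return false; // EOF mid-header: report it instead of looping
                todo -= i;
            }
            next = ((header[4] & 0xff) << 24) + ((header[5] & 0xff) << 16)
                 + ((header[6] & 0xff) << 8) + (header[7] & 0xff);
        }
        return true;
    }
}

public class DemuxModelTest {
    public static void main(String[] args) throws IOException {
        // Stream ends after only 4 of the 8 header bytes -> -1, not an endless 0.
        DemuxModel truncated = new DemuxModel(new ByteArrayInputStream(new byte[] {1, 0, 0, 0}));
        System.out.println(truncated.read(new byte[16], 0, 16)); // -1

        // Full header declaring a 3-byte payload, followed by the payload.
        DemuxModel ok = new DemuxModel(
                new ByteArrayInputStream(new byte[] {1, 0, 0, 0, 0, 0, 0, 3, 'a', 'b', 'c'}));
        System.out.println(ok.read(new byte[16], 0, 16)); // 3
    }
}
```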
Yes, you are right. I recreated the patch according to your recommendations. Currently it works well; I will wait for some time (to test it in real work over a long period).
```diff
--- src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java	2023-02-21 12:05:27.519660828 +0000
+++ src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java	2023-03-09 13:29:41.080690191 +0000
@@ -30,26 +30,26 @@
     @Override
     public int read() throws IOException {
-        readInternal();
+        if (!readInternal()) return -1; // EOF reading header
         next--;
         return multiplexed.read();
     }

     @Override
     public int read(byte[] b, int off, int len) throws IOException {
-        readInternal();
+        if (!readInternal()) return -1; // EOF reading header
         int byteRead = multiplexed.read(b, off, Math.min(next, len));
-        next -= byteRead;
+        if (byteRead > 0) next -= byteRead;
         return byteRead;
     }

-    private void readInternal() throws IOException {
+    private boolean readInternal() throws IOException {
         while (next == 0) {
             byte[] header = new byte[8];
             int todo = 8;
             while (todo > 0) {
                 int i = multiplexed.read(header, 8-todo, todo);
-                if (i < 0) return; // EOF
+                if (i < 0) return false; // EOF
                 todo -= i;
             }
             int size = ((header[4] & 0xff) << 24) + ((header[5] & 0xff) << 16) + ((header[6] & 0xff) << 8) + (header[7] & 0xff);
@@ -78,6 +78,7 @@
                 throw new IOException("Unexpected application/vnd.docker.raw-stream frame type " + Arrays.toString(header));
             }
         }
+        return true;
     }
 }
```
Would be great to have a pull request for this along with an explanation of how this change has been tested.
Jenkins and plugins versions report
Environment
```text
Jenkins: 2.375.2
OS: Linux - 4.15.0-202-generic
---
ace-editor:1.1 ant:481.v7b_09e538fcca antisamy-markup-formatter:155.v795fb_8702324 apache-httpcomponents-client-4-api:4.5.13-138.v4e7d9a_7b_a_e61 artifactory:3.18.0 authentication-tokens:1.4 authorize-project:1.4.0 basic-branch-build-strategies:71.vc1421f89888e bootstrap4-api:4.6.0-5 bootstrap5-api:5.2.1-3 bouncycastle-api:2.27 branch-api:2.1071.v1a_188a_562481 build-discarder:139.v05696a_7fe240 build-timeout:1.28 caffeine-api:2.9.3-65.v6a_47d0f4d1fe checks-api:1.8.1 cloudbees-folder:6.800.v71307ca_b_986b collapsing-console-sections:1.8.0 command-launcher:90.v669d7ccb_7c31 commons-lang3-api:3.12.0-36.vd97de6465d5b_ commons-text-api:1.10.0-27.vb_fa_3896786a_7 config-file-provider:3.11.1 configuration-as-code:1569.vb_72405b_80249 configuration-as-code-groovy:1.1 credentials:1214.v1de940103927 credentials-binding:523.vd859a_4b_122e6 data-tables-api:1.12.1-4 disable-github-multibranch-status:1.2 display-url-api:2.3.7 docker-commons:1.21 docker-java-api:3.2.13-37.vf3411c9828b9 docker-plugin:1.3.0 docker-workflow:563.vd5d2e5c4007f dtkit-api:3.0.2 durable-task:504.vb10d1ae5ba2f echarts-api:5.4.0-1 email-ext:2.93 extensible-choice-parameter:1.8.0 extra-columns:1.25 font-awesome-api:6.2.1-1 git:5.0.0 git-client:4.1.0 git-parameter:0.9.18 git-server:99.va_0826a_b_cdfa_d github:1.36.1 github-api:1.303-400.v35c2d8258028 github-branch-source:1701.v00cc8184df93 github-scm-trait-notification-context:1.1 gradle:2.2 h2-api:1.4.199 hidden-parameter:0.0.5 http_request:1.16 inline-pipeline:1.0.2 instance-identity:142.v04572ca_5b_265 ionicons-api:31.v4757b_6987003 ivy:2.4 jackson2-api:2.14.1-313.v504cdd45c18b jakarta-activation-api:2.0.1-2 jakarta-mail-api:2.0.1-2 javadoc:226.v71211feb_e7e9 javax-activation-api:1.2.0-5 javax-mail-api:1.6.2-8 jaxb:2.3.7-1 jdk-tool:63.v62d2fd4b_4793 jjwt-api:0.11.5-77.v646c772fddb_0 jnr-posix-api:3.1.16-1 job-dsl:1.81 jobConfigHistory:1198.v4d5736c2308c jquery:1.12.4-1 jquery-detached:1.2.1
jquery3-api:3.6.1-2 jsch:0.1.55.61.va_e9ee26616e7 junit:1166.va_436e268e972 junit-attachments:131.vef2117770555 junit-realtime-test-reporter:90.v7dc2c937f9a_8 ldap:659.v8ca_b_a_fe79fa_d lockable-resources:1113.vf66761f17f1e mailer:438.v02c7f0a_12fa_4 mapdb-api:1.0.9-28.vf251ce40855d matrix-auth:3.1.6 matrix-project:785.v06b_7f47b_c631 maven-plugin:3.20 mina-sshd-api-common:2.9.2-50.va_0e1f42659a_a mina-sshd-api-core:2.9.2-50.va_0e1f42659a_a momentjs:1.1.1 next-executions:1.0.15 nodejs:1.5.1 okhttp-api:4.9.3-108.v0feda04578cf pam-auth:1.10 pipeline-build-step:2.18 pipeline-config-history:1.6 pipeline-github-lib:38.v445716ea_edda_ pipeline-graph-analysis:202.va_d268e64deb_3 pipeline-groovy-lib:629.vb_5627b_ee2104 pipeline-input-step:466.v6d0a_5df34f81 pipeline-maven:1257.v89e586d3c58c pipeline-milestone-step:111.v449306f708b_7 pipeline-model-api:2.2118.v31fd5b_9944b_5 pipeline-model-definition:2.2118.v31fd5b_9944b_5 pipeline-model-extensions:2.2118.v31fd5b_9944b_5 pipeline-rest-api:2.30 pipeline-stage-step:305.ve96d0205c1c6 pipeline-stage-tags-metadata:2.2118.v31fd5b_9944b_5 pipeline-stage-view:2.30 pipeline-utility-steps:2.15.0 plain-credentials:143.v1b_df8b_d3b_e48 plugin-util-api:2.20.0 popper-api:1.16.1-3 popper2-api:2.11.6-2 resource-disposer:0.20 scm-api:631.v9143df5b_e4a_a script-security:1229.v4880b_b_e905a_6 simple-theme-plugin:136.v23a_15f86c53d slack:631.v40deea_40323b snakeyaml-api:1.33-90.v80dcb_3814d35 ssh-agent:295.v9ca_a_1c7cc3a_a_ ssh-credentials:305.v8f4381501156 ssh-slaves:2.854.v7fd446b_337c9 sshd:3.275.v9e17c10f2571 structs:324.va_f5d6774f3a_d subversion:2.17.0 test-results-analyzer:0.4.0 timestamper:1.21 token-macro:321.vd7cc1f2a_52c8 trilead-api:2.84.v72119de229b_7 uno-choice:2.6.4 variant:59.vf075fe829ccb view-job-filters:2.3 workflow-aggregator:590.v6a_d052e5a_a_b_5 workflow-api:1208.v0cc7c6e0da_9e workflow-basic-steps:994.vd57e3ca_46d24 workflow-cps:3606.v0b_d8b_e512dcf workflow-durable-task-step:1223.v7f1a_98a_8863e
workflow-job:1268.v6eb_e2ee1a_85a workflow-multibranch:716.vc692a_e52371b_ workflow-scm-step:400.v6b_89a_1317c9a_ workflow-step-api:639.v6eca_cd8c04a_a_ workflow-support:839.v35e2736cfd5c ws-cleanup:0.44 xunit:3.1.2
```
What Operating System are you using (both controller, and any agents involved in the problem)?
Controller
- Linux Ubuntu 18.04.6 LTS
- Docker version 20.10.12, build e91ed57
- Jenkins running as a docker container from the image `jenkins/jenkins:lts-alpine`

Agent
- Linux Ubuntu 20.04.4 LTS
- Docker version 20.10.14, build a224086
- Connection "Jenkins -> agent docker" is made via TCP with SSL cert auth
Reproduction steps
Just run (pipeline) jobs and the problem will happen after some time.
Expected Results
No high CPU load.
Actual Results
Endless, constant 100% CPU load on the Jenkins node until Jenkins is restarted, although Jenkins continues to work and execute jobs.
Anything else?
Not sure, but it looks like the problem is here (an infinite loop?): https://github.com/jenkinsci/docker-plugin/blob/master/src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java#L50
The problem occurs several times a day.