timja / jenkins-gh-issues-poc-06-18


[JENKINS-32950] Jenkins slave resets connection during or just after artifacts download. #7838

Open timja opened 8 years ago

timja commented 8 years ago

In Jenkins I have several build jobs with artifact dependencies. The first project builds fine on both Linux and Windows, but the second one (which requires artifacts from the first project) fails during artifact download.

Slave log from slave perspective:

Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: Windows2008R2_64bit
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener 
INFO: Jenkins agent is running in headless mode.
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://10.102.22.50:8080/]
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 10.102.22.50:50226
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Feb 15, 2016 5:13:54 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:90)
at hudson.remoting.ChunkedInputStream.read(ChunkedInputStream.java:46)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:97)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

Feb 15, 2016 5:13:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://10.102.22.50:8080/]
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 10.102.22.50:50226
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Feb 15, 2016 5:14:04 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected

Slave log from master perspective:

JNLP agent connected from /10.102.22.50
<===[JENKINS REMOTING CAPACITY]===>   Slave.jar version: 2.53.2
This is a Windows slave
Slave successfully connected and online
ERROR: Connection terminated
java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@2ce66ffa[name=Windows2008R2_64bit]
    at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(Unknown Source)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.read(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
    at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
    at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
    ... 6 more

Log from jenkins job:

Building remotely on Windows2008R2_64bit (Win64e) in workspace C:\jenkins\workspace\##\buildNode\Win64e
 > C:\Program Files\Git\bin\git.exe rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > C:\Program Files\Git\bin\git.exe config remote.origin.url #### # timeout=10
Fetching upstream changes from ######
 > C:\Program Files\Git\bin\git.exe --version # timeout=10
using GIT_SSH to set credentials 
 > C:\Program Files\Git\bin\git.exe -c core.askpass=true fetch --tags --progress ssh://####### +refs/heads/*:refs/remotes/origin/*
Checking out Revision ##(refs/remotes/origin/master)
 > C:\Program Files\Git\bin\git.exe config core.sparsecheckout # timeout=10
 > C:\Program Files\Git\bin\git.exe checkout -f ##
 > C:\Program Files\Git\bin\git.exe rev-list ### timeout=10
Run condition [Execution node ] enabling prebuild for step [Execute shell]
Run condition [Execution node ] enabling prebuild for step [Execute Windows batch command]
Slave went offline during the build
ERROR: Connection was broken: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@41241c12[name=Windows2008R2_64bit]
    at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(Unknown Source)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.read(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
    at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
    at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
    ... 6 more

Build step 'Copy artifacts from another project' marked build as failure
ERROR: Step 'Scan for compiler warnings' failed: no workspace for ##/buildNode=Win64e #57
ERROR: Step 'Archive the artifacts' failed: no workspace for ##/buildNode=Win64e #57
Finished: FAILURE

You may notice that the slave reconnects, but the build stays frozen and has to be killed in the Jenkins UI. It happens in about 19 out of 20 runs (only rarely does the build finish without problems).
The problem happens only on the Windows slave, not on any of the Linux slaves.
I tried:

Checked on a different Windows machine (Windows 2012): everything seems to work fine there. Some Hyper-V issue? I'll run more tests.
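
One detail in the agent log above: the reset is logged at 5:13:54, exactly 60 seconds after `Connected` at 5:12:54, and the agent reconnects 10 seconds later. Whether that interval is a coincidence or a timeout somewhere between the hosts is not established here. A minimal Python sketch for measuring the same gap in other agent logs, assuming the timestamp format shown above and an English locale; `parse_stamp`, `parse_events`, and `seconds_until_error` are hypothetical helper names, not part of Jenkins:

```python
from datetime import datetime

# Assumed header format, e.g.
# "Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status"
STAMP_FORMAT = "%b %d, %Y %I:%M:%S %p"


def parse_stamp(line):
    """Return the datetime at the start of a header line, or None."""
    words = line.split()
    if len(words) < 5:
        return None
    try:
        return datetime.strptime(" ".join(words[:5]), STAMP_FORMAT)
    except ValueError:
        return None  # not a timestamped header line


def parse_events(log_text):
    """Pair each INFO/SEVERE message with the timestamp of the header before it."""
    events = []
    stamp = None
    for line in log_text.splitlines():
        line = line.strip()
        parsed = parse_stamp(line)
        if parsed is not None:
            stamp = parsed
        elif stamp is not None and line.startswith(("INFO: ", "SEVERE: ")):
            level, message = line.split(": ", 1)
            events.append((stamp, level, message))
    return events


def seconds_until_error(events):
    """Seconds from the first 'Connected' to the first later SEVERE entry, or None."""
    connected = None
    for stamp, level, message in events:
        if connected is None and level == "INFO" and message == "Connected":
            connected = stamp
        elif connected is not None and level == "SEVERE":
            return (stamp - connected).total_seconds()
    return None


# Excerpt copied from the agent log in this report.
SAMPLE = """\
Feb 15, 2016 5:12:54 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Feb 15, 2016 5:13:54 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
"""

print(seconds_until_error(parse_events(SAMPLE)))  # 60.0
```

If the gap turns out to be the same across many failures, that would be a hint toward an idle or session timeout rather than the artifact transfer itself; a varying gap would point elsewhere.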


Originally reported by 321kami, imported from: Jenkins slave resets connection during or just after artifacts download.
  • status: Open
  • priority: Major
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 8 years ago

borisivan:

I have seen the same problem for years. I typically see it at the end of a build when doing a maven site site:deploy. My guess is that the rapid transfer of many small files to build the resulting website either prevents some sort of keepalive from working or trips some kind of integrity check.

Same stack trace on the Windows slave, etc. Always:

May 04, 2016 10:17:36 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
...
...