timja / jenkins-gh-issues-poc-06-18


[JENKINS-19145] Builds fail because of "slave went offline during the build" #10476

Open timja opened 11 years ago

timja commented 11 years ago

Taking a slave offline during a build creates a FileNotFoundException (version 1.525).

Looks like the node went offline during the build. Check the slave log for the details.
FATAL: /Users/Shared/Jenkins/Home/logs/slaves/null/slave.log (No such file or directory)
java.io.FileNotFoundException: /Users/Shared/Jenkins/Home/logs/slaves/null/slave.log (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.kohsuke.stapler.framework.io.LargeText$FileSession.<init>(LargeText.java:397)
at org.kohsuke.stapler.framework.io.LargeText$2.open(LargeText.java:120)
at org.kohsuke.stapler.framework.io.LargeText.writeLogTo(LargeText.java:210)
at hudson.console.AnnotatedLargeText.writeHtmlTo(AnnotatedLargeText.java:159)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:599)
at hudson.model.Run.execute(Run.java:1593)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:247)


Originally reported by gandalf, imported from: Builds fail because of "slave went offline during the build"
  • status: Reopened
  • priority: Major
  • resolution: Unresolved
  • imported: 2022/01/10

timja commented 11 years ago

schander:

I have this same problem with v1.530.

Looks like the node went offline during the build. Check the slave log for the details.
FATAL: /jenkins-home/logs/slaves/mactestsvr01/slave.log (No such file or directory)
java.io.FileNotFoundException: /jenkins-home/logs/slaves/mactestsvr01/slave.log (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
    at org.kohsuke.stapler.framework.io.LargeText$FileSession.<init>(LargeText.java:397)
    at org.kohsuke.stapler.framework.io.LargeText$2.open(LargeText.java:120)
    at org.kohsuke.stapler.framework.io.LargeText.writeLogTo(LargeText.java:210)
    at hudson.console.AnnotatedLargeText.writeHtmlTo(AnnotatedLargeText.java:159)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:580)
    at hudson.model.Run.execute(Run.java:1603)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:246)

However, when I look for the log file, I find that there is a log named slave.log.[1-9] at that location.

timja commented 10 years ago

elatt:

I see the same on 1.539

timja commented 10 years ago

danielbeck:

These look like different problems, as the original report shows the slave to be named 'null'. Probably because there's no hudson.model.Computer for the hudson.model.Node, but needs further investigation.
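
To illustrate how a missing name could end up as a literal 'null' directory in the path: in Java, concatenating a null String produces the text "null". The sketch below is not Jenkins code. The class `SlaveLogPathSketch` and the helper `logPathFor` are hypothetical, and it only assumes that the log path is assembled by string concatenation from the node name.

```java
public class SlaveLogPathSketch {
    // Hypothetical helper (not a Jenkins API): builds the slave log path
    // by plain string concatenation from a logs root and a node name.
    static String logPathFor(String logsRoot, String nodeName) {
        // If nodeName is null, Java appends the four characters "null".
        return logsRoot + "/slaves/" + nodeName + "/slave.log";
    }

    public static void main(String[] args) {
        // Assumption: a node without an associated Computer yields a null name.
        String nodeName = null;
        System.out.println(logPathFor("/Users/Shared/Jenkins/Home/logs", nodeName));
    }
}
```

Running it prints `/Users/Shared/Jenkins/Home/logs/slaves/null/slave.log`, which matches the path in the original report.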

Do these issues still occur on more recent Jenkins versions (no older than eight weeks)?

timja commented 10 years ago

danielbeck:

No response to comment asking for updated information in three weeks, so resolving as Incomplete.

timja commented 9 years ago

mgrybyk:

Jenkins ver. 1.609

Slave went offline during the build
09:06:21 ERROR: Connection was broken: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@290bfa[name=STMSQADV-TDB801]
09:06:21 at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
09:06:21 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
09:06:21 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
09:06:21 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
09:06:21 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
09:06:21 at java.util.concurrent.FutureTask.run(Unknown Source)
09:06:21 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
09:06:21 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
09:06:21 at java.lang.Thread.run(Unknown Source)
09:06:21 Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
09:06:21 at sun.nio.ch.SocketDispatcher.read0(Native Method)
09:06:21 at sun.nio.ch.SocketDispatcher.read(Unknown Source)
09:06:21 at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
09:06:21 at sun.nio.ch.IOUtil.read(Unknown Source)
09:06:21 at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
09:06:21 at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
09:06:21 at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
09:06:21 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
09:06:21 ... 7 more

timja commented 9 years ago

pnadczuk:

Jenkins 1.622

12:04:02 Slave went offline during the build
12:04:02 ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
12:04:02 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
12:04:02 Caused by: java.io.EOFException
12:04:02 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
12:04:02 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
12:04:02 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
12:04:02 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
12:04:02 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
12:04:02 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
12:04:02 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

timja commented 9 years ago

thunderbird:

Seen in Jenkins ver. 1.609.1

13:47:30 Slave went offline during the build
13:47:31 ERROR: Connection was broken: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@6ed75b03[name=jenkins-win-slave1]
13:47:31    at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
13:47:31    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
13:47:31    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
13:47:31    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
13:47:31    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
13:47:31    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
13:47:31    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
13:47:31    at java.lang.Thread.run(Thread.java:745)
13:47:31 Caused by: java.io.IOException: Connection reset by peer
13:47:31    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
13:47:31    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
13:47:31    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
13:47:31    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
13:47:31    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
13:47:31    at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
13:47:31    at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
13:47:31    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
13:47:31    ... 6 more
13:47:31 
13:47:31 Build step 'Invoke Ant' marked build as failure
13:47:31 ERROR: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception
13:47:31 hudson.AbortException: no workspace for 1010-unit-tests #364
13:47:31    at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:72)
13:47:31    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
13:47:31    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:761)
13:47:31    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:721)
13:47:31    at hudson.model.Build$BuildExecution.post2(Build.java:183)
13:47:31    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:670)
13:47:31    at hudson.model.Run.execute(Run.java:1766)
13:47:31    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
13:47:31    at hudson.model.ResourceController.execute(ResourceController.java:98)
13:47:31    at hudson.model.Executor.run(Executor.java:374)

Jenkins master stacktrace:

Sep 02, 2015 1:47:29 PM org.jenkinsci.remoting.nio.NioChannelHub run
WARNING: Communication problem
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Sep 02, 2015 1:47:31 PM jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed
WARNING: NioChannelHub keys=2 gen=13177477: Computer.threadPoolForRemoting [#1] for + jenkins-win-slave1 terminated
java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@6ed75b03[name=jenkins-win-slave1]
at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
... 6 more

Jenkins Agent log:

JNLP agent connected from /10.1.10.45
<===[JENKINS REMOTING CAPACITY]===>
Slave.jar version: 2.50
This is a Windows slave
Slave successfully connected and online
ERROR: Connection terminated
java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@6ed75b03[name=jenkins-win-slave1]
at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:628)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
... 6 more

timja commented 8 years ago

jgruzewski:

Jenkins version: 1.656
Amazon EC2 plugin version: 1.31

Master:

Slave went offline during the build
ERROR: Connection was broken: java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

Build step 'Execute shell' marked build as failure
FATAL: channel is already closed
hudson.remoting.ChannelClosedException: channel is already closed
    at hudson.remoting.Channel.send(Channel.java:578)
    at hudson.remoting.Request.call(Request.java:130)
    at hudson.remoting.Channel.call(Channel.java:780)
    at hudson.Launcher$RemoteLauncher.kill(Launcher.java:953)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:540)
    at hudson.model.Run.execute(Run.java:1738)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:410)
Caused by: java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Finished: FAILURE

Slave:

ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

timja commented 7 years ago

orkenstein:

So, any progress on this? I'm getting exactly the same problem with a Mac Pro node connected via SSH.