kubernetes / minikube

Run Kubernetes locally
https://minikube.sigs.k8s.io/
Apache License 2.0
28.77k stars 4.81k forks source link

jenkins: FATAL: command execution failed: java.io.EOFException / Unexpected termination of the channel #3390

Closed tstromberg closed 5 years ago

tstromberg commented 5 years ago

Sometimes Jenkins runs fail in the middle of a test with:

FATAL: command execution failed                                                                                                                     
java.io.EOFException                                                                                                                                
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)                                                         
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)                                                    
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)                                                                   
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)                                                                             
        at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)                                                                  
        at hudson.remoting.Command.readFrom(Command.java:140)                                                                                       
        at hudson.remoting.Command.readFrom(Command.java:126)                                                                                       
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)                  
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)                                        
Caused: java.io.IOException: Unexpected termination of the channel                                                                                  
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)                                        
Caused: java.io.IOException: Backing channel 'GCP - Linux' is disconnected.                                                                         
        at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:214)                                                  
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)                                                         
        at com.sun.proxy.$Proxy76.isAlive(Unknown Source)                                                                                           
        at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1143)                                                                      
        at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1135)                                                                         
        at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)                                                                        
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)                                                                     
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)                                                                      
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)                                                                        
        at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)                                                        
        at hudson.model.Build$BuildExecution.build(Build.java:206)                                                                                  
        at hudson.model.Build$BuildExecution.doRun(Build.java:163)                                                                                  
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)                                                            
        at hudson.model.Run.execute(Run.java:1819)                                                                                                  
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)                                                                                  
        at hudson.model.ResourceController.execute(ResourceController.java:97)                                                                      
        at hudson.model.Executor.run(Executor.java:429)                                                                                             
FATAL: Unable to delete script file /tmp/jenkins1293438029758179178.sh

Examples from the last 3 days:

Nov 27 15:00 ./Linux_Integration_Tests_KVM/builds/2430/log
Nov 27 15:07 ./Linux_Integration_Tests_none/builds/1593/log
Nov 28 15:00 ./Linux_Integration_Tests_KVM/builds/2458/log
Nov 28 18:14 ./OSX_Integration_Tests_hyperkit/builds/1379/log
Nov 29 15:00 ./Linux_Integration_Tests_KVM/builds/2482/log

The files that are on the hour make me suspicious that this is due to a reboot or other cleanup process.

tstromberg commented 5 years ago

Researching the last example:

I tried searching audit logs, but our records didn't go back far enough:

# ausearch -ul jenkins -ts 11/29/2018 14:46:00 -te 11/29/2018 15:01:00 
<no matches>

I updated /etc/auditd/auditd.conf for a greater retention time:

max_log_file = 100
num_logs = 100
tstromberg commented 5 years ago

Investigating /Linux_Integration_Tests_KVM/builds/2806:

UTC start: 2018-12-14 12:56:37 UTC end: 2018-12-14 13:17:01.463 PST end: 2018-12-14 04:57:01 Took 20 min on GCP - Linux

System definitely rebooted at this point:

reboot system boot 4.9.0-8-amd64 Fri Dec 14 12:17 - 13:17 (00:59)

Running "sudo ausearch -ts 12/14/2018 13:16:59 -te 12/14/2018 13:17:01 | grep EXECVE" I see:

type=EXECVE msg=audit(1544793421.239:2789): argc=2 a0="/bin/sh" a1="/etc/cron.hourly/cleanup-and-reboot" type=EXECVE msg=audit(1544793421.255:2790): argc=3 a0="grep" a1="-v" a2="java" type=EXECVE msg=audit(1544793421.255:2791): argc=2 a0="pidof" a1="java" type=EXECVE msg=audit(1544793421.275:2792): argc=2 a0="pstree" a1="1222" type=EXECVE msg=audit(1544793421.299:2793): argc=2 a0="logger" a1=636C65616E75702D616E642D7265626F6F742072756E6E696E67 type=EXECVE msg=audit(1544793421.303:2794): argc=2 a0="wall" a1=636C65616E75702D616E642D7265626F6F742072756E6E696E67 type=EXECVE msg=audit(1544793421.319:2795): argc=2 a0="killall" a1="java"

Here's the crazy part. It looks like the script finished:

05:17:00 ++ echo '>> /home/jenkins/minikube-integration/linux-amd64-kvm2-master-2573-5d910e8937962f21785ce3acc0e6b9a2d5da9114 completed at Fri Dec 14 13:17:00 UTC 2018' 05:17:00 >> /home/jenkins/minikube-integration/linux-amd64-kvm2-master-2573-5d910e8937962f21785ce3acc0e6b9a2d5da9114 completed at Fri Dec 14 13:17:00 UTC 2018 05:17:00 ++ [[ master != \m\a\s\t\e\r ]] 05:17:00 ++ exit 0 05:17:01 FATAL: java.io.IOException: Unexpected termination of the channel

I suspect we're seeing a race condition where the script just barely finishes before we reboot the machine: before anything has been uploaded.

rahamthulla commented 4 years ago

FATAL: command execution failed java.nio.channels.ClosedChannelException

Please help me how to fix it