tstromberg closed 5 years ago
Researching the last example:
I tried searching audit logs, but our records didn't go back far enough:
# ausearch -ul jenkins -ts 11/29/2018 14:46:00 -te 11/29/2018 15:01:00
<no matches>
I updated /etc/audit/auditd.conf to retain logs for longer:
max_log_file = 100
num_logs = 100
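For a rough idea of what those two settings buy us: max_log_file is a size in megabytes, and num_logs only matters when max_log_file_action is set to rotate. A minimal sketch that multiplies the two to get an upper bound on retained log size (the function name is made up for illustration, and it assumes plain "key = value" lines):

```shell
#!/bin/sh
# Hypothetical helper: print the rough upper bound, in megabytes, on the
# space used by rotated audit logs. $1 is the path to an auditd.conf-style
# file made of "key = value" lines.
audit_log_capacity_mb() {
    awk -F'=' '
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)   # strip whitespace around the key
            gsub(/[ \t]/, "", val)   # strip whitespace around the value
        }
        key == "max_log_file" { size = val }    # megabytes per log file
        key == "num_logs"     { count = val }   # number of rotated files kept
        END { print size * count }
    ' "$1"
}
```

Called with the path to the config file, the settings above work out to 10000 MB. How many days that covers depends entirely on how chatty the audit rules are.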
Investigating /Linux_Integration_Tests_KVM/builds/2806:
UTC start: 2018-12-14 12:56:37
UTC end: 2018-12-14 13:17:01.463
PST end: 2018-12-14 04:57:01
Took 20 min on GCP - Linux
System definitely rebooted at this point:
reboot system boot 4.9.0-8-amd64 Fri Dec 14 12:17 - 13:17 (00:59)
Running "sudo ausearch -ts 12/14/2018 13:16:59 -te 12/14/2018 13:17:01 | grep EXECVE" I see:
type=EXECVE msg=audit(1544793421.239:2789): argc=2 a0="/bin/sh" a1="/etc/cron.hourly/cleanup-and-reboot"
type=EXECVE msg=audit(1544793421.255:2790): argc=3 a0="grep" a1="-v" a2="java"
type=EXECVE msg=audit(1544793421.255:2791): argc=2 a0="pidof" a1="java"
type=EXECVE msg=audit(1544793421.275:2792): argc=2 a0="pstree" a1="1222"
type=EXECVE msg=audit(1544793421.299:2793): argc=2 a0="logger" a1=636C65616E75702D616E642D7265626F6F742072756E6E696E67
type=EXECVE msg=audit(1544793421.303:2794): argc=2 a0="wall" a1=636C65616E75702D616E642D7265626F6F742072756E6E696E67
type=EXECVE msg=audit(1544793421.319:2795): argc=2 a0="killall" a1="java"
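The unquoted a1= values for logger and wall are hex-encoded, which auditd does for arguments containing spaces or other special characters. Running ausearch with -i should interpret them automatically; as a sketch of doing it by hand, here is a small POSIX sh decoder (the function name is mine, not part of any audit tooling):

```shell
#!/bin/sh
# Hypothetical helper: decode a hex-encoded audit argument to text.
decode_audit_hex() {
    _hex=$1
    # Reject empty or non-hex input rather than printing junk.
    case $_hex in
        ''|*[!0-9A-Fa-f]*) return 1 ;;
    esac
    # Reject odd-length input; each byte needs exactly two hex digits.
    [ $(( ${#_hex} % 2 )) -eq 0 ] || return 1
    _escaped=""
    while [ -n "$_hex" ]; do
        _rest=${_hex#??}            # everything after the first two digits
        _byte=${_hex%"$_rest"}      # the first two digits
        # Turn the byte into a \0ooo octal escape that printf %b understands.
        _escaped="$_escaped\\0$(printf '%03o' "0x$_byte")"
        _hex=$_rest
    done
    printf '%b\n' "$_escaped"
}

decode_audit_hex 636C65616E75702D616E642D7265626F6F742072756E6E696E67
# prints: cleanup-and-reboot running
```

So both logger and wall were announcing "cleanup-and-reboot running" right before killall java.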
Here's the crazy part. It looks like the script finished:
05:17:00 ++ echo '>> /home/jenkins/minikube-integration/linux-amd64-kvm2-master-2573-5d910e8937962f21785ce3acc0e6b9a2d5da9114 completed at Fri Dec 14 13:17:00 UTC 2018'
05:17:00 >> /home/jenkins/minikube-integration/linux-amd64-kvm2-master-2573-5d910e8937962f21785ce3acc0e6b9a2d5da9114 completed at Fri Dec 14 13:17:00 UTC 2018
05:17:00 ++ [[ master != \m\a\s\t\e\r ]]
05:17:00 ++ exit 0
05:17:01 FATAL: java.io.IOException: Unexpected termination of the channel
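To line the audit records up against this console log, the epoch in msg=audit(1544793421.319:2795) (the killall java record) can be converted to both clocks. A sketch assuming GNU date and assuming the console timestamps are UTC-8, which is what the 05:17 versus 13:17 UTC pairing in the log suggests; the function names are illustrative:

```shell
#!/bin/sh
# Hypothetical helpers: render an audit epoch (seconds) in UTC and in a
# fixed UTC-8 offset. Requires a date that accepts -d @EPOCH (GNU date).
audit_epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

audit_epoch_to_pst() {
    # POSIX TZ string: zone named PST, 8 hours west of UTC, no DST rules.
    TZ=PST8 date -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

audit_epoch_to_utc 1544793421   # prints: 2018-12-14 13:17:01
audit_epoch_to_pst 1544793421   # prints: 2018-12-14 05:17:01
```

If the two clocks are in sync, killall java ran in the same second as the 05:17:01 FATAL line.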
I suspect we're seeing a race condition: the script just barely finishes, and then we reboot the machine before anything has been uploaded.
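One way to close a race like this is to have the job hold a lock and have the hourly cron script skip its cleanup while that lock is held. This is only a sketch using flock(1) from util-linux: the lock path and function names are hypothetical, and I have not seen the real cleanup-and-reboot script, so how it would call this is an assumption.

```shell
#!/bin/sh
# Hypothetical lock path shared by the job wrapper and the cron script.
LOCK_FILE=${LOCK_FILE:-/var/lock/jenkins-job.lock}

# Job side: run a command while holding an exclusive lock. The lock has to
# cover everything up to and including the upload, not just the test script.
run_with_job_lock() {
    flock "$LOCK_FILE" "$@"
}

# Cron side: succeeds only if nothing currently holds the lock.
# flock -n fails immediately instead of waiting when the lock is taken.
safe_to_reboot() {
    flock -n "$LOCK_FILE" true
}
```

The cron script could then do something like "safe_to_reboot || exit 0" before killall java. Since the upload happens in the Jenkins agent after the script exits, a lock inside the script alone may not be enough, and a short grace period after the lock is released might still be needed.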
FATAL: command execution failed
java.nio.channels.ClosedChannelException
Please help me figure out how to fix it.
Sometimes Jenkins runs fail in the middle of a test with:
Examples from the last 3 days:
The files that fall on the hour make me suspect that this is caused by a reboot or other cleanup process.