kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

`docker rm` multi-rm can hang up runtime (with sb lock held) #406

Closed grahamwhaley closed 5 years ago

grahamwhaley commented 6 years ago

Description of problem

The soak test from https://github.com/kata-containers/tests/pull/414 regularly hangs in the `docker rm` phase, that is, when it runs a single `docker rm x y z` over up to 110 containers (but we have seen this fail with 20 containers).
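For anyone trying to reproduce this outside the soak test, the multi-rm step can be approximated with a small helper. This is a rough sketch, not the actual test from the PR: the function names, the `soak_N` container naming, the `-f` flag, and the 300 second timeout are all assumptions made for illustration.

```python
import subprocess


def container_names(count, prefix="soak"):
    """Hypothetical naming scheme: soak_0, soak_1, ..."""
    return [f"{prefix}_{i}" for i in range(count)]


def build_rm_command(names, force=True):
    """Return the argv for one `docker rm` covering every container."""
    cmd = ["docker", "rm"]
    if force:
        # Assumption: the containers are still running when removed.
        cmd.append("-f")
    return cmd + list(names)


def rm_hangs(names, timeout_s=300):
    """Run the multi-rm; return True if it does not finish in timeout_s."""
    try:
        subprocess.run(
            build_rm_command(names),
            check=False,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

Calling `rm_hangs(container_names(20))` after starting 20 containers with those names would report whether the single `docker rm` returned within the timeout. Note that the timeout only stops the `docker` client; a hung `kata-runtime` process would still need to be inspected separately.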

Expected result

The test is expected to complete.

Actual result

The test hangs up in the `docker rm` phase.

Meta details

Mega system details here. More interesting info I'll post in a new comment on the thread.

Running kata-collect-data.sh version 1.0.0 (commit ca9f7abba96e5c4db734673b9e7d870076d715e2) at 2018-06-18.17:39:28.741713727+0100.


Runtime is /usr/local/bin/kata-runtime.

kata-env

Output of "/usr/local/bin/kata-runtime kata-env":

[Meta]
  Version = "1.0.12"

[Runtime]
  Debug = false
  [Runtime.Version]
    Semver = "1.0.0"
    Commit = "ca9f7abba96e5c4db734673b9e7d870076d715e2"
    OCI = "1.0.1"
  [Runtime.Config]
    Path = "/usr/share/defaults/kata-containers/configuration.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.7.0, Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-lite-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  Msize9p = 8192
  Debug = false

[Image]
  Path = "/usr/share/kata-containers/kata-containers-2018-06-13-10:50:04.393273528+0100-3375e73"

[Kernel]
  Path = "/usr/share/kata-containers/kata-vmlinuz-4.14.22.container"
  Parameters = ""

[Initrd]
  Path = ""

[Proxy]
  Type = "kataProxy"
  Version = "kata-proxy version 1.0.0-a69326b63802952b14203ea9c1533d4edb8c1d64"
  Path = "/usr/libexec/kata-containers/kata-proxy"
  Debug = true

[Shim]
  Type = "kataShim"
  Version = "kata-shim version 1.0.0-087a5371680f069d45baed8544a09b4e6353c06e"
  Path = "/usr/libexec/kata-containers/kata-shim"
  Debug = true

[Agent]
  Type = "kata"

[Host]
  Kernel = "4.4.0-104-generic"
  Architecture = "amd64"
  VMContainerCapable = true
  [Host.Distro]
    Name = "Ubuntu"
    Version = "16.04"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz"

Runtime config files

Runtime default config files

/etc/kata-containers/configuration.toml
/usr/share/defaults/kata-containers/configuration.toml

Runtime config file contents

Config file /etc/kata-containers/configuration.toml not found

Output of "cat "/usr/share/defaults/kata-containers/configuration.toml"":

# Copyright (c) 2017-2018 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "cli/config/configuration.toml.in"
# XXX: Project:
# XXX:   Name: Kata Containers
# XXX:   Type: kata

[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/kata-containers/vmlinuz.container"
#initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
image = "/usr/share/kata-containers/kata-containers.img"
machine_type = "pc"

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per SB/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1

# Default maximum number of vCPUs per SB/VM:
# unspecified or == 0             --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when
# the actual number of physical cores is greater than it.
# WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU
# the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs
# can be added to a SB/VM, but the memory footprint will be big. Another example, with
# `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of
# vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable,
# unless you know what are you doing.
default_maxvcpus = 0

# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per SB/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for SB/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons.
# This flag prevents the block device from being passed to the hypervisor,
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or
# virtio-blk.
block_device_driver = "virtio-scsi"

# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
#
enable_iothreads = false

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
#
# Default false
enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true

# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = 8192

[proxy.kata]
path = "/usr/libexec/kata-containers/kata-proxy"

# If enabled, proxy messages will be sent to the system log
# (default: disabled)
enable_debug = true

[shim.kata]
path = "/usr/libexec/kata-containers/kata-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
enable_debug = true

[agent.kata]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
#   - bridged
#     Uses a linux bridge to interconnect the container interface to
#     the VM. Works for most cases except macvlan and ipvlan.
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
internetworking_model="macvtap"
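
The `default_vcpus` comments in the configuration above describe a clamping rule that is easy to misread. The sketch below restates that rule as the comments word it. It is not the runtime's own code, and the function name and the `physical_cores` parameter are made up for illustration.

```python
def effective_vcpus(default_vcpus, physical_cores):
    """Apply the default_vcpus rule as described in the config comments."""
    if default_vcpus is None or default_vcpus == 0:
        # "unspecified or 0 --> will be set to 1"
        return 1
    if default_vcpus < 0 or default_vcpus > physical_cores:
        # Negative or too large: use the actual number of physical cores.
        return physical_cores
    # "> 0 <= number of physical cores --> will be set to the specified number"
    return default_vcpus
```

With `default_vcpus = 1` as in this report, the rule yields one vCPU per VM on any host with at least one physical core.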

Image details

unknown

Initrd details

No initrd


Logfiles

Runtime logs

Recent runtime problems found in system journal:

time="2018-06-18T17:10:56.190461935+01:00" level=error msg="#\t0x737603\tcompress/flate.init.0+0x3b3\t\t\t/usr/local/go/src/compress/flate/huffman_bit_writer.go:615" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190477653+01:00" level=error msg="#\t0x42dd19\truntime.main+0x1c9\t\t\t\t/usr/local/go/src/runtime/proc.go:186" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190492503+01:00" level=error command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190506243+01:00" level=error msg="1: 32 [1: 32] @ 0x7d6b99 0x7d6995 0x7d7dfb 0x83f448 0x8646fa 0x93940c 0x9a8615 0x9f5714 0x42dd1a 0x4592d1" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190525841+01:00" level=error msg="#\t0x7d6b98\tgithub.com/kata-containers/runtime/vendor/golang.org/x/net/http2/hpack.addDecoderNode+0x1d8\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/golang.org/x/net/http2/hpack/huffman.go:144" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190545511+01:00" level=error msg="#\t0x7d6994\tgithub.com/kata-containers/runtime/vendor/golang.org/x/net/http2/hpack.init.0+0x74\t\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/golang.org/x/net/http2/hpack/huffman.go:127" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.19057613+01:00" level=error msg="#\t0x42dd19\truntime.main+0x1c9\t\t\t\t\t\t\t\t\t\t/usr/local/go/src/runtime/proc.go:186" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190595734+01:00" level=error command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190609496+01:00" level=error command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190622939+01:00" level=error msg="# runtime.MemStats" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190637399+01:00" level=error msg="# Alloc = 2582096" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190653663+01:00" level=error msg="# TotalAlloc = 2582096" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190668471+01:00" level=error msg="# Sys = 6686968" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190686203+01:00" level=error msg="# Lookups = 42" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.19070225+01:00" level=error msg="# Mallocs = 13431" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190717594+01:00" level=error msg="# Frees = 719" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.19073167+01:00" level=error msg="# HeapAlloc = 2582096" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190745914+01:00" level=error msg="# HeapSys = 3670016" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190762066+01:00" level=error msg="# HeapIdle = 114688" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190777616+01:00" level=error msg="# HeapInuse = 3555328" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.19079393+01:00" level=error msg="# HeapReleased = 0" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190808633+01:00" level=error msg="# HeapObjects = 12712" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190822726+01:00" level=error msg="# Stack = 524288 / 524288" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190836582+01:00" level=error msg="# MSpan = 33896 / 49152" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190852839+01:00" level=error msg="# MCache = 3472 / 16384" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190867304+01:00" level=error msg="# BuckHashSys = 1445178" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190882743+01:00" level=error msg="# GCSys = 202752" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190900298+01:00" level=error msg="# OtherSys = 779198" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190918323+01:00" level=error msg="# NextGC = 4473924" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190934322+01:00" level=error msg="# LastGC = 0" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190953603+01:00" level=error msg="# PauseNs = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.190980379+01:00" level=error msg="# PauseEnd = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191006987+01:00" level=error msg="# NumGC = 0" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191023079+01:00" level=error msg="# NumForcedGC = 0" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191040185+01:00" level=error msg="# GCCPUFraction = 0" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191054046+01:00" level=error msg="# DebugGC = false" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191068397+01:00" level=error msg="--- mutex:" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191082512+01:00" level=error msg="cycles/second=1799978944" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191097022+01:00" level=error msg="sampling period=0" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191112665+01:00" level=error msg="threadcreate profile: total 11" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191129608+01:00" level=error msg="10 @" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191147723+01:00" level=error msg="#\t0x0" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191163576+01:00" level=error command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.19117744+01:00" level=error msg="1 @ 0x43154c 0x431c59 0x431f1f 0x42dcf1 0x4592d1" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191194916+01:00" level=error msg="#\t0x43154b\truntime.allocm+0x15b\t\t\t/usr/local/go/src/runtime/proc.go:1516" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191210985+01:00" level=error msg="#\t0x431c58\truntime.newm+0x38\t\t\t/usr/local/go/src/runtime/proc.go:1830" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191226912+01:00" level=error msg="#\t0x431f1e\truntime.startTemplateThread+0x4e\t/usr/local/go/src/runtime/proc.go:1891" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191242655+01:00" level=error msg="#\t0x42dcf0\truntime.main+0x1a0\t\t\t/usr/local/go/src/runtime/proc.go:181" command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191268824+01:00" level=error command=kill name=kata-runtime pid=8352 source=runtime
time="2018-06-18T17:10:56.191284019+01:00" level=error command=kill name=kata-runtime pid=8352 source=runtime

Proxy logs

Recent proxy problems found in system journal:

time="2018-06-18T16:53:02.981477809+01:00" level=fatal msg="accept unix /run/vc/sbs/0395b0ab4371e9535f16a5f329289961ef758ac48e2c1717e386f9ea8cb75972/proxy.sock: use of closed network connection" name=kata-proxy pid=22604 source=proxy
time="2018-06-18T16:53:03.213552294+01:00" level=fatal msg="accept unix /run/vc/sbs/80e50ccd713514898a0228274827b134ef1d089b6b32c7ac9861ea4fda3bb93c/proxy.sock: use of closed network connection" name=kata-proxy pid=22185 source=proxy
time="2018-06-18T16:53:03.36075397+01:00" level=fatal msg="accept unix /run/vc/sbs/58b5ed8fce2c97a865c36077a2fe868f5c45f5b870596566042b08ef85445a53/proxy.sock: use of closed network connection" name=kata-proxy pid=22393 source=proxy
time="2018-06-18T16:53:03.781096721+01:00" level=fatal msg="accept unix /run/vc/sbs/2a6ac896ef972db5ac2b007396fd086290aca5a57db8da205e7ea408d38af42c/proxy.sock: use of closed network connection" name=kata-proxy pid=21978 source=proxy
time="2018-06-18T16:53:04.011345139+01:00" level=fatal msg="accept unix /run/vc/sbs/1e49a4e7820aacb04040ae4ebd4fb2f35c10ba488d7892dd9038c124c562ac6c/proxy.sock: use of closed network connection" name=kata-proxy pid=21770 source=proxy
time="2018-06-18T16:53:04.375031223+01:00" level=fatal msg="accept unix /run/vc/sbs/2bf6a133c9f5873f270f40dfbafdc1808c477a2e0cd9552e3f22295257d35f82/proxy.sock: use of closed network connection" name=kata-proxy pid=21560 source=proxy
time="2018-06-18T16:53:05.317467921+01:00" level=fatal msg="accept unix /run/vc/sbs/4e7cb9fca63ad5eb36b97b25dd84c9f8fdb588a2ebe4ccb08ab34b6a1e37d4a5/proxy.sock: use of closed network connection" name=kata-proxy pid=20947 source=proxy
time="2018-06-18T16:53:05.593123359+01:00" level=fatal msg="accept unix /run/vc/sbs/bb1b99d7a80d4f6ee09180cf7dff027ffeaca9b8e0d1dcaf5281db2928af4694/proxy.sock: use of closed network connection" name=kata-proxy pid=20735 source=proxy
time="2018-06-18T16:53:05.749935285+01:00" level=fatal msg="accept unix /run/vc/sbs/66e9c07028a35c87a3ad1a7877cff923713febc7e1563fac8f088735b01dfee5/proxy.sock: use of closed network connection" name=kata-proxy pid=20319 source=proxy
time="2018-06-18T16:53:05.808755112+01:00" level=fatal msg="accept unix /run/vc/sbs/0d255ec7d6449fba6003f3d4dc9d049bf6310083fa07ff6f5022c3e7a3562550/proxy.sock: use of closed network connection" name=kata-proxy pid=20529 source=proxy
time="2018-06-18T16:53:06.10669189+01:00" level=fatal msg="accept unix /run/vc/sbs/f251227eb24f3b2425ab010463dcf9e8bed826ba4926dd57a4f6caca997e4168/proxy.sock: use of closed network connection" name=kata-proxy pid=20110 source=proxy
time="2018-06-18T16:53:06.63855247+01:00" level=fatal msg="accept unix /run/vc/sbs/b6b3ad56eb82f22ef210f94ce7a13e5353666f7baf7704f09f8dc1cc50da5f7b/proxy.sock: use of closed network connection" name=kata-proxy pid=19899 source=proxy
time="2018-06-18T16:53:07.061790687+01:00" level=fatal msg="accept unix /run/vc/sbs/928382f2904b850e8afa0c431e0648ab78271844da3d0de14352fd31e3b9cc1b/proxy.sock: use of closed network connection" name=kata-proxy pid=19684 source=proxy
time="2018-06-18T16:53:07.353249406+01:00" level=fatal msg="accept unix /run/vc/sbs/20e8cf852e9222c9dd9f6b92e3376a5e1198757628e83d4b03ce43c460dde931/proxy.sock: use of closed network connection" name=kata-proxy pid=19477 source=proxy
time="2018-06-18T16:53:07.655124911+01:00" level=fatal msg="accept unix /run/vc/sbs/9f8ff228edfa9dc6f9ffc34edcaa9df0d746417dad40da76ea3422d492a802d2/proxy.sock: use of closed network connection" name=kata-proxy pid=19048 source=proxy
time="2018-06-18T16:53:07.679507719+01:00" level=fatal msg="accept unix /run/vc/sbs/7c63dd9bd14e0f0a111af8f72996f6a8184e62bad3bd2035478482b355d801ed/proxy.sock: use of closed network connection" name=kata-proxy pid=19270 source=proxy
time="2018-06-18T16:53:08.021186288+01:00" level=fatal msg="accept unix /run/vc/sbs/f69adcf57bf1cbee6db0521fe41ca4b885ae15a0cb128a1ad74618329f49f3ed/proxy.sock: use of closed network connection" name=kata-proxy pid=18429 source=proxy
time="2018-06-18T16:53:08.219995589+01:00" level=fatal msg="accept unix /run/vc/sbs/bd14d15ddd010b35de3b58cbb3b3a0971c0f80ea11467f402810cf340909503b/proxy.sock: use of closed network connection" name=kata-proxy pid=18636 source=proxy
time="2018-06-18T16:53:08.220785822+01:00" level=fatal msg="accept unix /run/vc/sbs/a2ce1a84eb7a65cb09cdae5dc348467b52ae02ea4d837e9487d935cb979b30e6/proxy.sock: use of closed network connection" name=kata-proxy pid=18222 source=proxy
time="2018-06-18T16:53:08.639370339+01:00" level=fatal msg="accept unix /run/vc/sbs/a0015157c0ae66dd9ed622c10c1857816dbf59e6d1579fe91d128c95321b3a63/proxy.sock: use of closed network connection" name=kata-proxy pid=18016 source=proxy
time="2018-06-18T16:53:08.651424449+01:00" level=fatal msg="accept unix /run/vc/sbs/cac006c0d0c6d3080d737cd7b5e4668bde6de7e076f4173e1d446bd3f83a4d51/proxy.sock: use of closed network connection" name=kata-proxy pid=17804 source=proxy
time="2018-06-18T16:53:09.073181748+01:00" level=fatal msg="accept unix /run/vc/sbs/3aa8720a429429fdae2ff05f65d4531d970002f4a01be6579852439386eff637/proxy.sock: use of closed network connection" name=kata-proxy pid=17185 source=proxy
time="2018-06-18T16:53:09.226079153+01:00" level=fatal msg="accept unix /run/vc/sbs/62ceaec59c63e99b7cf6e061801a5167ad5f590d1cebd725789ca332132016eb/proxy.sock: use of closed network connection" name=kata-proxy pid=17599 source=proxy
time="2018-06-18T16:53:09.361715129+01:00" level=fatal msg="accept unix /run/vc/sbs/986b6ee9f9ebd21864e9fd2ad11af6beeb0a7f013d31d7430affca5cb435e6b7/proxy.sock: use of closed network connection" name=kata-proxy pid=17391 source=proxy
time="2018-06-18T16:53:09.440136165+01:00" level=fatal msg="accept unix /run/vc/sbs/3bf3bb2c37e6d97583a0c671020f4f24b4fcc931cc6c551c94111ce7638b2ed4/proxy.sock: use of closed network connection" name=kata-proxy pid=16553 source=proxy
time="2018-06-18T16:53:09.642822691+01:00" level=fatal msg="accept unix /run/vc/sbs/979ac17c21b3cd510f3ef9a3e65282ece6dbe74d4dc5c55354c9f028eab6acfe/proxy.sock: use of closed network connection" name=kata-proxy pid=16764 source=proxy
time="2018-06-18T16:53:09.782278406+01:00" level=fatal msg="accept unix /run/vc/sbs/a661ba7d84c085f1ecc48112c55915967eb543cc4b83896ea8dcc3481a95fede/proxy.sock: use of closed network connection" name=kata-proxy pid=16344 source=proxy
time="2018-06-18T16:53:10.352609256+01:00" level=fatal msg="accept unix /run/vc/sbs/124a2e7a82fd178a6a9dedcacff3c15bf4768a9fff766f83dbf1b434ab0ccc8f/proxy.sock: use of closed network connection" name=kata-proxy pid=15715 source=proxy
time="2018-06-18T16:53:10.374361724+01:00" level=fatal msg="accept unix /run/vc/sbs/43323465298c9748bf869b679b007d0c7833a08bda7c3ea4a51b2251f168745e/proxy.sock: use of closed network connection" name=kata-proxy pid=16134 source=proxy
time="2018-06-18T16:53:10.387322415+01:00" level=fatal msg="accept unix /run/vc/sbs/4b60e9f28db0770612126f584a65abb6f8c937181c6a0d6abe36eb96379d3b52/proxy.sock: use of closed network connection" name=kata-proxy pid=15926 source=proxy
time="2018-06-18T16:53:10.702314114+01:00" level=fatal msg="accept unix /run/vc/sbs/846fe6919e5d3cac15831981656cd72ba5bb010ea12b8f2d9a78884fcf7775f3/proxy.sock: use of closed network connection" name=kata-proxy pid=15295 source=proxy
time="2018-06-18T16:53:10.912527258+01:00" level=fatal msg="accept unix /run/vc/sbs/82676bb70f2f4a2c2372369682ad2419c937d50f3da83e75d47f9feb76e0e592/proxy.sock: use of closed network connection" name=kata-proxy pid=15500 source=proxy
time="2018-06-18T16:53:11.018130845+01:00" level=fatal msg="accept unix /run/vc/sbs/5c4fc170be2a0d6ef57f8214bc16a4ad757aa5999b8da0a8199c25e29c426faf/proxy.sock: use of closed network connection" name=kata-proxy pid=14874 source=proxy
time="2018-06-18T16:53:11.443962597+01:00" level=fatal msg="accept unix /run/vc/sbs/0bdc300c261065a7a552881fa1b6af8ed8c77da7ef3dd0544b41f5ab2b3c2ed5/proxy.sock: use of closed network connection" name=kata-proxy pid=14452 source=proxy
time="2018-06-18T16:53:11.632200051+01:00" level=fatal msg="accept unix /run/vc/sbs/3bc0f0bfe2e1c54ba3fbcc07c155257be8a9fcd163bf058846513318972f61eb/proxy.sock: use of closed network connection" name=kata-proxy pid=14243 source=proxy
time="2018-06-18T16:53:12.02002466+01:00" level=fatal msg="accept unix /run/vc/sbs/15e962649ab2d6bdb60d2b44fefe9ee551f7d4179712447c9427d0030d9d0f04/proxy.sock: use of closed network connection" name=kata-proxy pid=14036 source=proxy
time="2018-06-18T16:53:12.024432071+01:00" level=fatal msg="accept unix /run/vc/sbs/bf0716b4d3f982c33028a0d9b3e4cf6dbc92bdf75ecf0e9a766fc8be5653f602/proxy.sock: use of closed network connection" name=kata-proxy pid=12531 source=proxy
time="2018-06-18T16:53:12.076729023+01:00" level=fatal msg="accept unix /run/vc/sbs/89763d2636ef7274d0e43bdfee05b668708fc6507552a312352b2b2744fce91b/proxy.sock: use of closed network connection" name=kata-proxy pid=13818 source=proxy
time="2018-06-18T16:53:12.104913715+01:00" level=fatal msg="accept unix /run/vc/sbs/9741460cbfc140de26ac86d97a0c84b104e799c7aae7d19594388f8fbc319129/proxy.sock: use of closed network connection" name=kata-proxy pid=13601 source=proxy
time="2018-06-18T16:53:12.174105447+01:00" level=fatal msg="accept unix /run/vc/sbs/338bd0ff4719617697b888cb9e7237e2c2b5ca874f0a6cd67745e7e2881cfbc8/proxy.sock: use of closed network connection" name=kata-proxy pid=12757 source=proxy
time="2018-06-18T16:53:12.208021868+01:00" level=fatal msg="accept unix /run/vc/sbs/2da43c30b1f46b088938b6c0d15aca3cf12e3e7e593f58578d7c4a3068976b1b/proxy.sock: use of closed network connection" name=kata-proxy pid=13388 source=proxy
time="2018-06-18T16:53:12.312872716+01:00" level=fatal msg="accept unix /run/vc/sbs/997c13662ad18f80750ea9d3e74317d8023bc35ae2247b726a465d81127d301e/proxy.sock: use of closed network connection" name=kata-proxy pid=12102 source=proxy
time="2018-06-18T16:53:12.34886955+01:00" level=fatal msg="accept unix /run/vc/sbs/b792a1679ffa3086d4f1d3f1ebe4a5c9c55dba259adab4351505543974c76c92/proxy.sock: use of closed network connection" name=kata-proxy pid=13175 source=proxy
time="2018-06-18T16:53:12.569774939+01:00" level=fatal msg="accept unix /run/vc/sbs/7b601771cc8d1080bf4c63c3dbc9c6b23aadd6523103832d5b3a06479e645dd5/proxy.sock: use of closed network connection" name=kata-proxy pid=12316 source=proxy
time="2018-06-18T16:53:12.767895521+01:00" level=fatal msg="accept unix /run/vc/sbs/99341c71aacc46b7dd1fff127720af227788c7c0b748f2fd9b70e48f45515382/proxy.sock: use of closed network connection" name=kata-proxy pid=11898 source=proxy
time="2018-06-18T16:53:12.90198661+01:00" level=fatal msg="accept unix /run/vc/sbs/b968b635a8d075ad993ea1739d3388a94a1c5b7c1cb83b8a9ba9a4ee1b2f4469/proxy.sock: use of closed network connection" name=kata-proxy pid=11682 source=proxy
time="2018-06-18T16:53:12.976385769+01:00" level=fatal msg="accept unix /run/vc/sbs/ca3935526cddd6048527a483e2f7fcde6b86d46cc6c6921214f2f7889ec6fc83/proxy.sock: use of closed network connection" name=kata-proxy pid=11266 source=proxy
time="2018-06-18T16:53:13.103904498+01:00" level=fatal msg="accept unix /run/vc/sbs/7644686d18d70eea62c0a720ec276422dd21ef8f4809e4f394a927039077a051/proxy.sock: use of closed network connection" name=kata-proxy pid=11472 source=proxy
time="2018-06-18T16:53:13.111081028+01:00" level=fatal msg="accept unix /run/vc/sbs/3b5436038a2b4437dc260a5ba9ffe158c403b868314c669d7da0f503b4707a8c/proxy.sock: use of closed network connection" name=kata-proxy pid=10847 source=proxy
time="2018-06-18T16:53:13.146722163+01:00" level=fatal msg="accept unix /run/vc/sbs/ede019be83be5b7b35243dd71ed1e9cf2ae60ff0b0dddd3197665c6fb2c6acc5/proxy.sock: use of closed network connection" name=kata-proxy pid=11052 source=proxy
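
The proxy entries above are repetitive, so a tally can be easier to read than the raw lines. The following is a minimal sketch that assumes the `key=value` layout seen in these journal lines and that sandbox IDs appear under `/run/vc/sbs/` as they do here. The helper names are hypothetical and not part of any Kata tool.

```python
import re
import shlex
from collections import Counter

# Sandbox ID as it appears in the proxy socket path in the lines above.
SANDBOX_RE = re.compile(r"/run/vc/sbs/([0-9a-f]+)/")


def parse_logfmt(line):
    """Split one key=value journal line into a dict; bare tokens are skipped."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def summarise(lines):
    """Count entries per log level and collect the sandbox IDs mentioned."""
    levels = Counter()
    sandboxes = set()
    for line in lines:
        if not line.strip():
            continue
        fields = parse_logfmt(line)
        levels[fields.get("level", "unknown")] += 1
        match = SANDBOX_RE.search(fields.get("msg", ""))
        if match:
            sandboxes.add(match.group(1))
    return levels, sandboxes
```

Feeding the saved journal output to `summarise` gives the number of `fatal` entries and the set of affected sandboxes, which can then be compared with the containers the `docker rm` was asked to remove. `shlex.split` raises `ValueError` on a line with an unbalanced quote, so truncated lines would need to be filtered out first.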

Shim logs

Recent shim problems found in system journal:

time="2018-06-18T16:52:54.641885211+01:00" level=info msg="copy stdout failed" container=118f5f5e08d42a9c107f34366613fe0d78fcbcdcf79ca6c2adb46ab9fec27a41 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=118f5f5e08d42a9c107f34366613fe0d78fcbcdcf79ca6c2adb46ab9fec27a41 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.665771935+01:00" level=info msg="copy stdout failed" container=c987e900d45d03aeeed0710035c593577cd035f853e022baa0d3264f7466f25a error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=c987e900d45d03aeeed0710035c593577cd035f853e022baa0d3264f7466f25a name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.666533235+01:00" level=info msg="copy stdout failed" container=4b3700cc872274bcfed53260b43e43cf41fdfdeb1cbe1a6671aaad2e3d956936 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=4b3700cc872274bcfed53260b43e43cf41fdfdeb1cbe1a6671aaad2e3d956936 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.667218288+01:00" level=info msg="copy stdout failed" container=3dc8416008c51c8f48f1c1cbae011cbcfc9a687534be2306b3d59b2c568bbd83 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=3dc8416008c51c8f48f1c1cbae011cbcfc9a687534be2306b3d59b2c568bbd83 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.668400481+01:00" level=info msg="copy stdout failed" container=cbcee7f9e5d6e1505424138c3fb42708061fb4f7579d778db7e3a83844913683 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=cbcee7f9e5d6e1505424138c3fb42708061fb4f7579d778db7e3a83844913683 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.684980144+01:00" level=info msg="copy stdout failed" container=1abc9e08050ffff5964da4feb0f0000ea7f9ff02dc82929f23eb7e57fdf13d65 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=1abc9e08050ffff5964da4feb0f0000ea7f9ff02dc82929f23eb7e57fdf13d65 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.6883765+01:00" level=info msg="copy stdout failed" container=773a5eb7d8ef062c2bf6994076891c51be2383dd20318e3e60420bd012cbf7fc error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=773a5eb7d8ef062c2bf6994076891c51be2383dd20318e3e60420bd012cbf7fc name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.691102024+01:00" level=info msg="copy stdout failed" container=9d466a5da22f6d76c415307a7ab02b6da221828a75df5129e656390cd7299dcd error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=9d466a5da22f6d76c415307a7ab02b6da221828a75df5129e656390cd7299dcd name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.692190363+01:00" level=info msg="copy stdout failed" container=4ca26b046750884795939d482438755f8782ac37a54c68e394c2ee34e2badeb5 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=4ca26b046750884795939d482438755f8782ac37a54c68e394c2ee34e2badeb5 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.83357955+01:00" level=info msg="copy stdout failed" container=62a1689aedb4018da001ecfa67c728a930f70417dcf78689d3365d683119086c error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=62a1689aedb4018da001ecfa67c728a930f70417dcf78689d3365d683119086c name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.846369913+01:00" level=info msg="copy stdout failed" container=732ae406ca30704f7911c7378323ad8aa7ec103d4839fcd3c72e385f50b0bddb error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=732ae406ca30704f7911c7378323ad8aa7ec103d4839fcd3c72e385f50b0bddb name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.873584183+01:00" level=info msg="copy stdout failed" container=722625190bd65d47a43eb572129963bb77b46e2eaefc6dbd8fa798845d389855 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=722625190bd65d47a43eb572129963bb77b46e2eaefc6dbd8fa798845d389855 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.929131507+01:00" level=info msg="copy stdout failed" container=2d38e83ebecfde0bab17ebeae36097bb14b59565ffaa8ceaa241a3f320de6642 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=2d38e83ebecfde0bab17ebeae36097bb14b59565ffaa8ceaa241a3f320de6642 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:54.930523375+01:00" level=info msg="copy stdout failed" container=77988b26802a01041d4bb10cd35d415135c1a995fe047501603b883df2e91072 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=77988b26802a01041d4bb10cd35d415135c1a995fe047501603b883df2e91072 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.116227459+01:00" level=info msg="copy stdout failed" container=70c68987d8c81f8cb83c3c1ea6761227e6f7b044abd3104837ab4fcc6ea39731 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=70c68987d8c81f8cb83c3c1ea6761227e6f7b044abd3104837ab4fcc6ea39731 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.153744988+01:00" level=info msg="copy stdout failed" container=7b84e0d9c4c5b08ed54b98056a437f43a2071926288b9819b9fd7cdaa9713fb5 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=7b84e0d9c4c5b08ed54b98056a437f43a2071926288b9819b9fd7cdaa9713fb5 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.189165171+01:00" level=info msg="copy stdout failed" container=bfff68c1a400ae31ea6fcb8b0fecba65457d0e51d661ac01930a2c7ccf2826e5 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=bfff68c1a400ae31ea6fcb8b0fecba65457d0e51d661ac01930a2c7ccf2826e5 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.19847512+01:00" level=info msg="copy stdout failed" container=4100c4e5573a2f83956e4921eb2195156a13336135cd06b141fc08c46d3e1893 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=4100c4e5573a2f83956e4921eb2195156a13336135cd06b141fc08c46d3e1893 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.233701112+01:00" level=info msg="copy stdout failed" container=1555d7ad382d82eb874bf3442f4cae00e364df33eb245a18db9378ce0cf9f5ec error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=1555d7ad382d82eb874bf3442f4cae00e364df33eb245a18db9378ce0cf9f5ec name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.341299836+01:00" level=info msg="copy stdout failed" container=b2dab39dac397ceefb5847fd2dcf1b9e41a76ead984ff8457848c55142991dfe error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=b2dab39dac397ceefb5847fd2dcf1b9e41a76ead984ff8457848c55142991dfe name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.423994542+01:00" level=info msg="copy stdout failed" container=0b74402c9b3e8bd49d30208ab787e2c54f694e365e18201143ae82b174c541d4 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=0b74402c9b3e8bd49d30208ab787e2c54f694e365e18201143ae82b174c541d4 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.440901404+01:00" level=info msg="copy stdout failed" container=d8f4717755d1202861af8952f01888893c27131834365a7d215156b35dec5d6c error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=d8f4717755d1202861af8952f01888893c27131834365a7d215156b35dec5d6c name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.58599587+01:00" level=info msg="copy stdout failed" container=3091b2b8a7638060175a3ac771a1293c26d73ec97891fdde208de51348a77331 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=3091b2b8a7638060175a3ac771a1293c26d73ec97891fdde208de51348a77331 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.657211111+01:00" level=info msg="copy stdout failed" container=343a81c280f1cb14ecc5be06b7d676e542ee0b0169338f40d4edb874096d3f86 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=343a81c280f1cb14ecc5be06b7d676e542ee0b0169338f40d4edb874096d3f86 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.665979023+01:00" level=info msg="copy stdout failed" container=ba32097b5f36d58738e0031d05db0a9180e3eeadbb56d6a2284d5fb2fdc5876b error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=ba32097b5f36d58738e0031d05db0a9180e3eeadbb56d6a2284d5fb2fdc5876b name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.669522288+01:00" level=info msg="copy stdout failed" container=372a7ace93ab8a189b435d2123e7f305e43cc7dca367e0e1ae6dabe4618bc293 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=372a7ace93ab8a189b435d2123e7f305e43cc7dca367e0e1ae6dabe4618bc293 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.67027179+01:00" level=info msg="copy stdout failed" container=52b0cf9d5f2460301117317ecde69de99b97f0c27719d29551016e61a9899198 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=52b0cf9d5f2460301117317ecde69de99b97f0c27719d29551016e61a9899198 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.6781792+01:00" level=info msg="copy stdout failed" container=4ef1a8d3cef8e9a719d3911a1cfb99dcf647733138b4aef219295818861f716f error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=4ef1a8d3cef8e9a719d3911a1cfb99dcf647733138b4aef219295818861f716f name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.789124733+01:00" level=info msg="copy stdout failed" container=9e0d259fe4255d2b54acb3aa272bbb8106cbc660df9edd040079d2cbe81fd0be error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=9e0d259fe4255d2b54acb3aa272bbb8106cbc660df9edd040079d2cbe81fd0be name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.819458057+01:00" level=info msg="copy stdout failed" container=b5e5b73f941f842b03fc4f62d9668a1f0cba9b1f21b81756e99d5f3182762358 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=b5e5b73f941f842b03fc4f62d9668a1f0cba9b1f21b81756e99d5f3182762358 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.825274798+01:00" level=info msg="copy stdout failed" container=3bbdcb36c5a77e941e93ba83b281906ed46488a19136f55c0538f57fb2b6ea9c error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=3bbdcb36c5a77e941e93ba83b281906ed46488a19136f55c0538f57fb2b6ea9c name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.857932784+01:00" level=info msg="copy stdout failed" container=fdd8fabbcf816306f3e13f6a1964ba1a92398d46e0455f685d4f071930368a2c error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=fdd8fabbcf816306f3e13f6a1964ba1a92398d46e0455f685d4f071930368a2c name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.901930614+01:00" level=info msg="copy stdout failed" container=680356be91b9b9186e60ced0d7af1e0c26270bbfa7e8775e7736042097badef5 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=680356be91b9b9186e60ced0d7af1e0c26270bbfa7e8775e7736042097badef5 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.905360923+01:00" level=info msg="copy stdout failed" container=1d19243068f3bb4eb183ea0cd60b7cdfa910075cf8dbd1d4c4fe0cf762300200 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=1d19243068f3bb4eb183ea0cd60b7cdfa910075cf8dbd1d4c4fe0cf762300200 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:55.974262789+01:00" level=info msg="copy stdout failed" container=b0dbb35b8e51b0928c14246410a8d81ab8f1f59430ce2e57bed00448da85ae1c error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=b0dbb35b8e51b0928c14246410a8d81ab8f1f59430ce2e57bed00448da85ae1c name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.091857574+01:00" level=info msg="copy stdout failed" container=4c7cc1c9da53545f906933d5766f6fb511c93eb0b11e69c2b2a46101ac5a55f3 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=4c7cc1c9da53545f906933d5766f6fb511c93eb0b11e69c2b2a46101ac5a55f3 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.178160232+01:00" level=info msg="copy stdout failed" container=c4b15aea3abd7dc9b6387913a750f318895a54072f5ec38cce79da52a81451be error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=c4b15aea3abd7dc9b6387913a750f318895a54072f5ec38cce79da52a81451be name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.249530323+01:00" level=info msg="copy stdout failed" container=a6bcf340c52ce02eb4e14bd1ccfb87115ca3e805e61df949019ceb29f528fc84 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=a6bcf340c52ce02eb4e14bd1ccfb87115ca3e805e61df949019ceb29f528fc84 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.252454689+01:00" level=info msg="copy stdout failed" container=98eb12f8fffc04ab2539ef4f64068c7e8b29089eb2d85608ef536de1bea1db95 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=98eb12f8fffc04ab2539ef4f64068c7e8b29089eb2d85608ef536de1bea1db95 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.321884292+01:00" level=info msg="copy stdout failed" container=ab4546c4623f7521a11189e3c3c71e5e0d5a54e2ebf602c3258202fd5b72f569 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=ab4546c4623f7521a11189e3c3c71e5e0d5a54e2ebf602c3258202fd5b72f569 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.359108069+01:00" level=info msg="copy stdout failed" container=9418b27565fed5224c91f37c689902d83b1516e00d93d6ba0c63cd793d9fd678 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=9418b27565fed5224c91f37c689902d83b1516e00d93d6ba0c63cd793d9fd678 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.359465146+01:00" level=info msg="copy stdout failed" container=35e43e7d2507b8960d1a8398d08399881f599ac0b9553893a5785ed55e829bd7 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=35e43e7d2507b8960d1a8398d08399881f599ac0b9553893a5785ed55e829bd7 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.525347214+01:00" level=info msg="copy stdout failed" container=4abe3225486ebd99bf42c43c4868002fcdaa8e4114bba6d64211e54d86a62b70 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=4abe3225486ebd99bf42c43c4868002fcdaa8e4114bba6d64211e54d86a62b70 name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.532649058+01:00" level=info msg="copy stdout failed" container=0f1d07b8855a0fa960d564861b41f3f1fd009ba8b6b59a93eb66992d96afe4ea error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=0f1d07b8855a0fa960d564861b41f3f1fd009ba8b6b59a93eb66992d96afe4ea name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.595326072+01:00" level=info msg="copy stdout failed" container=3e3fa35b72f676e7c8cc7ddd052b700a19a1714916ac8453f23acb9519d28f2a error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=3e3fa35b72f676e7c8cc7ddd052b700a19a1714916ac8453f23acb9519d28f2a name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.66370013+01:00" level=info msg="copy stdout failed" container=ab672417b79007acc738ffe6f2156931ffac419338767ccf1007b88e036c601f error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=ab672417b79007acc738ffe6f2156931ffac419338767ccf1007b88e036c601f name=kata-shim pid=1 source=shim
time="2018-06-18T16:52:56.738008208+01:00" level=info msg="copy stdout failed" container=76d69e6cb4c0fceeedd397d1585ff36f7440b45a4b5939d07853f73902a27a78 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=76d69e6cb4c0fceeedd397d1585ff36f7440b45a4b5939d07853f73902a27a78 name=kata-shim pid=1 source=shim
time="2018-06-18T16:53:02.744549834+01:00" level=info msg="copy stdout failed" container=80e50ccd713514898a0228274827b134ef1d089b6b32c7ac9861ea4fda3bb93c error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=80e50ccd713514898a0228274827b134ef1d089b6b32c7ac9861ea4fda3bb93c name=kata-shim pid=1 source=shim
time="2018-06-18T16:53:12.128061087+01:00" level=info msg="copy stdout failed" container=997c13662ad18f80750ea9d3e74317d8023bc35ae2247b726a465d81127d301e error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=997c13662ad18f80750ea9d3e74317d8023bc35ae2247b726a465d81127d301e name=kata-shim pid=1 source=shim
time="2018-06-18T16:53:12.740915005+01:00" level=info msg="copy stdout failed" container=ca3935526cddd6048527a483e2f7fcde6b86d46cc6c6921214f2f7889ec6fc83 error="rpc error: code = Unknown desc = read /dev/ptmx: input/output error" exec-id=ca3935526cddd6048527a483e2f7fcde6b86d46cc6c6921214f2f7889ec6fc83 name=kata-shim pid=1 source=shim

Container manager details

Have docker

Docker

Output of "docker version":

Client:
 Version:       17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:17:40 2018
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:        Tue Feb 27 22:16:13 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of "docker info":

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 121
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: cc-runtime kata-runtime runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-104-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 31.31GiB
Name: bignuc
ID: ROOX:AUAH:KW7J:BADM:E3PW:ORDN:KL42:FKXQ:77LD:VGKO:ZFY4:OG64
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 36
 Goroutines: 47
 System Time: 2018-06-18T17:39:32.659508579+01:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Output of "systemctl show docker":

Type=notify
Restart=on-failure
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=infinity
TimeoutStopUSec=1min 30s
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestamp=Mon 2018-06-18 10:15:23 BST
WatchdogTimestampMonotonic=10387398
FailureAction=none
PermissionsStartOnly=no
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=1131
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ExecMainStartTimestamp=Mon 2018-06-18 10:15:21 BST
ExecMainStartTimestampMonotonic=8921882
ExecMainExitTimestampMonotonic=0
ExecMainPID=1131
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd -D --add-runtime cc-runtime=/usr/local/bin/cc-runtime --add-runtime kata-runtime=/usr/local/bin/kata-runtime --default-runtime=runc ; ignore_errors=no ; start_time=[Mon 2018-06-18 10:15:21 BST] ; stop_time=[n/a] ; pid=1131 ; code=(null) ; status=0/0 }
ExecReload={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/docker.service
MemoryCurrent=311611392
CPUUsageNSec=689927147686
TasksCurrent=253
Delegate=yes
CPUAccounting=no
CPUShares=18446744073709551615
StartupCPUShares=18446744073709551615
CPUQuotaPerSecUSec=infinity
BlockIOAccounting=no
BlockIOWeight=18446744073709551615
StartupBlockIOWeight=18446744073709551615
MemoryAccounting=no
MemoryLimit=18446744073709551615
DevicePolicy=auto
TasksAccounting=no
TasksMax=18446744073709551615
UMask=0022
LimitCPU=18446744073709551615
LimitCPUSoft=18446744073709551615
LimitFSIZE=18446744073709551615
LimitFSIZESoft=18446744073709551615
LimitDATA=18446744073709551615
LimitDATASoft=18446744073709551615
LimitSTACK=18446744073709551615
LimitSTACKSoft=8388608
LimitCORE=18446744073709551615
LimitCORESoft=18446744073709551615
LimitRSS=18446744073709551615
LimitRSSSoft=18446744073709551615
LimitNOFILE=1048576
LimitNOFILESoft=1048576
LimitAS=18446744073709551615
LimitASSoft=18446744073709551615
LimitNPROC=18446744073709551615
LimitNPROCSoft=18446744073709551615
LimitMEMLOCK=65536
LimitMEMLOCKSoft=65536
LimitLOCKS=18446744073709551615
LimitLOCKSSoft=18446744073709551615
LimitSIGPENDING=128104
LimitSIGPENDINGSoft=128104
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=18446744073709551615
LimitRTTIMESoft=18446744073709551615
OOMScoreAdjust=0
Nice=0
IOScheduling=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
SecureBits=0
CapabilityBoundingSet=18446744073709551615
AmbientCapabilities=0
MountFlags=0
PrivateTmp=no
PrivateNetwork=no
PrivateDevices=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
RuntimeDirectoryMode=0755
KillMode=process
KillSignal=15
SendSIGKILL=yes
SendSIGHUP=no
Id=docker.service
Names=docker.service
Requires=docker.socket system.slice sysinit.target
Wants=network-online.target
WantedBy=multi-user.target
ConsistsOf=docker.socket
Conflicts=shutdown.target
Before=shutdown.target multi-user.target
After=sysinit.target system.slice systemd-journald.socket docker.socket firewalld.service basic.target network-online.target
TriggeredBy=docker.socket
Documentation=https://docs.docker.com
Description=Docker Application Container Engine
LoadState=loaded
ActiveState=active
SubState=running
FragmentPath=/lib/systemd/system/docker.service
DropInPaths=/etc/systemd/system/docker.service.d/clear-containers.conf
UnitFileState=enabled
UnitFilePreset=enabled
StateChangeTimestamp=Mon 2018-06-18 10:15:23 BST
StateChangeTimestampMonotonic=10387399
InactiveExitTimestamp=Mon 2018-06-18 10:15:21 BST
InactiveExitTimestampMonotonic=8921900
ActiveEnterTimestamp=Mon 2018-06-18 10:15:23 BST
ActiveEnterTimestampMonotonic=10387399
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=yes
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Mon 2018-06-18 10:15:21 BST
ConditionTimestampMonotonic=8919902
AssertTimestamp=Mon 2018-06-18 10:15:21 BST
AssertTimestampMonotonic=8919902
Transient=no
StartLimitInterval=60000000
StartLimitBurst=3
StartLimitAction=none

No kubectl


Packages

Have dpkg

Output of "dpkg -l|egrep "(cc-oci-runtimecc-runtimerunv|kata-proxy|kata-runtime|kata-shim|kata-containers-image|linux-container|qemu-)"":

ii  linux-container                     4.14.22-86                                  amd64        linux kernel optimised for container-like workloads.
ii  qemu-block-extra:amd64              1:2.5+dfsg-5ubuntu10.24                     amd64        extra block backend modules for qemu-system and qemu-utils
ii  qemu-kvm                            1:2.5+dfsg-5ubuntu10.24                     amd64        QEMU Full virtualization
ii  qemu-lite-bin                       741f430a960b5b67745670e8270db91aeb083c5f-30 amd64        bin components for the qemu-lite package.
ii  qemu-lite-data                      741f430a960b5b67745670e8270db91aeb083c5f-30 amd64        data components for the qemu-lite package.
ii  qemu-system-common                  1:2.5+dfsg-5ubuntu10.24                     amd64        QEMU full system emulation binaries (common files)
ii  qemu-system-x86                     1:2.5+dfsg-5ubuntu10.24                     amd64        QEMU full system emulation binaries (x86)
ii  qemu-utils                          1:2.5+dfsg-5ubuntu10.24                     amd64        QEMU utilities

Have rpm

Output of "rpm -qa|egrep "(cc-oci-runtimecc-runtimerunv|kata-proxy|kata-runtime|kata-shim|kata-containers-image|linux-container|qemu-)"":


grahamwhaley commented 6 years ago

What I found was that I had two runtimes active (ps -ef | fgrep kata-runtime). I had a strong suspicion that we had a lock file held open: one of the symptoms is that once we have 'gone wrong', doing a kata-runtime list will hang up if somebody has their lock held for writing. Looking at the runtimes' open files (cd /proc/<pid>/fd; ls -la), I found that both runtime processes had their lockfiles held open. An lsof in their /run/vc/sb/* dirs confirmed that.

After some debug (initially with gdb and the golang plugin, but then @jodh-intel pointed me at the kill -SIGUSR1 stackdump, available when debug is enabled), what I found looks like a killCommand stuck with its lock held, trying to grpc over to the agent. The thing is, if we do a ps -ef | fgrep kata, we find that both the containers have a runtime/proxy/shim, but no QEMUs are running.

Hence, my current conclusion is that the QEMU has 'gone away' for some reason, but our grpc code does not, or has not, handled that case, and is sat waiting for a reply. Here is the relevant stackdump from the go thread in question. Look for api.go in there.

level=error command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/cli/main.go:391 +0x3e" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="main.main()" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/cli/main.go:383 +0x4f" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="main.createRuntime()" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/cli/main.go:340 +0x1bd" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="main.createRuntimeApp(0xc4200b0000, 0xa, 0xa, 0xc42019c7e0, 0xc420071f70)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/github.com/urfave/cli/app.go:255 +0x6a0" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/vendor/github.com/urfave/cli.(*App).Run(0xc4200c71e0, 0xc4200b0000, 0xa, 0xa, 0x0, 0x0)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/github.com/urfave/cli/command.go:210 +0xa36" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/vendor/github.com/urfave/cli.Command.Run(0xb866d9, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0xba9243, 0x32, 0x0, ...)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/github.com/urfave/cli/app.go:490 +0xc8" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/vendor/github.com/urfave/cli.HandleAction(0xa81280, 0xbbb178, 0xc42010b8c0, 0xc4200cc700, 0x0)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/cli/kill.go:50 +0xdc" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="main.glob..func6(0xc42010b8c0, 0x0, 0xc42010b8c0)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/cli/kill.go:111 +0x2ba" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="main.kill(0xc4200eab80, 0x40, 0x7ffe5de64f33, 0x1, 0x9c1a00, 0xb8664d, 0x4)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/virtcontainers/implementation.go:116 +0x5f" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/virtcontainers.(*VCImpl).KillContainer(0x11cba80, 0xc4201b81cd, 0x40, 0xc4200eab80, 0x40, 0x9, 0x0, 0x0, 0x0)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/virtcontainers/api.go:556 +0x108" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/virtcontainers.KillContainer(0xc4201b81cd, 0x40, 0xc4200eab80, 0x40, 0x9, 0x3030100, 0x0, 0x0)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/virtcontainers/container.go:863 +0x56" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/virtcontainers.(*Container).kill(0xc4200cf400, 0x9, 0x0, 0xc4200cf400, 0x0)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/virtcontainers/container.go:875 +0x174" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/virtcontainers.(*Container).signalProcess(0xc4200cf400, 0xc4200eaf80, 0x40, 0x9, 0xc420263700, 0x18, 0xc420263750)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/virtcontainers/kata_agent.go:969 +0xbc" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/virtcontainers.(*kataAgent).signalProcess(0xc4200cc960, 0xc4200cf400, 0xc4200eaf80, 0x40, 0x9, 0x0, 0x0, 0x0)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/virtcontainers/kata_agent.go:1233 +0x199" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/virtcontainers.(*kataAgent).sendReq(0xc4200cc960, 0xb30740, 0xc420221980, 0x0, 0x0, 0x0, 0x0)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/virtcontainers/kata_agent.go:1179 +0x97" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/virtcontainers.(*kataAgent).installReqFunc.func8(0xc17ce0, 0xc4200a6080, 0xb30740, 0xc420221980, 0x0, 0x0, 0x0, 0x412328, 0x30, 0xafa680, ...)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/github.com/kata-containers/agent/protocols/grpc/agent.pb.go:1675 +0xd2" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/vendor/github.com/kata-containers/agent/protocols/grpc.(*agentServiceClient).SignalProcess(0xc42000e020, 0xc17ce0, 0xc4200a6080, 0xc420221980, 0x0, 0x0, 0x0, 0xc420221980, 0xc1fa20, 0xc4202ac000)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/google.golang.org/grpc/call.go:158 +0xc1" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/vendor/google.golang.org/grpc.Invoke(0xc17ce0, 0xc4200a6080, 0xb9bc1f, 0x20, 0xb30740, 0xc420221980, 0xb34a60, 0x11cbd50, 0xc42025a780, 0x0, ...)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/google.golang.org/grpc/call.go:150 +0x19c" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc42025a780, 0xc17ce0, 0xc4200a6080, 0xb9bc1f, 0x20, 0xb30740, 0xc420221980, 0xb34a60, 0x11cbd50, 0x0, ...)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/google.golang.org/grpc/call.go:301 +0x9ed" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/vendor/google.golang.org/grpc.invoke(0xc17d60, 0xc42028c240, 0xb9bc1f, 0x20, 0xb30740, 0xc420221980, 0xb34a60, 0x11cbd50, 0xc42025a780, 0x0, ...)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/google.golang.org/grpc/call.go:49 +0x9f" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/vendor/google.golang.org/grpc.recvResponse(0xc17d60, 0xc42028c240, 0x0, 0x0, 0xc176a0, 0x11cbd50, 0x0, 0x0, 0x0, 0x0, ...)" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="\t/home/gwhaley/gopath/src/github.com/kata-containers/runtime/vendor/google.golang.org/grpc/transport/transport.go:283 +0x128" command=kill name=kata-runtime pid=8352 source=runtime
level=error msg="github.com/kata-containers/runtime/vendor/google.golang.org/grpc/transport.(*Stream).Header(0xc4202b0000, 0xbbadb0, 0xc4202631c0, 0xc1ad60)" command=kill name=kata-runtime pid=8352 source=runtime

/cc @sboeuf. Anybody else we need to pull in here? Tomorrow I'll look into grpc and see if there are any disconnect or timeout things we can tweak.

sboeuf commented 6 years ago

@grahamwhaley thanks for the detailed investigation here. We should definitely look at some appropriate timeouts to be applied here and gracefully fail based on this. Now, that being said, if we could also avoid the failure itself, this would be great. Maybe a race condition related to the stop sequence.

grahamwhaley commented 6 years ago

@sboeuf, absolutely we should figure out what is causing the death of QEMU. Yeah, I suspect a race/timeout - last time I looked at something like this, iirc, part of the problem was that rm took some time to kill off our runtime, and docker has some hard-wired internal timeouts (something like 7+4 seconds), after which it forcefully kills things off - in which case we were halfway through a kill or rm when Docker either issued another or forced some process death. I'll see what I can find in the logs.

On the actual hang-with-no-timeout issue - I think the root culprit is likely the use of the Background context in the sendReq() function, which I think ultimately issues the grpc call. I expect we actually want a context.WithTimeout or context.WithDeadline context in there instead. I'm having an experiment with that right now. I also suspect it might not be quite that simple, as we may have to figure out if all calls through that handler are expected to complete 'in a fair time', or if some are meant to be blocking calls. For instance, just below there we have another use of Background() context, but that is polling the stdin/out, so I think that one probably is meant to block.

grahamwhaley commented 6 years ago

Heh heh - daily update. I've been trying to figure out how and when the QEMU dies, and I've not found it yet. I added in this patch whilst testing to check if we could timeout the grpc context:

diff --git a/virtcontainers/kata_agent.go b/virtcontainers/kata_agent.go
index b49efae..868f2ae 100644
--- a/virtcontainers/kata_agent.go
+++ b/virtcontainers/kata_agent.go
@@ -967,6 +967,13 @@ func (k *kataAgent) signalProcess(c *Container, processID string, signal syscall
        }

        _, err := k.sendReq(req)
+
+       if err != nil {
+               k.Logger().WithFields(logrus.Fields{
+                       "container-id": c.id,
+                       }).WithError(err).Error("signalProcess() sendReq failed")
+       }
+
        return err
 }

@@ -1230,7 +1237,16 @@ func (k *kataAgent) sendReq(request interface{}) (interface{}, error) {
                return nil, errors.New("Invalid request type")
        }

-       return handler(context.Background(), request)
+       ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
+       defer cancel()
+       i, err := handler(ctx, request)
+
+       if err != nil {
+               k.Logger().WithError(err).Error("sendReq failed")
+       }
+
+       return i, err
 }

With that, I ran the test until it locked up. Then grabbing the whole journalctl for the last 1h or so, and grepping out the cid of one of the 'stuck' containers, I get (and after remembering to go re-enable debug inside our config file - thanks 'make install' ;-) ):

time="2018-06-19T20:06:39.765624803+01:00" level=error msg="signalProcess() sendReq failed" arch=amd64 container-id=3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 error="context deadline exceeded" name=kata-runtime pid=11990 source=virtcontainers subsystem=kata_agent
time="2018-06-19T20:06:34.761909547+01:00" level=info arguments="\"state 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8\"" command=state commit=6f409b9528b66c46814eacf72083e2a54645ee89-dirty name=kata-runtime pid=11990 source=runtime version=1.0.0
time="2018-06-19T20:06:34+01:00" level=error msg="failed to kill init's children" error="OCI runtime killall failed: context deadline exceeded" id=3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 namespace=moby path="/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8" pid=13833
time="2018-06-19T20:06:34.743413547+01:00" level=error msg="signalProcess() sendReq failed" arch=amd64 container-id=3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 error="context deadline exceeded" name=kata-runtime pid=11935 source=virtcontainers subsystem=kata_agent
time="2018-06-19T20:06:24.741642631+01:00" level=error msg="signalProcess() sendReq failed" arch=amd64 container-id=3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 error="context deadline exceeded" name=kata-runtime pid=11935 source=virtcontainers subsystem=kata_agent
time="2018-06-19T20:06:21.735519585+01:00" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!: Could not kill running container 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8, cannot remove - Cannot kill container 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8: unknown error after kill: /usr/local/bin/kata-runtime did not terminate sucessfully: rpc error: code = DeadlineExceeded desc = context deadline exceeded\n: unknown" error_type="*errors.errorString" module=api
time="2018-06-19T20:06:21.735418713+01:00" level=error msg="Handler for DELETE /v1.35/containers/3967be617317 returned error: Could not kill running container 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8, cannot remove - Cannot kill container 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8: unknown error after kill: /usr/local/bin/kata-runtime did not terminate sucessfully: rpc error: code = DeadlineExceeded desc = context deadline exceeded\n: unknown"
time="2018-06-19T20:06:21.735271498+01:00" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!: Could not kill running container 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8, cannot remove - Cannot kill container 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8: unknown error after kill: /usr/local/bin/kata-runtime did not terminate sucessfully: rpc error: code = DeadlineExceeded desc = context deadline exceeded\n: unknown" error_type="*errors.errorString" module=api
time="2018-06-19T20:06:19.738730493+01:00" level=info arguments="\"kill --all 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 9\"" command=kill commit=6f409b9528b66c46814eacf72083e2a54645ee89-dirty name=kata-runtime pid=11935 source=runtime version=1.0.0
time="2018-06-19T20:06:19.734018987+01:00" level=error msg="failed waiting for process" container=3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure" exec-id=3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 name=kata-shim pid=1 source=shim
time="2018-06-19T20:06:19.733767045+01:00" level=info msg="copy stdout failed" container=3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 error="rpc error: code = Unavailable desc = transport is closing" exec-id=3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 name=kata-shim pid=1 source=shim
time="2018-06-19T20:06:19.730998206+01:00" level=error msg="signalProcess() sendReq failed" arch=amd64 container-id=3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8 error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" name=kata-runtime pid=9650 source=virtcontainers subsystem=kata_agent
time="2018-06-19T20:01:20.78529512+01:00" level=debug msg="Replacing OCI mount (/etc/hosts) source /var/lib/docker/containers/3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8/hosts with /run/kata-containers/shared/containers/3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8-0722f9f7dc92243c-hosts" arch=amd64 name=kata-runtime pid=13841 source=virtcontainers subsystem=kata_agent
time="2018-06-19T20:01:20.785262587+01:00" level=debug msg="Replacing OCI mount (/etc/hostname) source /var/lib/docker/containers/3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8/hostname with /run/kata-containers/shared/containers/3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8-5dda9d1a96ad4e20-hostname" arch=amd64 name=kata-runtime pid=13841 source=virtcontainers subsystem=kata_agent
time="2018-06-19T20:01:20.785209919+01:00" level=debug msg="Replacing OCI mount (/etc/resolv.conf) source /var/lib/docker/containers/3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8/resolv.conf with /run/kata-containers/shared/containers/3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8-f5383d412b1bf8ec-resolv.conf" arch=amd64 name=kata-runtime pid=13841 source=virtcontainers subsystem=kata_agent
time="2018-06-19T20:01:19.6376833+01:00" level=debug command=create default-kernel-parameters="ip=::::::3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8::off:: init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket" name=kata-runtime pid=13841 source=runtime
time="2018-06-19T20:01:19.636626569+01:00" level=debug msg="converting /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8/config.json" name=kata-runtime pid=13841 source=virtcontainers/oci

Whilst I can see the container create and startup, and we can see the timeout failure of the sendReq, what I don't see is any debug/logs of the docker kill command coming through that match that hang. Odd. Any ideas on that are welcome.

I'm starting to run out of ideas a touch on how to figure out what happened to my QEMU... one more 'hack' from @jodh-intel to try tomorrow...

And a final note - adding that context timeout returns an error that seems to upset Docker somewhat ;-)

time="2018-06-19T20:06:21.735519585+01:00" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!: Could not kill running container 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8, cannot remove - Cannot kill container 3967be617317289076fd8c052b3f765b6e33821caa7937e2eca3cab214e637d8: unknown error after kill: /usr/local/bin/kata-runtime did not terminate sucessfully: rpc error: code = DeadlineExceeded desc = context deadline exceeded\n: unknown" error_type="*errors.errorString" module=api

I suspect we are going to need more framework or something along with that simple context timeout...

jodh-intel commented 6 years ago

Hi @grahamwhaley - did the qemu shell hack provide anything new?

grahamwhaley commented 6 years ago

Not got to it yet. First I'm going to hack the metrics to not do a docker rm -f $(docker ps -qa), as that is what hangs up, and is getting us false failures on the metrics CI. I'll add some large comments etc. I'm only really happy changing that as we have this Issue and https://github.com/kata-containers/tests/pull/414 open to cover the broken case. Once I've proven changing the sequence to an iterated pair of docker stop and docker rm, then I'll get back to chasing this a bit.

grahamwhaley commented 6 years ago

Grrr, and testing that, replacing the docker rm -f with an iterated pair of stop/rm ... still hung up! I'll have another shot at tracking down the missing QEMU then, and if that fails, will have to try harder to shore up the metrics for a bit...

grahamwhaley commented 6 years ago

Time for an update. Now we have the journald truncation figured out, I can get some full logs to work with. Doing a run and having 4 containers hang up, looking in the system journal (and a grep -i deadline), I see (oh, the grpc timeouts are from my grpc context timeout patch btw):

time="2018-07-02T11:49:16.217469645+01:00" level=info msg="2018/07/02 10:49:16 [ERR] yamux: keepalive failed: i/o deadline reached\n" name=kata-proxy pid=25683 source=agent
time="2018-07-02T11:49:23.356943573+01:00" level=info msg="2018/07/02 10:49:23 [ERR] yamux: keepalive failed: i/o deadline reached\n" name=kata-proxy pid=22938 source=agent
time="2018-07-02T11:49:37.08974012+01:00" level=info msg="2018/07/02 10:49:37 [ERR] yamux: keepalive failed: i/o deadline reached\n" name=kata-proxy pid=24587 source=agent
time="2018-07-02T11:49:50.249446827+01:00" level=info msg="2018/07/02 10:49:50 [ERR] yamux: keepalive failed: i/o deadline reached\n" name=kata-proxy pid=11639 source=agent
time="2018-07-02T11:51:48.881905754+01:00" level=error msg="sendReq failed" arch=amd64 error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" name=kata-runtime pid=27335 source=virtcontainers subsystem=kata_agent
time="2018-07-02T11:51:48.887067109+01:00" level=error msg="signalProcess() sendReq failed" arch=amd64 container-id=8f2b8cc2f85c93ed30c61abcc5f25c923344c28295a693af2a23bac2a7008408 error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" name=kata-runtime pid=27335 source=virtcontainers subsystem=kata_agent
time="2018-07-02T11:51:48.8872918+01:00" level=error msg="rpc error: code = DeadlineExceeded desc = context deadline exceeded" command=kill container=8f2b8cc2f85c93ed30c61abcc5f25c923344c28295a693af2a23bac2a7008408 name=kata-runtime pid=27335 sandbox=8f2b8cc2f85c93ed30c61abcc5f25c923344c28295a693af2a23bac2a7008408 source=runtime
time="2018-07-02T11:51:49.249058752+01:00" level=error msg="sendReq failed" arch=amd64 error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" name=kata-runtime pid=27330 source=virtcontainers subsystem=kata_agent
time="2018-07-02T11:51:49.249604392+01:00" level=error msg="signalProcess() sendReq failed" arch=amd64 container-id=582e7fff4cf013de61ddb7d02764bc21dc7a9caad95bd8c173a6508038e9e10d error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" name=kata-runtime pid=27330 source=virtcontainers subsystem=kata_agent
time="2018-07-02T11:51:49.250132138+01:00" level=error msg="rpc error: code = DeadlineExceeded desc = context deadline exceeded" command=kill container=582e7fff4cf013de61ddb7d02764bc21dc7a9caad95bd8c173a6508038e9e10d name=kata-runtime pid=27330 sandbox=582e7fff4cf013de61ddb7d02764bc21dc7a9caad95bd8c173a6508038e9e10d source=runtime
time="2018-07-02T11:51:49.335716463+01:00" level=error msg="sendReq failed" arch=amd64 error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" name=kata-runtime pid=27474 source=virtcontainers subsystem=kata_agent
time="2018-07-02T11:51:49.336945716+01:00" level=error msg="signalProcess() sendReq failed" arch=amd64 container-id=21e55708d2e9ec6853b947df1490f263f33a67da380240e3bdce999fbd3fe6a3 error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" name=kata-runtime pid=27474 source=virtcontainers subsystem=kata_agent
time="2018-07-02T11:51:49.339115749+01:00" level=error msg="rpc error: code = DeadlineExceeded desc = context deadline exceeded" command=kill container=21e55708d2e9ec6853b947df1490f263f33a67da380240e3bdce999fbd3fe6a3 name=kata-runtime pid=27474 sandbox=21e55708d2e9ec6853b947df1490f263f33a67da380240e3bdce999fbd3fe6a3 source=runtime

That yamux timeout finding is new to me, and is quite likely related to https://github.com/kata-containers/agent/pull/263, which is not settled yet. At this point I suspect the agent-end yamux has timed out on a keepalive as the host side is somewhat busy, and then death follows (as the agent/comms channel shuts down?).

@jodh-intel - on the logging format etc. front - the line coming back from the agent/yamux:

time="2018-07-02T11:49:16.217469645+01:00" level=info msg="2018/07/02 10:49:16 [ERR] yamux: keepalive failed: i/o deadline reached\n" name=kata-proxy pid=25683 source=agent

doesn't have a pod or CID in it - which I suspect the agent has access to, you think? I have to wonder if we can (and should) hook our logrus into the yamux logger config item, which we seem to currently not use in either direction (proxy or agent): https://github.com/hashicorp/yamux/blob/master/mux.go#L33 Let me know what you think - I can go try to improve that logging status if you think we can?

sboeuf commented 6 years ago

@grahamwhaley have you tried applying https://github.com/kata-containers/agent/pull/263 to the agent, to see whether you can still reproduce the issue?

grahamwhaley commented 6 years ago

Will do. I held off as it looked from the comments over there that that PR caused some other issues, but I can try it and have a compare. np.

sboeuf commented 6 years ago

@grahamwhaley yeah I know, you might hit some other issues, but let's cross our fingers that you won't, and that you'll be able to verify whether it solves your current issue.

grahamwhaley commented 6 years ago

@sboeuf yay \o/ - kata-containers/agent#263 fixed the qemu death hangup issue! I ran 5 loops of 110 nginx containers, and got 0 hangs. It would normally hang on the first loop. Now I guess we 'just' have to figure out the other wrinkles of kata-containers/agent#263...

Oh, (if I've not already then) I'll open an Issue for the grpc context timeout so we can discuss what the right thing to do there is.

jodh-intel commented 6 years ago

Great news! I've just made https://github.com/kata-containers/tests/issues/195 a P2 as we clearly never want to have to debug this sort of nightmare again - we want immediate CI feedback when we break it ;)

sboeuf commented 6 years ago

Oh that is very good news @grahamwhaley !!! Now, as you said, we have to figure out how to merge #263 without having those weird issues @devimc and myself ran into. @jodh-intel +1 for gating this through CI.

GabyCT commented 5 years ago

Closing this issue as it was fixed by https://github.com/kata-containers/agent/pull/263