kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

Kata hangs or gets stuck when running nginx on Ubuntu or CentOS #668

Closed: zhiminghufighting closed this issue 3 years ago.

zhiminghufighting commented 6 years ago

Kata hangs or gets stuck when running nginx on Ubuntu or CentOS, with a very high probability.

Environment:
- Kata running on the host in bare-metal mode
- Kata runtime version: 1.1.0 and 1.2.0
- Host OS: Ubuntu 16.04
- Host OS: CentOS 7.3
- nginx: 1.15.1 (image pulled from the public Docker Hub registry, docker.io)

Expected result

nginx running in a Kata container should never get stuck or hang.

Actual result

On CentOS 7.3 or Ubuntu 16.04, the Kata container hangs or gets stuck with a very high probability (more than 7 times out of 10) once the nginx server has been running for more than an hour:
- "docker stop" cannot stop the Kata container;
- "docker exec" cannot enter the Kata container;
- "kill -9" cannot kill the Kata container/runtime processes.
Docker and the Kata runtime stop responding to any command, and the host OS has to be rebooted to restart Docker and Kata.

The issue never occurs when we switch the runtime to runc and run the same nginx image.

Meta details

Log below; the same issue has been verified on Kata 1.2.0:

Running kata-collect-data.sh version 1.2.0 (commit 0bcb32f) at 2018-08-31.13:53:55.157847694+0800.


Runtime is /bin/kata-runtime.

kata-env

Output of "/bin/kata-runtime kata-env":

[Meta]
  Version = "1.0.13"

[Runtime]
  Debug = false
  [Runtime.Version]
    Semver = "1.2.0"
    Commit = "0bcb32f"
    OCI = "1.0.1"
  [Runtime.Config]
    Path = "/usr/share/defaults/kata-containers/configuration.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-lite-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  Msize9p = 8192
  Debug = false
  UseVSock = false

[Image]
  Path = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.2.0_agent_fcfa054a757.img"

[Kernel]
  Path = "/usr/share/kata-containers/vmlinuz-4.14.51.10-135.1.container"
  Parameters = ""

[Initrd]
  Path = ""

[Proxy]
  Type = "kataProxy"
  Version = "kata-proxy version 1.2.0-1796218"
  Path = "/usr/libexec/kata-containers/kata-proxy"
  Debug = false

[Shim]
  Type = "kataShim"
  Version = "kata-shim version 1.2.0-0a37760"
  Path = "/usr/libexec/kata-containers/kata-shim"
  Debug = false

[Agent]
  Type = "kata"

[Host]
  Kernel = "3.10.0-862.3.2.el7.x86_64"
  Architecture = "amd64"
  VMContainerCapable = true
  [Host.Distro]
    Name = "CentOS Linux"
    Version = "7"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz"

Runtime config files

Runtime default config files

/etc/kata-containers/configuration.toml
/usr/share/defaults/kata-containers/configuration.toml
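
The two files above are listed in the order the runtime checks them: it uses the first one that exists, which is why the next section reports that /etc/kata-containers/configuration.toml was not found and dumps the packaged default instead. Editing a copy under /etc therefore overrides the default without touching the package-owned file. A minimal Python sketch of that lookup, assuming these two paths are the only candidates (the function name find_runtime_config is hypothetical, not part of Kata):

```python
import os
from typing import Optional, Sequence

# Candidate configuration files, in the order reported by kata-collect-data.sh.
DEFAULT_CONFIG_PATHS = (
    "/etc/kata-containers/configuration.toml",
    "/usr/share/defaults/kata-containers/configuration.toml",
)


def find_runtime_config(paths: Sequence[str] = DEFAULT_CONFIG_PATHS) -> Optional[str]:
    """Return the first path that exists as a regular file, or None if none do."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    chosen = find_runtime_config()
    print(chosen if chosen is not None else "no Kata configuration file found")
```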

Runtime config file contents

Config file /etc/kata-containers/configuration.toml not found

Output of "cat "/usr/share/defaults/kata-containers/configuration.toml"":

# Copyright (c) 2017-2018 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#

# XXX: WARNING: this file is auto-generated.
# XXX:
# XXX: Source file: "cli/config/configuration.toml.in"
# XXX: Project:
# XXX:   Name: Kata Containers
# XXX:   Type: kata

[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/kata-containers/vmlinuz.container"
image = "/usr/share/kata-containers/kata-containers.img"
machine_type = "pc"

# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc.
#
# WARNING: - any parameter specified here will take priority over the default
# parameter value of the same name used to start the virtual machine.
# Do not set values here unless you understand the impact of doing so as you
# may stop the virtual machine from booting.
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
kernel_params = ""

# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per SB/VM:
# unspecified or 0                --> will be set to 1
# < 0                             --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores
default_vcpus = 1

# Default maximum number of vCPUs per SB/VM:
# unspecified or == 0             --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores      --> will be set to the actual number of physical cores or to the maximum number
#                                     of vCPUs supported by KVM if that number is exceeded
# WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when
# the actual number of physical cores is greater than it.
# WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU
# the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs
# can be added to a SB/VM, but the memory footprint will be big. Another example, with
# `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of
# vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable,
# unless you know what are you doing.
default_maxvcpus = 0

# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Until 30 devices per bridge can be hot plugged.
# * Until 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per SB/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for SB/VM.
# If unspecified then it will be set 2048 MiB.
#default_memory = 2048

# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's 
# root file system is backed by a block device, the block device is passed
# directly to the hypervisor for performance reasons. 
# This flag prevents the block device from being passed to the hypervisor, 
# 9pfs is used instead to pass the rootfs.
disable_block_device_use = false

# Block storage driver to be used for the hypervisor in case the container
# rootfs is backed by a block device. This is either virtio-scsi or 
# virtio-blk.
block_device_driver = "virtio-scsi"

# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
#
enable_iothreads = false

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically 
# result in memory pre allocation
#enable_hugepages = true

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. This extra output is added
# to the proxy logs, but only when proxy debug is also enabled.
# 
# Default false
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
# 
#disable_nesting_checks = true

# This is the msize used for 9p shares. It is the number of bytes 
# used for 9p packet payload.
#msize_9p = 8192

# If true and vsocks are supported, use vsocks to communicate directly
# with the agent and no proxy is started, otherwise use unix
# sockets and start a proxy to communicate with the agent.
# Default false
#use_vsock = true

[factory]
# VM templating support. Once enabled, new VMs are created from template
# using vm cloning. They will share the same initial kernel, initramfs and
# agent memory by mapping it readonly. It helps speeding up new container
# creation and saves a lot of memory if there are many kata containers running
# on the same host.
#
# When disabled, new VMs are created from scratch.
#
# Default false
#enable_template = true

[proxy.kata]
path = "/usr/libexec/kata-containers/kata-proxy"

# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[shim.kata]
path = "/usr/libexec/kata-containers/kata-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[agent.kata]
# There is no field for this section. The goal is only to be able to
# specify which type of agent the user wants to use.

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
# Options:
#
#   - bridged
#     Uses a linux bridge to interconnect the container interface to
#     the VM. Works for most cases except macvlan and ipvlan.
#
#   - macvtap
#     Used when the Container network interface can be bridged using
#     macvtap.
internetworking_model="macvtap"

Image details

---
osbuilder:
  url: "https://github.com/kata-containers/osbuilder"
  version: "unknown"
rootfs-creation-time: "2018-08-13T22:51:39.765008919+0000Z"
description: "osbuilder rootfs"
file-format-version: "0.0.2"
architecture: "x86_64"
base-distro:
  name: "Clear"
  version: "24400"
  packages:
    default:
      - "iptables-bin"
      - "libudev0-shim"
      - "systemd"
    extra:

agent:
  url: "https://github.com/kata-containers/agent"
  name: "kata-agent"
  version: "1.2.0-fcfa054a757e7c17afba47b0b4d7e91cbb8688ed"
  agent-is-init-daemon: "no"

Initrd details

No initrd


Logfiles

Runtime logs

Recent runtime problems found in system journal:

time="2018-08-31T10:46:24.135592249+08:00" level=warning msg="fetch sandbox device failed" arch=amd64 command=create container=a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae error="open /run/vc/sbs/a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae/devices.json: no such file or directory" name=kata-runtime pid=3826 sandbox=a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae sandboxid=a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae source=virtcontainers subsystem=sandbox
time="2018-08-31T13:26:57.646882091+08:00" level=error msg="Container a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae not ready, running or paused, cannot send a signal" arch=amd64 command=kill container=a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae name=kata-runtime pid=15952 sandbox=a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae source=runtime
time="2018-08-31T13:26:57.695227659+08:00" level=error msg="Container a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae not ready, running or paused, cannot send a signal" arch=amd64 command=kill container=a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae name=kata-runtime pid=16016 sandbox=a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae source=runtime
time="2018-08-31T13:50:08.670452533+08:00" level=warning msg="fetch sandbox device failed" arch=amd64 command=create container=c140df3834e8db675926af8460b57cef246e750a1860dfd5d515cad259191766 error="open /run/vc/sbs/c140df3834e8db675926af8460b57cef246e750a1860dfd5d515cad259191766/devices.json: no such file or directory" name=kata-runtime pid=18283 sandbox=c140df3834e8db675926af8460b57cef246e750a1860dfd5d515cad259191766 sandboxid=c140df3834e8db675926af8460b57cef246e750a1860dfd5d515cad259191766 source=virtcontainers subsystem=sandbox

Proxy logs

Recent proxy problems found in system journal:

time="2018-08-31T13:26:57.618051166+08:00" level=fatal msg="channel error" error="accept unix /run/vc/sbs/a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae/proxy.sock: use of closed network connection" name=kata-proxy pid=3882 sandbox=a04b11bcef173cbd02b47c3d469b72a3c7123ec68b9647a8c0029472a4609cae source=proxy

Shim logs

No recent shim problems found in system journal.


Container manager details

Have docker

Docker

Output of "docker version":

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:23:03 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:25:29 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of "docker info":

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 18.06.1-ce
Storage Driver: devicemapper
 Pool Name: docker1-thinpool
 Pool Blocksize: 524.3kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Udev Sync Supported: true
 Data Space Used: 229.6MB
 Data Space Total: 29.88GB
 Data Space Available: 29.65GB
 Metadata Space Used: 118.8kB
 Metadata Space Total: 310.4MB
 Metadata Space Available: 310.3MB
 Thin Pool Minimum Free Space: 2.988GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.146-RHEL7 (2018-01-22)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: kata-runtime runc
Default Runtime: kata-runtime
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 0bcb32f (expected: 69663f0bd4b60df09991c08812a60108003fa340)
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.3.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.27GiB
Name: localhost.localdomain
ID: OZTQ:BK7W:4HIC:MFGK:JTF3:5Y7H:PNJU:6EFK:67SH:FCFY:CJNG:KUJF
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 29
 Goroutines: 49
 System Time: 2018-08-31T13:53:55.47969715+08:00
 EventsListeners: 0
HTTP Proxy: http://child-prc.intel.com:913/
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Output of "systemctl show docker":

Type=notify
Restart=on-failure
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=0
TimeoutStopUSec=1min 30s
WatchdogUSec=0
WatchdogTimestamp=Fri 2018-08-31 13:49:09 CST
WatchdogTimestampMonotonic=12689068837
StartLimitInterval=60000000
StartLimitBurst=3
StartLimitAction=none
FailureAction=none
PermissionsStartOnly=no
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=17885
ControlPID=0
FileDescriptorStoreMax=0
StatusErrno=0
Result=success
ExecMainStartTimestamp=Fri 2018-08-31 13:49:08 CST
ExecMainStartTimestampMonotonic=12688097899
ExecMainExitTimestampMonotonic=0
ExecMainPID=17885
ExecMainCode=0
ExecMainStatus=0
ExecStart={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd -D --add-runtime kata-runtime=/usr/bin/kata-runtime --default-runtime=kata-runtime ; ignore_errors=no ; start_time=[Fri 2018-08-31 13:49:08 CST] ; stop_time=[n/a] ; pid=17885 ; code=(null) ; status=0/0 }
ExecReload={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
ControlGroup=/system.slice/docker.service
MemoryCurrent=340971520
TasksCurrent=86
Delegate=yes
CPUAccounting=no
CPUShares=18446744073709551615
StartupCPUShares=18446744073709551615
CPUQuotaPerSecUSec=infinity
BlockIOAccounting=no
BlockIOWeight=18446744073709551615
StartupBlockIOWeight=18446744073709551615
MemoryAccounting=no
MemoryLimit=18446744073709551615
DevicePolicy=auto
TasksAccounting=no
TasksMax=18446744073709551615
Environment=HTTP_PROXY=http://child-prc.intel.com:913/
UMask=0022
LimitCPU=18446744073709551615
LimitFSIZE=18446744073709551615
LimitDATA=18446744073709551615
LimitSTACK=18446744073709551615
LimitCORE=18446744073709551615
LimitRSS=18446744073709551615
LimitNOFILE=18446744073709551615
LimitAS=18446744073709551615
LimitNPROC=18446744073709551615
LimitMEMLOCK=65536
LimitLOCKS=18446744073709551615
LimitSIGPENDING=62113
LimitMSGQUEUE=819200
LimitNICE=0
LimitRTPRIO=0
LimitRTTIME=18446744073709551615
OOMScoreAdjust=0
Nice=0
IOScheduling=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SecureBits=0
CapabilityBoundingSet=18446744073709551615
AmbientCapabilities=0
MountFlags=0
PrivateTmp=no
PrivateNetwork=no
PrivateDevices=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
RuntimeDirectoryMode=0755
KillMode=process
KillSignal=15
SendSIGKILL=yes
SendSIGHUP=no
Id=docker.service
Names=docker.service
Requires=basic.target
Wants=network-online.target system.slice
Conflicts=shutdown.target
Before=shutdown.target
After=systemd-journald.socket basic.target firewalld.service system.slice network-online.target
Documentation=https://docs.docker.com
Description=Docker Application Container Engine
LoadState=loaded
ActiveState=active
SubState=running
FragmentPath=/usr/lib/systemd/system/docker.service
DropInPaths=/etc/systemd/system/docker.service.d/kata-container.conf
UnitFileState=disabled
UnitFilePreset=disabled
InactiveExitTimestamp=Fri 2018-08-31 13:49:08 CST
InactiveExitTimestampMonotonic=12688098378
ActiveEnterTimestamp=Fri 2018-08-31 13:49:09 CST
ActiveEnterTimestampMonotonic=12689068997
ActiveExitTimestamp=Fri 2018-08-31 13:49:07 CST
ActiveExitTimestampMonotonic=12687071866
InactiveEnterTimestamp=Fri 2018-08-31 13:49:08 CST
InactiveEnterTimestampMonotonic=12688079495
CanStart=yes
CanStop=yes
CanReload=yes
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
IgnoreOnSnapshot=no
NeedDaemonReload=no
JobTimeoutUSec=0
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Fri 2018-08-31 13:49:08 CST
ConditionTimestampMonotonic=12688097102
AssertTimestamp=Fri 2018-08-31 13:49:08 CST
AssertTimestampMonotonic=12688097102
Transient=no

Have kubectl

Kubernetes

Output of "kubectl version":

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Error from server (NotFound): the server could not find the requested resource

Output of "kubectl config view":

apiVersion: v1
clusters: []
contexts: []
current-context: ""
kind: Config
preferences: {}
users: []

Output of "systemctl show kubelet":

Type=simple
Restart=always
NotifyAccess=none
RestartUSec=10s
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
WatchdogUSec=0
WatchdogTimestampMonotonic=0
StartLimitInterval=0
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
PermissionsStartOnly=no
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=0
ControlPID=0
FileDescriptorStoreMax=0
StatusErrno=0
Result=exit-code
ExecMainStartTimestamp=Fri 2018-08-31 13:53:49 CST
ExecMainStartTimestampMonotonic=12968309322
ExecMainExitTimestamp=Fri 2018-08-31 13:53:49 CST
ExecMainExitTimestampMonotonic=12968359830
ExecMainPID=18951
ExecMainCode=1
ExecMainStatus=255
ExecStart={ path=/usr/bin/kubelet ; argv[]=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS ; ignore_errors=no ; start_time=[Fri 2018-08-31 13:53:49 CST] ; stop_time=[Fri 2018-08-31 13:53:49 CST] ; pid=18951 ; code=exited ; status=255 }
Slice=system.slice
MemoryCurrent=18446744073709551615
TasksCurrent=18446744073709551615
Delegate=no
CPUAccounting=no
CPUShares=18446744073709551615
StartupCPUShares=18446744073709551615
CPUQuotaPerSecUSec=infinity
BlockIOAccounting=no
BlockIOWeight=18446744073709551615
StartupBlockIOWeight=18446744073709551615
MemoryAccounting=no
MemoryLimit=18446744073709551615
DevicePolicy=auto
TasksAccounting=no
TasksMax=18446744073709551615
Environment=KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf\x20--kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests\x20--allow-privileged=true KUBELET_NETWORK_ARGS=--network-plugin=cni\x20--cni-conf-dir=/etc/cni/net.d\x20--cni-bin-dir=/opt/cni/bin KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10\x20--cluster-domain=cluster.local KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook\x20--client-ca-file=/etc/kubernetes/pki/ca.crt KUBELET_CADVISOR_ARGS=--cadvisor-port=0 KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true\x20--cert-dir=/var/lib/kubelet/pki
UMask=0022
LimitCPU=18446744073709551615
LimitFSIZE=18446744073709551615
LimitDATA=18446744073709551615
LimitSTACK=18446744073709551615
LimitCORE=18446744073709551615
LimitRSS=18446744073709551615
LimitNOFILE=4096
LimitAS=18446744073709551615
LimitNPROC=62113
LimitMEMLOCK=65536
LimitLOCKS=18446744073709551615
LimitSIGPENDING=62113
LimitMSGQUEUE=819200
LimitNICE=0
LimitRTPRIO=0
LimitRTTIME=18446744073709551615
OOMScoreAdjust=0
Nice=0
IOScheduling=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SecureBits=0
CapabilityBoundingSet=18446744073709551615
AmbientCapabilities=0
MountFlags=0
PrivateTmp=no
PrivateNetwork=no
PrivateDevices=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
RuntimeDirectoryMode=0755
KillMode=control-group
KillSignal=15
SendSIGKILL=yes
SendSIGHUP=no
Id=kubelet.service
Names=kubelet.service
Requires=basic.target
Wants=system.slice
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=multi-user.target shutdown.target
After=basic.target system.slice systemd-journald.socket
Documentation=http://kubernetes.io/docs/
Description=kubelet: The Kubernetes Node Agent
LoadState=loaded
ActiveState=activating
SubState=auto-restart
FragmentPath=/etc/systemd/system/kubelet.service
DropInPaths=/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
UnitFileState=enabled
UnitFilePreset=disabled
InactiveExitTimestamp=Fri 2018-08-31 13:53:49 CST
InactiveExitTimestampMonotonic=12968367327
ActiveEnterTimestamp=Fri 2018-08-31 13:53:49 CST
ActiveEnterTimestampMonotonic=12968309373
ActiveExitTimestamp=Fri 2018-08-31 13:53:49 CST
ActiveExitTimestampMonotonic=12968360055
InactiveEnterTimestamp=Fri 2018-08-31 13:53:49 CST
InactiveEnterTimestampMonotonic=12968360055
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
IgnoreOnSnapshot=no
NeedDaemonReload=no
JobTimeoutUSec=0
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Fri 2018-08-31 13:53:49 CST
ConditionTimestampMonotonic=12968308505
AssertTimestamp=Fri 2018-08-31 13:53:49 CST
AssertTimestampMonotonic=12968308505
Transient=no

No crio


Packages

No dpkg

Have rpm

Output of "rpm -qa|egrep "(cc-oci-runtimecc-runtimerunv|kata-proxy|kata-runtime|kata-shim|kata-containers-image|linux-container|qemu-)"":

qemu-lite-data-2.11.0+git.a39e0b3e82-48.1.x86_64
qemu-vanilla-2.11.2+git.0982a56a55-46.1.x86_64
kata-linux-container-4.14.51.10-135.1.x86_64
kata-runtime-1.2.0+git.0bcb32f-47.1.x86_64
qemu-vanilla-data-2.11.2+git.0982a56a55-46.1.x86_64
qemu-vanilla-bin-2.11.2+git.0982a56a55-46.1.x86_64
kata-shim-1.2.0+git.0a37760-33.1.x86_64
kata-proxy-1.2.0+git.1796218-32.1.x86_64
qemu-lite-2.11.0+git.a39e0b3e82-48.1.x86_64
kata-shim-bin-1.2.0+git.0a37760-33.1.x86_64
kata-proxy-bin-1.2.0+git.1796218-32.1.x86_64
qemu-lite-bin-2.11.0+git.a39e0b3e82-48.1.x86_64
kata-containers-image-1.2.0-32.1.x86_64

jodh-intel commented 6 years ago

Hi @zhiminghufighting - thanks for raising this!

Please could you paste the output of sudo kata-collect-data.sh into this issue as that will give us more information on the issue (particularly if you could recreate it with full debug enabled). Please check the output of that script before you paste it, to ensure there is no sensitive information in it too ;)
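
Full debug here means uncommenting the enable_debug options in the configuration pasted above: the [hypervisor.qemu], [proxy.kata], [shim.kata] and [runtime] sections each carry a commented-out "#enable_debug = true". Since the runtime prefers /etc/kata-containers/configuration.toml over the packaged default, the usual approach is to copy the default there first and edit the copy. A minimal Python sketch of the edit step, assuming the copy already exists; the script name and function names are hypothetical:

```python
import re
import sys

# Matches commented-out settings such as "#enable_debug = true" or "# enable_debug = false".
# [ \t]* is used instead of \s* so that a match can never span a line break.
_COMMENTED_DEBUG = re.compile(r"^#[ \t]*(enable_debug)[ \t]*=.*$", re.MULTILINE)


def enable_full_debug(config_text: str) -> str:
    """Uncomment every enable_debug option and set it to true."""
    return _COMMENTED_DEBUG.sub(r"\1 = true", config_text)


def main(path: str) -> None:
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(enable_full_debug(text))


if __name__ == "__main__":
    # Hypothetical usage (as root, on the copy under /etc):
    #   python3 enable_debug.py /etc/kata-containers/configuration.toml
    main(sys.argv[1])
```

Applied to the configuration above, this should turn all four commented-out entries into "enable_debug = true". It is probably simplest to recreate the container afterwards so that the hypervisor, proxy and shim are all started with debug enabled.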

zhiminghufighting commented 6 years ago

Sure. I am trying to reproduce it on Kata 1.2.0 to check whether it still exists. Once the issue is reproduced, I will run the script and paste the log here.

grahamwhaley commented 6 years ago

/cc @egernst @GabyCT as they have both looked at this area with nginx/ab recently and may have some ideas already.

zhiminghufighting commented 6 years ago

Are there any new findings? Thanks a lot!

caoruidong commented 6 years ago

Does it still exist when you use 1.2.0 or master?

jodh-intel commented 6 years ago

Hi @zhiminghufighting - can you provide further details (see https://github.com/kata-containers/runtime/issues/668#issuecomment-417225142)?

caoruidong commented 6 years ago

I tested nginx:latest with the Kata runtime master branch on CentOS 7.4 for more than one hour, and it worked well.

amshinde commented 6 years ago

I have tested nginx/ab with 1.2.0 extensively in the past and did not see any issues. @zhiminghufighting, can you provide more details?

zhiminghufighting commented 6 years ago

@caoruidong, this issue will not necessarily be observed just by running the container for a fixed time. Sometimes the Kata nginx container runs for more than 12 or 24 hours and is still listed by "docker ps", but if you use "docker stop xxxx" to stop it, there is no response, and if you use "docker exec -ti xxxx /bin/bash" you fail to enter the Kata nginx container.

This is only observed on CentOS 7.3 (kernel 3.10.0-862) and Ubuntu 16.04. It is not observed on Debian 9.
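
Because the hang only shows up after many hours, a simple watchdog can record when it starts: periodically run a harmless "docker exec" against the container with a timeout, and log the first time the command does not return. A minimal Python sketch, assuming the nginx image provides a "true" binary (Debian-based images normally do) and that 30 seconds is long enough for a healthy exec; the interval, timeout and function names are illustrative choices, not anything from Docker or Kata:

```python
import subprocess
import sys
import time
from typing import Sequence


def probe(argv: Sequence[str], timeout_s: float) -> str:
    """Run argv and classify the outcome as "ok", "failed" or "hung"."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the client process (e.g. the docker
        # CLI); the container itself is left untouched.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


def watch(container_id: str, interval_s: float = 60.0, timeout_s: float = 30.0) -> None:
    """Probe the container until a "docker exec" stops returning in time."""
    while True:
        state = probe(["docker", "exec", container_id, "true"], timeout_s)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), container_id, state, flush=True)
        if state == "hung":
            break
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical usage: python3 watch_exec.py <container-id>
    watch(sys.argv[1])
```

The same probe() helper could wrap a one-off "docker stop", but it is kept out of the loop because a successful stop would end the experiment.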

zhiminghufighting commented 6 years ago

@amshinde New findings: on my CentOS 7.3 host (kernel 3.10.0-862), this issue is observed only when running the nginx image (pulled from the default Docker Hub). No such issue is observed when I use Kata to launch centos/ubuntu images.

devimc commented 6 years ago

@zhiminghufighting have you tried setting use_vsock = true? https://github.com/kata-containers/documentation/blob/master/VSocks.md#system-requirements
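
Per the system requirements in the document linked above, vsock needs the host's vhost_vsock kernel module, which provides /dev/vhost-vsock. On the configuration side it is the commented-out "#use_vsock = true" line in the [hypervisor.qemu] section pasted earlier, and according to that comment no kata-proxy is started once it is enabled. Whether the 3.10 CentOS kernel in this report provides the module is not established in this thread. A minimal Python sketch, assuming the configuration has already been copied to /etc/kata-containers; the script name and function names are hypothetical:

```python
import os
import re
import sys

# Matches "#use_vsock = true" (the commented-out default) or an explicit setting.
# [ \t]* keeps each match on a single line.
_USE_VSOCK = re.compile(r"^#?[ \t]*use_vsock[ \t]*=.*$", re.MULTILINE)


def vsock_device_present(dev: str = "/dev/vhost-vsock") -> bool:
    """True if the device node created by the host's vhost_vsock module exists."""
    return os.path.exists(dev)


def enable_vsock(config_text: str) -> str:
    """Set use_vsock = true, uncommenting the option if it is commented out."""
    new_text, count = _USE_VSOCK.subn("use_vsock = true", config_text)
    if count == 0:
        raise ValueError("no use_vsock option found in the configuration")
    return new_text


def main(path: str) -> None:
    if not vsock_device_present():
        sys.exit("/dev/vhost-vsock not found; try 'modprobe vhost_vsock' first")
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(enable_vsock(text))


if __name__ == "__main__":
    # Hypothetical usage (as root):
    #   python3 enable_vsock.py /etc/kata-containers/configuration.toml
    main(sys.argv[1])
```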

gabibeyer commented 5 years ago

@zhiminghufighting Are you still experiencing this issue at all? If not, this issue may be stale and should be closed.