kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0
2.1k stars 375 forks source link

Only 1 CPU is used even if more CPUs are available. #2809

Closed parthasl closed 4 years ago

parthasl commented 4 years ago

Description of problem

Only 1 CPU is used even if more CPUs are available. Setup is k8s + cri-containerd + kata. Launched a kubernetes container with 4 CPU's, ran stress --cpu 4 --timeout 60s and expected all 4 CPU's to show usage percentage; but only 1 CPU shows 100%.

kubernetes resources definition: ` resources: requests: memory: "2Gi" cpu: "4" limits: memory: "2Gi" cpu: "4" args:

Expected result

available CPUs should be used, in this case, all 4. Expected QEMU summary to show Summary: QEMU Standard PC, 5 x Xeon Gold 6140 2.30GHz, 13.9GB / 2GB RAM Processors: 5 x Xeon Gold 6140 2.30GHz (HT missing, 5 cores, 5 threads) - Sky Lake-E, 14nm, L3: 36MB

Actual result

htop inside container: image

nproc inside container: 1

nproc --all inside container: 5

hwconfig inside container:

Summary:    QEMU Standard PC, 1 x Xeon Gold 6140 2.30GHz, 13.9GB / 2GB RAM
System:     QEMU Standard PC, C-xK/2/0
Processors: 1 x Xeon Gold 6140 2.30GHz (HT missing, 7 cores, 7 threads) - Sky Lake-E, 14nm, L3: 36MB
Memory:     13.9GB / 2GB RAM == 1 x 2GB
Disk:       sda (virtio_scsi0): 107GB (0%) == 1 x 107GB QEMU-QEMU-HARDDISK
Disk:       sdb (virtio_scsi0): 107GB (3%) == 1 x 107GB QEMU-QEMU-HARDDISK
Disk-Control:   00:01.1: Bochs/Intel 82371SB PIIX3 ATA/33
Disk-Control:   virtio-pci1: Red Hat, . Virtio SCSI
Network:    00:07.0 (virtio-pci4): Red Hat Virtio Network Device
Chipset:    Intel 440FX (Natoma), 82371SB (PIIX3)
OS:     RHEL Server 7.8, Linux 4.19.86-7.1.container x86_64, 64-bit
BIOS:       SeaBIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014, rev 0.0

lscpu inside container:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                5
On-line CPU(s) list:   0-4
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             5
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Stepping:              4
CPU MHz:               2294.608
BogoMIPS:              4589.21
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
L3 cache:              16384K
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku md_clear

kata-collect-data.sh

Show kata-collect-data.sh details

# Meta details Running `kata-collect-data.sh` version `1.11.1 (commit 984ccea48a1badd2863a68f91c8d95e229898095)` at `2020-06-29.06:00:53.076170759+0000`. --- Runtime is `/usr/bin/kata-runtime`. # `kata-env` Output of "`/usr/bin/kata-runtime kata-env`": ```toml [Meta] Version = "1.0.23" [Runtime] Debug = false Trace = false DisableGuestSeccomp = true DisableNewNetNs = false SandboxCgroupOnly = false Path = "/usr/bin/kata-runtime" [Runtime.Version] Semver = "1.10.2" Commit = "29c489e64d5b76e7624e8fbcc40c26402f95bf64" OCI = "1.0.1-dev" [Runtime.Config] Path = "/etc/kata-containers/configuration.toml" [Hypervisor] MachineType = "pc" Version = "QEMU emulator version 4.1.0\nCopyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers" Path = "/usr/bin/qemu-vanilla-system-x86_64" BlockDeviceDriver = "virtio-scsi" EntropySource = "/dev/urandom" Msize9p = 8192 MemorySlots = 10 Debug = false UseVSock = false SharedFS = "virtio-9p" [Image] Path = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.10.2_agent_9ecf1b6c06.img" [Kernel] Path = "/usr/share/kata-containers/vmlinuz-4.19.86.60-7.1.container" Parameters = "systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket vsyscall=emulate init=/usr/bin/kata-agent" [Initrd] Path = "" [Proxy] Type = "kataProxy" Version = "kata-proxy version 1.10.2-4fe00a9" Path = "/usr/libexec/kata-containers/kata-proxy" Debug = false [Shim] Type = "kataShim" Version = "kata-shim version 1.10.2-f75e584" Path = "/usr/libexec/kata-containers/kata-shim" Debug = false [Agent] Type = "kata" Debug = false Trace = false TraceMode = "" TraceType = "" [Host] Kernel = "3.10.0-1062.12.1.el7.YAHOO.20200205.52.x86_64" Architecture = "amd64" VMContainerCapable = true SupportVSocks = true [Host.Distro] Name = "Red Hat Enterprise Linux Server" Version = "7.7" [Host.CPU] Vendor = "GenuineIntel" Model = "Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz" [Netmon] Version = "kata-netmon version 1.10.2" Path = "/usr/libexec/kata-containers/kata-netmon" Debug = false Enable = false ``` --- # Runtime config files ## Runtime default config files ``` /etc/kata-containers/configuration.toml /usr/share/defaults/kata-containers/configuration.toml ``` ## Runtime config file contents Output of "`cat "/etc/kata-containers/configuration.toml"`": ```toml #Copyrighn (c) 2017-2019 Intel Corporation # # SPDX-License-Identifier: Apache-2.0 # # XXX: WARNING: this file is auto-generated. # XXX: # XXX: Source file: "cli/config/configuration-qemu.toml.in" # XXX: Project: # XXX: Name: Kata Containers # XXX: Type: kata [hypervisor.qemu] path = "/usr/bin/qemu-vanilla-system-x86_64" kernel = "/usr/share/kata-containers/vmlinuz.container" image = "/usr/share/kata-containers/kata-containers.img" machine_type = "pc" # Optional space-separated list of options to pass to the guest kernel. # For example, use `kernel_params = "vsyscall=emulate"` if you are having # trouble running pre-2.15 glibc. # # WARNING: - any parameter specified here will take priority over the default # parameter value of the same name used to start the virtual machine. # Do not set values here unless you understand the impact of doing so as you # may stop the virtual machine from booting. # To see the list of default parameters, enable hypervisor debug, create a # container and look for 'default-kernel-parameters' log entries. kernel_params = "vsyscall=emulate init=/usr/bin/kata-agent" # Path to the firmware. # If you want that qemu uses the default firmware leave this option empty firmware = "" # Machine accelerators # comma-separated list of machine accelerators to pass to the hypervisor. # For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"` machine_accelerators="" # Default number of vCPUs per SB/VM: # unspecified or 0 --> will be set to 1 # < 0 --> will be set to the actual number of physical cores # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores default_vcpus = 0 # Default maximum number of vCPUs per SB/VM: # unspecified or == 0 --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when # the actual number of physical cores is greater than it. # WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU # the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs # can be added to a SB/VM, but the memory footprint will be big. Another example, with # `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of # vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable, # unless you know what are you doing. default_maxvcpus = 0 # Bridges can be used to hot plug devices. # Limitations: # * Currently only pci bridges are supported # * Until 30 devices per bridge can be hot plugged. # * Until 5 PCI bridges can be cold plugged per VM. # This limitation could be a bug in qemu or in the kernel # Default number of bridges per SB/VM: # unspecified or 0 --> will be set to 1 # > 1 <= 5 --> will be set to the specified number # > 5 --> will be set to 5 default_bridges = 1 # Default memory size in MiB for SB/VM. # If unspecified then it will be set 2048 MiB. default_memory = 2048 # # Default memory slots per SB/VM. # If unspecified then it will be set 10. # This is will determine the times that memory will be hotadded to sandbox/VM. #memory_slots = 10 # The size in MiB will be plused to max memory of hypervisor. # It is the memory address space for the NVDIMM devie. # If set block storage driver (block_device_driver) to "nvdimm", # should set memory_offset to the size of block device. # Default 0 #memory_offset = 0 # Disable block device from being used for a container's rootfs. # In case of a storage driver like devicemapper where a container's # root file system is backed by a block device, the block device is passed # directly to the hypervisor for performance reasons. # This flag prevents the block device from being passed to the hypervisor, # 9pfs is used instead to pass the rootfs. disable_block_device_use = false # Shared file system type: # - virtio-9p (default) # - virtio-fs shared_fs = "virtio-9p" # Path to vhost-user-fs daemon. virtio_fs_daemon = "/opt/kata/bin/virtiofsd" # Default size of DAX cache in MiB virtio_fs_cache_size = 1024 # Extra args for virtiofsd daemon # # Format example: # ["-o", "arg1=xxx,arg2", "-o", "hello world", "--arg3=yyy"] # # see `virtiofsd -h` for possible options. virtio_fs_extra_args = [] # Cache mode: # # - none # Metadata, data, and pathname lookup are not cached in guest. They are # always fetched from host and any changes are immediately pushed to host. # # - auto # Metadata and pathname lookup cache expires after a configured amount of # time (default is 1 second). Data is cached while the file is open (close # to open consistency). # # - always # Metadata, data, and pathname lookup are cached in guest and never expire. virtio_fs_cache = "always" # Block storage driver to be used for the hypervisor in case the container # rootfs is backed by a block device. This is virtio-scsi, virtio-blk # or nvdimm. block_device_driver = "virtio-scsi" # Specifies cache-related options will be set to block devices or not. # Default false #block_device_cache_set = true # Specifies cache-related options for block devices. # Denotes whether use of O_DIRECT (bypass the host page cache) is enabled. # Default false #block_device_cache_direct = true # Specifies cache-related options for block devices. # Denotes whether flush requests for the device are ignored. # Default false #block_device_cache_noflush = true # Enable iothreads (data-plane) to be used. This causes IO to be # handled in a separate IO thread. This is currently only implemented # for SCSI. # enable_iothreads = false # Enable pre allocation of VM RAM, default false # Enabling this will result in lower container density # as all of the memory will be allocated and locked # This is useful when you want to reserve all the memory # upfront or in the cases where you want memory latencies # to be very predictable # Default false enable_mem_prealloc = true # Enable huge pages for VM RAM, default false # Enabling this will result in the VM memory # being allocated using huge pages. # This is useful when you want to use vhost-user network # stacks within the container. This will automatically # result in memory pre allocation #enable_hugepages = true # Enable file based guest memory support. The default is an empty string which # will disable this feature. In the case of virtio-fs, this is enabled # automatically and '/dev/shm' is used as the backing folder. # This option will be ignored if VM templating is enabled. #file_mem_backend = "" # Enable swap of vm memory. Default false. # The behaviour is undefined if mem_prealloc is also set to true #enable_swap = true # This option changes the default hypervisor and kernel parameters # to enable debug output where available. This extra output is added # to the proxy logs, but only when proxy debug is also enabled. # # Default false #enable_debug = true # Disable the customizations done in the runtime when it detects # that it is running on top a VMM. This will result in the runtime # behaving as it would when running on bare metal. # #disable_nesting_checks = true # This is the msize used for 9p shares. It is the number of bytes # used for 9p packet payload. #msize_9p = 8192 # If true and vsocks are supported, use vsocks to communicate directly # with the agent and no proxy is started, otherwise use unix # sockets and start a proxy to communicate with the agent. # Default false #use_vsock = true # VFIO devices are hotplugged on a bridge by default. # Enable hotplugging on root bus. This may be required for devices with # a large PCI bar, as this is a current limitation with hotplugging on # a bridge. This value is valid for "pc" machine type. # Default false #hotplug_vfio_on_root_bus = true # If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off # security (vhost-net runs ring0) for network I/O performance. #disable_vhost_net = true # # Default entropy source. # The path to a host source of entropy (including a real hardware RNG) # /dev/urandom and /dev/random are two main options. # Be aware that /dev/random is a blocking source of entropy. If the host # runs out of entropy, the VMs boot time will increase leading to get startup # timeouts. # The source of entropy /dev/urandom is non-blocking and provides a # generally acceptable source of entropy. It should work well for pretty much # all practical purposes. #entropy_source= "/dev/urandom" # Path to OCI hook binaries in the *guest rootfs*. # This does not affect host-side hooks which must instead be added to # the OCI spec passed to the runtime. # # You can create a rootfs with hooks by customizing the osbuilder scripts: # https://github.com/kata-containers/osbuilder # # Hooks must be stored in a subdirectory of guest_hook_path according to their # hook type, i.e. "guest_hook_path/{prestart,postart,poststop}". # The agent will scan these directories for executable files and add them, in # lexicographical order, to the lifecycle of the guest container. # Hooks are executed in the runtime namespace of the guest. See the official documentation: # https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks # Warnings will be logged if any error is encountered will scanning for hooks, # but it will not abort container execution. #guest_hook_path = "/usr/share/oci/hooks" [factory] # VM templating support. Once enabled, new VMs are created from template # using vm cloning. They will share the same initial kernel, initramfs and # agent memory by mapping it readonly. It helps speeding up new container # creation and saves a lot of memory if there are many kata containers running # on the same host. # # When disabled, new VMs are created from scratch. # # Note: Requires "initrd=" to be set ("image=" is not supported). # # Default false #enable_template = true # Specifies the path of template. # # Default "/run/vc/vm/template" #template_path = "/run/vc/vm/template" # The number of caches of VMCache: # unspecified or == 0 --> VMCache is disabled # > 0 --> will be set to the specified number # # VMCache is a function that creates VMs as caches before using it. # It helps speed up new container creation. # The function consists of a server and some clients communicating # through Unix socket. The protocol is gRPC in protocols/cache/cache.proto. # The VMCache server will create some VMs and cache them by factory cache. # It will convert the VM to gRPC format and transport it when gets # requestion from clients. # Factory grpccache is the VMCache client. It will request gRPC format # VM and convert it back to a VM. If VMCache function is enabled, # kata-runtime will request VM from factory grpccache when it creates # a new sandbox. # # Default 0 #vm_cache_number = 0 # Specify the address of the Unix socket that is used by VMCache. # # Default /var/run/kata-containers/cache.sock #vm_cache_endpoint = "/var/run/kata-containers/cache.sock" [proxy.kata] path = "/usr/libexec/kata-containers/kata-proxy" # If enabled, proxy messages will be sent to the system log # (default: disabled) #enable_debug = true [shim.kata] path = "/usr/libexec/kata-containers/kata-shim" # If enabled, shim messages will be sent to the system log # (default: disabled) #enable_debug = true # If enabled, the shim will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # # Note: By default, the shim runs in a separate network namespace. Therefore, # to allow it to send trace details to the Jaeger agent running on the host, # it is necessary to set 'disable_new_netns=true' so that it runs in the host # network namespace. # # (default: disabled) #enable_tracing = true [agent.kata] # If enabled, make the agent display debug-level messages. # (default: disabled) #enable_debug = true # Enable agent tracing. # # If enabled, the default trace mode is "dynamic" and the # default trace type is "isolated". The trace mode and type are set # explicity with the `trace_type=` and `trace_mode=` options. # # Notes: # # - Tracing is ONLY enabled when `enable_tracing` is set: explicitly # setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing` # will NOT activate agent tracing. # # - See https://github.com/kata-containers/agent/blob/master/TRACING.md for # full details. # # (default: disabled) #enable_tracing = true # #trace_mode = "dynamic" #trace_type = "isolated" # Comma separated list of kernel modules and their parameters. # These modules will be loaded in the guest kernel using modprobe(8). # The following example can be used to load two kernel modules with parameters # - kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915 enable_ppgtt=0"] # The first word is considered as the module name and the rest as its parameters. # Container will not be started when: # * A kernel module is specified and the modprobe command is not installed in the guest # or it fails loading the module. # * The module is not available in the guest or it doesn't met the guest kernel # requirements, like architecture and version. # kernel_modules=[] [netmon] # If enabled, the network monitoring process gets started when the # sandbox is created. This allows for the detection of some additional # network being added to the existing network namespace, after the # sandbox has been created. # (default: disabled) #enable_netmon = true # Specify the path to the netmon binary. path = "/usr/libexec/kata-containers/kata-netmon" # If enabled, netmon messages will be sent to the system log # (default: disabled) #enable_debug = true [runtime] # If enabled, the runtime will log additional debug messages to the # system log # (default: disabled) #enable_debug = true # # Internetworking model # Determines how the VM should be connected to the # the container network interface # Options: # # - macvtap # Used when the Container network interface can be bridged using # macvtap. # # - none # Used when customize network. Only creates a tap device. No veth pair. # # - tcfilter # Uses tc filter rules to redirect traffic from the network interface # provided by plugin to a tap interface connected to the VM. # internetworking_model="tcfilter" # disable guest seccomp # Determines whether container seccomp profiles are passed to the virtual # machine and applied by the kata agent. If set to true, seccomp is not applied # within the guest # (default: true) disable_guest_seccomp=true # If enabled, the runtime will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # (default: disabled) #enable_tracing = true # If enabled, the runtime will not create a network namespace for shim and hypervisor processes. # This option may have some potential impacts to your host. It should only be used when you know what you're doing. # `disable_new_netns` conflicts with `enable_netmon` # `disable_new_netns` conflicts with `internetworking_model=tcfilter` and `internetworking_model=macvtap`. It works only # with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge # (like OVS) directly. # If you are using docker, `disable_new_netns` only works with `docker run --net=none` # (default: false) #disable_new_netns = true # if enabled, the runtime will add all the kata processes inside one dedicated cgroup. # The container cgroups in the host are not created, just one single cgroup per sandbox. # The sandbox cgroup is not constrained by the runtime # The runtime caller is free to restrict or collect cgroup stats of the overall Kata sandbox. # The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation. # See: https://godoc.org/github.com/kata-containers/runtime/virtcontainers#ContainerType sandbox_cgroup_only=false # Enabled experimental feature list, format: ["a", "b"]. # Experimental features are features not stable enough for production, # They may break compatibility, and are prepared for a big version bump. # Supported experimental features: # 1. "newstore": new persist storage driver which breaks backward compatibility, # expected to move out of experimental in 2.0.0. # (default: []) experimental=[] ``` Config file `/opt/kata/share/defaults/kata-containers/configuration.toml` not found Output of "`cat "/usr/share/defaults/kata-containers/configuration.toml"`": ```toml # Copyright (c) 2017-2019 Intel Corporation # # SPDX-License-Identifier: Apache-2.0 # # XXX: WARNING: this file is auto-generated. # XXX: # XXX: Source file: "cli/config/configuration-qemu.toml.in" # XXX: Project: # XXX: Name: Kata Containers # XXX: Type: kata [hypervisor.qemu] path = "/usr/bin/qemu-vanilla-system-x86_64" kernel = "/usr/share/kata-containers/vmlinuz.container" image = "/usr/share/kata-containers/kata-containers.img" machine_type = "pc" # Optional space-separated list of options to pass to the guest kernel. # For example, use `kernel_params = "vsyscall=emulate"` if you are having # trouble running pre-2.15 glibc. # # WARNING: - any parameter specified here will take priority over the default # parameter value of the same name used to start the virtual machine. # Do not set values here unless you understand the impact of doing so as you # may stop the virtual machine from booting. # To see the list of default parameters, enable hypervisor debug, create a # container and look for 'default-kernel-parameters' log entries. kernel_params = "" # Path to the firmware. # If you want that qemu uses the default firmware leave this option empty firmware = "" # Machine accelerators # comma-separated list of machine accelerators to pass to the hypervisor. # For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"` machine_accelerators="" # Default number of vCPUs per SB/VM: # unspecified or 0 --> will be set to 1 # < 0 --> will be set to the actual number of physical cores # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores default_vcpus = 1 # Default maximum number of vCPUs per SB/VM: # unspecified or == 0 --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when # the actual number of physical cores is greater than it. # WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU # the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs # can be added to a SB/VM, but the memory footprint will be big. Another example, with # `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of # vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable, # unless you know what are you doing. default_maxvcpus = 0 # Bridges can be used to hot plug devices. # Limitations: # * Currently only pci bridges are supported # * Until 30 devices per bridge can be hot plugged. # * Until 5 PCI bridges can be cold plugged per VM. # This limitation could be a bug in qemu or in the kernel # Default number of bridges per SB/VM: # unspecified or 0 --> will be set to 1 # > 1 <= 5 --> will be set to the specified number # > 5 --> will be set to 5 default_bridges = 1 # Default memory size in MiB for SB/VM. # If unspecified then it will be set 2048 MiB. default_memory = 2048 # # Default memory slots per SB/VM. # If unspecified then it will be set 10. # This is will determine the times that memory will be hotadded to sandbox/VM. #memory_slots = 10 # The size in MiB will be plused to max memory of hypervisor. # It is the memory address space for the NVDIMM devie. # If set block storage driver (block_device_driver) to "nvdimm", # should set memory_offset to the size of block device. # Default 0 #memory_offset = 0 # Disable block device from being used for a container's rootfs. # In case of a storage driver like devicemapper where a container's # root file system is backed by a block device, the block device is passed # directly to the hypervisor for performance reasons. # This flag prevents the block device from being passed to the hypervisor, # 9pfs is used instead to pass the rootfs. disable_block_device_use = false # Shared file system type: # - virtio-9p (default) # - virtio-fs shared_fs = "virtio-9p" # Path to vhost-user-fs daemon. virtio_fs_daemon = "/usr/bin/virtiofsd" # Default size of DAX cache in MiB virtio_fs_cache_size = 1024 # Extra args for virtiofsd daemon # # Format example: # ["-o", "arg1=xxx,arg2", "-o", "hello world", "--arg3=yyy"] # # see `virtiofsd -h` for possible options. virtio_fs_extra_args = [] # Cache mode: # # - none # Metadata, data, and pathname lookup are not cached in guest. They are # always fetched from host and any changes are immediately pushed to host. # # - auto # Metadata and pathname lookup cache expires after a configured amount of # time (default is 1 second). Data is cached while the file is open (close # to open consistency). # # - always # Metadata, data, and pathname lookup are cached in guest and never expire. virtio_fs_cache = "always" # Block storage driver to be used for the hypervisor in case the container # rootfs is backed by a block device. This is virtio-scsi, virtio-blk # or nvdimm. block_device_driver = "virtio-scsi" # Specifies cache-related options will be set to block devices or not. # Default false #block_device_cache_set = true # Specifies cache-related options for block devices. # Denotes whether use of O_DIRECT (bypass the host page cache) is enabled. # Default false #block_device_cache_direct = true # Specifies cache-related options for block devices. # Denotes whether flush requests for the device are ignored. # Default false #block_device_cache_noflush = true # Enable iothreads (data-plane) to be used. This causes IO to be # handled in a separate IO thread. This is currently only implemented # for SCSI. # enable_iothreads = false # Enable pre allocation of VM RAM, default false # Enabling this will result in lower container density # as all of the memory will be allocated and locked # This is useful when you want to reserve all the memory # upfront or in the cases where you want memory latencies # to be very predictable # Default false #enable_mem_prealloc = true # Enable huge pages for VM RAM, default false # Enabling this will result in the VM memory # being allocated using huge pages. # This is useful when you want to use vhost-user network # stacks within the container. This will automatically # result in memory pre allocation #enable_hugepages = true # Enable file based guest memory support. The default is an empty string which # will disable this feature. In the case of virtio-fs, this is enabled # automatically and '/dev/shm' is used as the backing folder. # This option will be ignored if VM templating is enabled. #file_mem_backend = "" # Enable swap of vm memory. Default false. # The behaviour is undefined if mem_prealloc is also set to true #enable_swap = true # This option changes the default hypervisor and kernel parameters # to enable debug output where available. This extra output is added # to the proxy logs, but only when proxy debug is also enabled. # # Default false #enable_debug = true # Disable the customizations done in the runtime when it detects # that it is running on top a VMM. This will result in the runtime # behaving as it would when running on bare metal. # #disable_nesting_checks = true # This is the msize used for 9p shares. It is the number of bytes # used for 9p packet payload. #msize_9p = 8192 # If true and vsocks are supported, use vsocks to communicate directly # with the agent and no proxy is started, otherwise use unix # sockets and start a proxy to communicate with the agent. # Default false #use_vsock = true # VFIO devices are hotplugged on a bridge by default. # Enable hotplugging on root bus. This may be required for devices with # a large PCI bar, as this is a current limitation with hotplugging on # a bridge. This value is valid for "pc" machine type. # Default false #hotplug_vfio_on_root_bus = true # If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off # security (vhost-net runs ring0) for network I/O performance. #disable_vhost_net = true # # Default entropy source. # The path to a host source of entropy (including a real hardware RNG) # /dev/urandom and /dev/random are two main options. # Be aware that /dev/random is a blocking source of entropy. If the host # runs out of entropy, the VMs boot time will increase leading to get startup # timeouts. # The source of entropy /dev/urandom is non-blocking and provides a # generally acceptable source of entropy. It should work well for pretty much # all practical purposes. #entropy_source= "/dev/urandom" # Path to OCI hook binaries in the *guest rootfs*. # This does not affect host-side hooks which must instead be added to # the OCI spec passed to the runtime. # # You can create a rootfs with hooks by customizing the osbuilder scripts: # https://github.com/kata-containers/osbuilder # # Hooks must be stored in a subdirectory of guest_hook_path according to their # hook type, i.e. "guest_hook_path/{prestart,postart,poststop}". # The agent will scan these directories for executable files and add them, in # lexicographical order, to the lifecycle of the guest container. # Hooks are executed in the runtime namespace of the guest. See the official documentation: # https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks # Warnings will be logged if any error is encountered will scanning for hooks, # but it will not abort container execution. #guest_hook_path = "/usr/share/oci/hooks" [factory] # VM templating support. Once enabled, new VMs are created from template # using vm cloning. They will share the same initial kernel, initramfs and # agent memory by mapping it readonly. It helps speeding up new container # creation and saves a lot of memory if there are many kata containers running # on the same host. # # When disabled, new VMs are created from scratch. # # Note: Requires "initrd=" to be set ("image=" is not supported). # # Default false #enable_template = true # Specifies the path of template. # # Default "/run/vc/vm/template" #template_path = "/run/vc/vm/template" # The number of caches of VMCache: # unspecified or == 0 --> VMCache is disabled # > 0 --> will be set to the specified number # # VMCache is a function that creates VMs as caches before using it. # It helps speed up new container creation. # The function consists of a server and some clients communicating # through Unix socket. The protocol is gRPC in protocols/cache/cache.proto. # The VMCache server will create some VMs and cache them by factory cache. # It will convert the VM to gRPC format and transport it when gets # requestion from clients. # Factory grpccache is the VMCache client. It will request gRPC format # VM and convert it back to a VM. If VMCache function is enabled, # kata-runtime will request VM from factory grpccache when it creates # a new sandbox. # # Default 0 #vm_cache_number = 0 # Specify the address of the Unix socket that is used by VMCache. # # Default /var/run/kata-containers/cache.sock #vm_cache_endpoint = "/var/run/kata-containers/cache.sock" [proxy.kata] path = "/usr/libexec/kata-containers/kata-proxy" # If enabled, proxy messages will be sent to the system log # (default: disabled) #enable_debug = true [shim.kata] path = "/usr/libexec/kata-containers/kata-shim" # If enabled, shim messages will be sent to the system log # (default: disabled) #enable_debug = true # If enabled, the shim will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # # Note: By default, the shim runs in a separate network namespace. Therefore, # to allow it to send trace details to the Jaeger agent running on the host, # it is necessary to set 'disable_new_netns=true' so that it runs in the host # network namespace. # # (default: disabled) #enable_tracing = true [agent.kata] # If enabled, make the agent display debug-level messages. # (default: disabled) #enable_debug = true # Enable agent tracing. # # If enabled, the default trace mode is "dynamic" and the # default trace type is "isolated". The trace mode and type are set # explicity with the `trace_type=` and `trace_mode=` options. # # Notes: # # - Tracing is ONLY enabled when `enable_tracing` is set: explicitly # setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing` # will NOT activate agent tracing. # # - See https://github.com/kata-containers/agent/blob/master/TRACING.md for # full details. # # (default: disabled) #enable_tracing = true # #trace_mode = "dynamic" #trace_type = "isolated" # Comma separated list of kernel modules and their parameters. # These modules will be loaded in the guest kernel using modprobe(8). # The following example can be used to load two kernel modules with parameters # - kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915 enable_ppgtt=0"] # The first word is considered as the module name and the rest as its parameters. # Container will not be started when: # * A kernel module is specified and the modprobe command is not installed in the guest # or it fails loading the module. # * The module is not available in the guest or it doesn't met the guest kernel # requirements, like architecture and version. # kernel_modules=[] [netmon] # If enabled, the network monitoring process gets started when the # sandbox is created. This allows for the detection of some additional # network being added to the existing network namespace, after the # sandbox has been created. # (default: disabled) #enable_netmon = true # Specify the path to the netmon binary. path = "/usr/libexec/kata-containers/kata-netmon" # If enabled, netmon messages will be sent to the system log # (default: disabled) #enable_debug = true [runtime] # If enabled, the runtime will log additional debug messages to the # system log # (default: disabled) #enable_debug = true # # Internetworking model # Determines how the VM should be connected to the # the container network interface # Options: # # - macvtap # Used when the Container network interface can be bridged using # macvtap. # # - none # Used when customize network. Only creates a tap device. No veth pair. # # - tcfilter # Uses tc filter rules to redirect traffic from the network interface # provided by plugin to a tap interface connected to the VM. # internetworking_model="tcfilter" # disable guest seccomp # Determines whether container seccomp profiles are passed to the virtual # machine and applied by the kata agent. If set to true, seccomp is not applied # within the guest # (default: true) disable_guest_seccomp=true # If enabled, the runtime will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # (default: disabled) #enable_tracing = true # If enabled, the runtime will not create a network namespace for shim and hypervisor processes. # This option may have some potential impacts to your host. It should only be used when you know what you're doing. # `disable_new_netns` conflicts with `enable_netmon` # `disable_new_netns` conflicts with `internetworking_model=tcfilter` and `internetworking_model=macvtap`. It works only # with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge # (like OVS) directly. # If you are using docker, `disable_new_netns` only works with `docker run --net=none` # (default: false) #disable_new_netns = true # if enabled, the runtime will add all the kata processes inside one dedicated cgroup. # The container cgroups in the host are not created, just one single cgroup per sandbox. # The sandbox cgroup is not constrained by the runtime # The runtime caller is free to restrict or collect cgroup stats of the overall Kata sandbox. # The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation. # See: https://godoc.org/github.com/kata-containers/runtime/virtcontainers#ContainerType sandbox_cgroup_only=false # Enabled experimental feature list, format: ["a", "b"]. # Experimental features are features not stable enough for production, # They may break compatibility, and are prepared for a big version bump. # Supported experimental features: # 1. "newstore": new persist storage driver which breaks backward compatibility, # expected to move out of experimental in 2.0.0. # (default: []) experimental=[] ``` --- # KSM throttler ## version Output of "`/usr/libexec/kata-ksm-throttler/kata-ksm-throttler --version`": ``` kata-ksm-throttler version 1.10.2-7d69920 ``` Output of "`/usr/lib/systemd/system/kata-ksm-throttler.service --version`": ``` ./kata-collect-data.sh: line 178: /usr/lib/systemd/system/kata-ksm-throttler.service: Permission denied ``` ## systemd service # Image details ```yaml --- osbuilder: url: "https://github.com/kata-containers/osbuilder" version: "unknown" rootfs-creation-time: "2020-03-19T14:22:02.874378806+0000Z" description: "osbuilder rootfs" file-format-version: "0.0.2" architecture: "x86_64" base-distro: name: "Clear" version: "32640" packages: default: - "chrony" - "iptables-bin" - "kmod-bin" - "libudev0-shim" - "systemd" - "util-linux-bin" extra: agent: url: "https://github.com/kata-containers/agent" name: "kata-agent" version: "1.10.2-9ecf1b6c06059d31bc22d469e80d73a70232be04" agent-is-init-daemon: "no" ``` --- # Initrd details No initrd --- # Logfiles ## Runtime logs Recent runtime problems found in system journal: ``` time="2020-06-29T05:29:27.57784593Z" level=error msg="Invalid command \"kata-collect-data\"" arch=amd64 name=kata-runtime pid=3759572 source=runtime ``` ## Proxy logs No recent proxy problems found in system journal. ## Shim logs No recent shim problems found in system journal. ## Throttler logs No recent throttler problems found in system journal. --- # Container manager details No `docker` Have `kubectl` ## Kubernetes Output of "`kubectl version`": ``` Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"archive", BuildDate:"2020-01-21T20:41:54Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"} The connection to the server 127.0.0.1:9999 was refused - did you specify the right host or port? ``` Output of "`kubectl config view`": ``` apiVersion: v1 clusters: [] contexts: [] current-context: "" kind: Config preferences: {} users: [] ``` Output of "`systemctl show kubelet`": ``` Type=simple Restart=on-failure NotifyAccess=none RestartUSec=100ms TimeoutStartUSec=1min 30s TimeoutStopUSec=1min 30s WatchdogUSec=0 WatchdogTimestamp=Mon 2020-06-29 06:00:44 UTC WatchdogTimestampMonotonic=5377148030930 StartLimitInterval=10000000 StartLimitBurst=5 StartLimitAction=none FailureAction=none PermissionsStartOnly=no RootDirectoryStartOnly=no RemainAfterExit=no GuessMainPID=yes MainPID=3789183 ControlPID=0 FileDescriptorStoreMax=0 StatusErrno=0 Result=success ExecMainStartTimestamp=Mon 2020-06-29 06:00:44 UTC ExecMainStartTimestampMonotonic=5377148030884 ExecMainExitTimestampMonotonic=0 ExecMainPID=3789183 ExecMainCode=0 ExecMainStatus=0 ExecStart={ path=/usr/bin/kubelet ; argv[]=/usr/bin/kubelet $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL $KUBELET_ADDRESS $KUBELET_PORT $KUBELET_HOSTNAME $KUBE_ALLOW_PRIV $KUBELET_ARGS ; ignore_errors=no ; start_time=[Mon 2020-06-29 06:00:44 UTC] ; stop_time=[n/a] ; pid=3789183 ; code=(null) ; status=0/0 } Slice=system.slice ControlGroup=/system.slice/kubelet.service MemoryCurrent=46546944 TasksCurrent=65 Delegate=no CPUAccounting=no CPUShares=18446744073709551615 StartupCPUShares=18446744073709551615 CPUQuotaPerSecUSec=infinity BlockIOAccounting=no BlockIOWeight=18446744073709551615 StartupBlockIOWeight=18446744073709551615 MemoryAccounting=no MemoryLimit=18446744073709551615 DevicePolicy=auto TasksAccounting=no TasksMax=18446744073709551615 EnvironmentFile=/etc/sysconfig/kubelet (ignore_errors=yes) UMask=0022 LimitCPU=18446744073709551615 LimitFSIZE=18446744073709551615 LimitDATA=18446744073709551615 LimitSTACK=18446744073709551615 LimitCORE=18446744073709551615 LimitRSS=18446744073709551615 LimitNOFILE=65536 LimitAS=18446744073709551615 LimitNPROC=767294 LimitMEMLOCK=65536 LimitLOCKS=18446744073709551615 LimitSIGPENDING=767294 LimitMSGQUEUE=819200 LimitNICE=0 LimitRTPRIO=0 LimitRTTIME=18446744073709551615 WorkingDirectory=/var/lib/kubelet OOMScoreAdjust=0 Nice=0 IOScheduling=0 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SecureBits=0 CapabilityBoundingSet=18446744073709551615 AmbientCapabilities=0 MountFlags=0 PrivateTmp=no PrivateNetwork=no PrivateDevices=no ProtectHome=no ProtectSystem=no SameProcessGroup=no IgnoreSIGPIPE=yes NoNewPrivileges=no SystemCallErrorNumber=0 RuntimeDirectoryMode=0755 KillMode=control-group KillSignal=15 SendSIGKILL=yes SendSIGHUP=no Id=kubelet.service Names=kubelet.service Requires=kube-proxy.service basic.target -.mount system.slice containerd.service WantedBy=multi-user.target Conflicts=shutdown.target Before=shutdown.target multi-user.target After=systemd-journald.socket basic.target -.mount system.slice containerd.service RequiresMountsFor=/var/lib/kubelet Documentation=https://github.com/kubernetes/kubernetes Description=Kubernetes Kubelet Server LoadState=loaded ActiveState=active SubState=running FragmentPath=/usr/lib/systemd/system/kubelet.service UnitFileState=enabled UnitFilePreset=disabled InactiveExitTimestamp=Mon 2020-06-29 06:00:44 UTC InactiveExitTimestampMonotonic=5377148030945 ActiveEnterTimestamp=Mon 2020-06-29 06:00:44 UTC ActiveEnterTimestampMonotonic=5377148030945 ActiveExitTimestamp=Mon 2020-06-29 06:00:44 UTC ActiveExitTimestampMonotonic=5377147970746 InactiveEnterTimestamp=Mon 2020-06-29 06:00:44 UTC InactiveEnterTimestampMonotonic=5377147979376 CanStart=yes CanStop=yes CanReload=no CanIsolate=no StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnFailureJobMode=replace IgnoreOnIsolate=no IgnoreOnSnapshot=no NeedDaemonReload=no JobTimeoutUSec=0 JobTimeoutAction=none ConditionResult=yes AssertResult=yes ConditionTimestamp=Mon 2020-06-29 06:00:44 UTC ConditionTimestampMonotonic=5377148017445 AssertTimestamp=Mon 2020-06-29 06:00:44 UTC AssertTimestampMonotonic=5377148017445 Transient=no ``` No `crio` Have `containerd` ## containerd Output of "`containerd --version`": ``` containerd github.com/containerd/containerd v1.3.3 d76c121f76a5fc8a462dc64594aea72fe18e1178 ``` Output of "`systemctl show containerd`": ``` Type=simple Restart=always NotifyAccess=none RestartUSec=5s TimeoutStartUSec=1min 30s TimeoutStopUSec=1min 30s WatchdogUSec=0 WatchdogTimestamp=Mon 2020-06-29 06:00:44 UTC WatchdogTimestampMonotonic=5377148015546 StartLimitInterval=10000000 StartLimitBurst=5 StartLimitAction=none FailureAction=none PermissionsStartOnly=no RootDirectoryStartOnly=no RemainAfterExit=no GuessMainPID=yes MainPID=3789181 ControlPID=0 FileDescriptorStoreMax=0 StatusErrno=0 Result=success ExecMainStartTimestamp=Mon 2020-06-29 06:00:44 UTC ExecMainStartTimestampMonotonic=5377148015509 ExecMainExitTimestampMonotonic=0 ExecMainPID=3789181 ExecMainCode=0 ExecMainStatus=0 ExecStartPre={ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; ignore_errors=no ; start_time=[Mon 2020-06-29 06:00:44 UTC] ; stop_time=[Mon 2020-06-29 06:00:44 UTC] ; pid=3789179 ; code=exited ; status=0 } ExecStart={ path=/usr/local/bin/containerd ; argv[]=/usr/local/bin/containerd ; ignore_errors=no ; start_time=[Mon 2020-06-29 06:00:44 UTC] ; stop_time=[n/a] ; pid=3789181 ; code=(null) ; status=0/0 } Slice=system.slice ControlGroup=/system.slice/containerd.service MemoryCurrent=63042719744 TasksCurrent=380 Delegate=yes CPUAccounting=no CPUShares=18446744073709551615 StartupCPUShares=18446744073709551615 CPUQuotaPerSecUSec=infinity BlockIOAccounting=no BlockIOWeight=18446744073709551615 StartupBlockIOWeight=18446744073709551615 MemoryAccounting=no MemoryLimit=18446744073709551615 DevicePolicy=auto TasksAccounting=no TasksMax=18446744073709551615 UMask=0022 LimitCPU=18446744073709551615 LimitFSIZE=18446744073709551615 LimitDATA=18446744073709551615 LimitSTACK=18446744073709551615 LimitCORE=18446744073709551615 LimitRSS=18446744073709551615 LimitNOFILE=1048576 LimitAS=18446744073709551615 LimitNPROC=18446744073709551615 LimitMEMLOCK=65536 LimitLOCKS=18446744073709551615 LimitSIGPENDING=767294 LimitMSGQUEUE=819200 LimitNICE=0 LimitRTPRIO=0 LimitRTTIME=18446744073709551615 OOMScoreAdjust=-999 Nice=0 IOScheduling=0 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SecureBits=0 CapabilityBoundingSet=18446744073709551615 AmbientCapabilities=0 MountFlags=0 PrivateTmp=no PrivateNetwork=no PrivateDevices=no ProtectHome=no ProtectSystem=no SameProcessGroup=no IgnoreSIGPIPE=yes NoNewPrivileges=no SystemCallErrorNumber=0 RuntimeDirectoryMode=0755 KillMode=process KillSignal=15 SendSIGKILL=yes SendSIGHUP=no Id=containerd.service Names=containerd.service Requires=basic.target system.slice RequiredBy=kubelet.service WantedBy=multi-user.target Conflicts=shutdown.target Before=shutdown.target kubelet.service multi-user.target After=systemd-journald.socket basic.target system.slice network.target Documentation=https://containerd.io Description=containerd container runtime LoadState=loaded ActiveState=active SubState=running FragmentPath=/etc/systemd/system/containerd.service UnitFileState=enabled UnitFilePreset=disabled InactiveExitTimestamp=Mon 2020-06-29 06:00:44 UTC InactiveExitTimestampMonotonic=5377148000779 ActiveEnterTimestamp=Mon 2020-06-29 06:00:44 UTC ActiveEnterTimestampMonotonic=5377148015590 ActiveExitTimestamp=Mon 2020-06-29 06:00:44 UTC ActiveExitTimestampMonotonic=5377147991726 InactiveEnterTimestamp=Mon 2020-06-29 06:00:44 UTC InactiveEnterTimestampMonotonic=5377147999237 CanStart=yes CanStop=yes CanReload=no CanIsolate=no StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnFailureJobMode=replace IgnoreOnIsolate=no IgnoreOnSnapshot=no NeedDaemonReload=no JobTimeoutUSec=0 JobTimeoutAction=none ConditionResult=yes AssertResult=yes ConditionTimestamp=Mon 2020-06-29 06:00:44 UTC ConditionTimestampMonotonic=5377148000346 AssertTimestamp=Mon 2020-06-29 06:00:44 UTC AssertTimestampMonotonic=5377148000346 Transient=no ``` Output of "`cat /etc/containerd/config.toml`": ``` # Originally version 1 was referred from https://github.com/kata-containers/documentation/blob/master/how-to/containerd-kata.md#configure-containerd-to-use-kata-containers # Used containerd config dump to generate version 2 config # Details of this config is in https://github.com/containerd/cri/blob/master/docs/config.md version = 2 root = "/var/lib/containerd" state = "/run/containerd" plugin_dir = "" disabled_plugins = [] required_plugins = [] # The service unit file already have -999, keep it here for redundant oom_score = -999 [grpc] address = "/run/containerd/containerd.sock" tcp_address = "" tcp_tls_cert = "" tcp_tls_key = "" uid = 0 gid = 0 max_recv_message_size = 16777216 max_send_message_size = 16777216 [ttrpc] address = "" uid = 0 gid = 0 [debug] address = "" uid = 0 gid = 0 level = "info" [metrics] address = "0.0.0.0:1338" grpc_histogram = false [cgroup] path = "" [timeouts] "io.containerd.timeout.shim.cleanup" = "5s" "io.containerd.timeout.shim.load" = "5s" "io.containerd.timeout.shim.shutdown" = "3s" "io.containerd.timeout.task.state" = "2s" [plugins] [plugins."io.containerd.gc.v1.scheduler"] pause_threshold = 0.02 deletion_threshold = 0 mutation_threshold = 100 schedule_delay = "0s" startup_delay = "100ms" [plugins."io.containerd.grpc.v1.cri"] disable_tcp_service = true stream_server_address = "127.0.0.1" stream_server_port = "0" stream_idle_timeout = "4h0m0s" enable_selinux = false sandbox_image = "docker.ouroath.com:4443/yahoo-cloud/k8s.gcr.io/pause:3.1" stats_collect_period = 10 enable_tls_streaming = false max_container_log_line_size = 16384 disable_cgroup = false disable_apparmor = false restrict_oom_score_adj = false max_concurrent_downloads = 3 disable_proc_mount = false [plugins."io.containerd.grpc.v1.cri".containerd] snapshotter = "devmapper" default_runtime_name = "runc" no_pivot = false [plugins."io.containerd.grpc.v1.cri".containerd.runtimes] [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata] runtime_type = "io.containerd.kata.v2" privileged_without_host_devices = true [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options] ConfigPath = "/etc/kata-containers/configuration.toml" [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc] runtime_type = "io.containerd.runc.v2" privileged_without_host_devices = true [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] NoPivotRoot = false NoNewKeyring = false ShimCgroup = "" IoUid = 0 IoGid = 0 BinaryName = "" Root = "" CriuPath = "" SystemdCgroup = true CriuImagePath = "" CriuWorkPath = "" [plugins."io.containerd.grpc.v1.cri".cni] bin_dir = "/opt/cni/bin" conf_dir = "/etc/cni/net.d" max_conf_num = 1 conf_template = "" [plugins."io.containerd.grpc.v1.cri".registry] [plugins."io.containerd.grpc.v1.cri".registry.mirrors] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"] endpoint = ["https://registry-1.docker.io"] [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming] tls_cert_file = "" tls_key_file = "" [plugins."io.containerd.internal.v1.opt"] path = "/opt/containerd" [plugins."io.containerd.internal.v1.restart"] interval = "10s" [plugins."io.containerd.metadata.v1.bolt"] content_sharing_policy = "shared" [plugins."io.containerd.monitor.v1.cgroups"] no_prometheus = false [plugins."io.containerd.runtime.v1.linux"] shim = "containerd-shim" runtime = "runc" runtime_root = "" no_shim = false shim_debug = false systemd_cgroup = true [plugins."io.containerd.runtime.v2.task"] platforms = ["linux/amd64"] [plugins."io.containerd.service.v1.diff-service"] default = ["walking"] [plugins."io.containerd.snapshotter.v1.devmapper"] root_path = "" pool_name = "sys-docker--pool" base_image_size = "100GB" ``` --- # Packages No `dpkg` Have `rpm` Output of "`rpm -qa|egrep "(cc-oci-runtimecc-runtimerunv|kata-proxy|kata-runtime|kata-shim|kata-ksm-throttler|kata-containers-image|linux-container|qemu-)"`": ``` kata-containers-image-1.10.2-6.1.x86_64 kata-linux-container-4.19.86.60-7.1.x86_64 kata-shim-bin-1.10.2-6.1.x86_64 qemu-vanilla-4.1.0+git.9e06029aea-7.1.x86_64 kata-ksm-throttler-1.10.2-6.1.x86_64 kata-proxy-1.10.2-6.1.x86_64 kata-shim-1.10.2-6.1.x86_64 qemu-vanilla-data-4.1.0+git.9e06029aea-7.1.x86_64 qemu-vanilla-bin-4.1.0+git.9e06029aea-7.1.x86_64 kata-proxy-bin-1.10.2-6.1.x86_64 kata-runtime-1.10.2-6.1.x86_64 ``` ---

parthasl commented 4 years ago

When I set default_vcpus = 4, htop listed 8 cpus [ 4 k8s def + 4 default cpus]. Running stress --cpu 4 --timeout 60s shows usage for all 4 cpus.

devimc commented 4 years ago

Running kata-collect-data.sh version 1.11.1 (commit 984ccea48a1badd2863a68f91c8d95e229898095) at 2020-06-29.06:00:53.076170759+0000. kata-containers-image-1.10.2-6.1.x86_64 kata-linux-container-4.19.86.60-7.1.x86_64

uhmm.. kata 1.11.1 and 1.10.2 ? could you enable full debug, try again with the latest stable (1.11.1) and run kata-collect-data.sh?

parthasl commented 4 years ago

kata-collect-data.sh 1.11.1

Show kata-collect-data.sh details

# Meta details Running `kata-collect-data.sh` version `1.11.1 (commit 7b6f3b64873bbeed63f2f7937d83f1ed6ccd258a)` at `2020-06-29.19:41:25.653689313+0000`. --- Runtime is `/usr/bin/kata-runtime`. # `kata-env` Output of "`/usr/bin/kata-runtime kata-env`": ```toml [Meta] Version = "1.0.24" [Runtime] Debug = true Trace = false DisableGuestSeccomp = true DisableNewNetNs = false SandboxCgroupOnly = false Path = "/usr/bin/kata-runtime" [Runtime.Version] OCI = "1.0.1-dev" [Runtime.Version.Version] Semver = "1.11.1" Major = 1 Minor = 11 Patch = 1 Commit = "7b6f3b64873bbeed63f2f7937d83f1ed6ccd258a" [Runtime.Config] Path = "/etc/kata-containers/configuration.toml" [Hypervisor] MachineType = "pc" Version = "QEMU emulator version 4.1.1\nCopyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers" Path = "/usr/bin/qemu-vanilla-system-x86_64" BlockDeviceDriver = "virtio-scsi" EntropySource = "/dev/urandom" SharedFS = "virtio-9p" VirtioFSDaemon = "/usr/bin/virtiofsd" Msize9p = 8192 MemorySlots = 10 PCIeRootPort = 0 HotplugVFIOOnRootBus = false Debug = true UseVSock = false [Image] Path = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.11.1_agent_34a000d0cf.img" [Kernel] Path = "/usr/share/kata-containers/vmlinuz-5.4.32.73-5.1.container" Parameters = "systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket scsi_mod.scan=none agent.log=debug vsyscall=emulate init=/usr/bin/kata-agent agent.log=debug initcall_debug" [Initrd] Path = "" [Proxy] Type = "kataProxy" Path = "/usr/libexec/kata-containers/kata-proxy" Debug = true [Proxy.Version] Semver = "1.11.1-f61beff" Major = 1 Minor = 11 Patch = 1 Commit = "<>" [Shim] Type = "kataShim" Path = "/usr/libexec/kata-containers/kata-shim" Debug = true [Shim.Version] Semver = "1.11.1-9938269" Major = 1 Minor = 11 Patch = 1 Commit = "<>" [Agent] Type = "kata" Debug = true Trace = false TraceMode = "" TraceType = "" [Host] Kernel = "3.10.0-1062.12.1.el7.YAHOO.20200205.52.x86_64" Architecture = "amd64" VMContainerCapable = true SupportVSocks = true [Host.Distro] Name = "Red Hat Enterprise Linux Server" Version = "7.7" [Host.CPU] Vendor = "GenuineIntel" Model = "Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz" [Netmon] Path = "/usr/libexec/kata-containers/kata-netmon" Debug = true Enable = false [Netmon.Version] Semver = "1.11.1" Major = 1 Minor = 11 Patch = 1 Commit = "<>" ``` --- # Runtime config files ## Runtime default config files ``` /etc/kata-containers/configuration.toml /usr/share/defaults/kata-containers/configuration.toml ``` ## Runtime config file contents Output of "`cat "/etc/kata-containers/configuration.toml"`": ```toml # Copyright (c) 2017-2019 Intel Corporation # # SPDX-License-Identifier: Apache-2.0 # # XXX: WARNING: this file is auto-generated. # XXX: # XXX: Source file: "cli/config/configuration-qemu.toml.in" # XXX: Project: # XXX: Name: Kata Containers # XXX: Type: kata [hypervisor.qemu] path = "/usr/bin/qemu-vanilla-system-x86_64" kernel = "/usr/share/kata-containers/vmlinuz.container" image = "/usr/share/kata-containers/kata-containers.img" machine_type = "pc" # Optional space-separated list of options to pass to the guest kernel. # For example, use `kernel_params = "vsyscall=emulate"` if you are having # trouble running pre-2.15 glibc. # # WARNING: - any parameter specified here will take priority over the default # parameter value of the same name used to start the virtual machine. # Do not set values here unless you understand the impact of doing so as you # may stop the virtual machine from booting. # To see the list of default parameters, enable hypervisor debug, create a # container and look for 'default-kernel-parameters' log entries. kernel_params = "vsyscall=emulate init=/usr/bin/kata-agent agent.log=debug initcall_debug" # Path to the firmware. # If you want that qemu uses the default firmware leave this option empty firmware = "" # Machine accelerators # comma-separated list of machine accelerators to pass to the hypervisor. # For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"` machine_accelerators="" # Default number of vCPUs per SB/VM: # unspecified or 0 --> will be set to 1 # < 0 --> will be set to the actual number of physical cores # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores default_vcpus = 1 # Default maximum number of vCPUs per SB/VM: # unspecified or == 0 --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when # the actual number of physical cores is greater than it. # WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU # the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs # can be added to a SB/VM, but the memory footprint will be big. Another example, with # `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of # vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable, # unless you know what are you doing. default_maxvcpus = 0 # Bridges can be used to hot plug devices. # Limitations: # * Currently only pci bridges are supported # * Until 30 devices per bridge can be hot plugged. # * Until 5 PCI bridges can be cold plugged per VM. # This limitation could be a bug in qemu or in the kernel # Default number of bridges per SB/VM: # unspecified or 0 --> will be set to 1 # > 1 <= 5 --> will be set to the specified number # > 5 --> will be set to 5 default_bridges = 1 # Default memory size in MiB for SB/VM. # If unspecified then it will be set 2048 MiB. default_memory = 2048 # # Default memory slots per SB/VM. # If unspecified then it will be set 10. # This is will determine the times that memory will be hotadded to sandbox/VM. #memory_slots = 10 # The size in MiB will be plused to max memory of hypervisor. # It is the memory address space for the NVDIMM devie. # If set block storage driver (block_device_driver) to "nvdimm", # should set memory_offset to the size of block device. # Default 0 #memory_offset = 0 # Specifies virtio-mem will be enabled or not. # Please note that this option should be used with the command # "echo 1 > /proc/sys/vm/overcommit_memory". # Default false #enable_virtio_mem = true # Disable block device from being used for a container's rootfs. # In case of a storage driver like devicemapper where a container's # root file system is backed by a block device, the block device is passed # directly to the hypervisor for performance reasons. # This flag prevents the block device from being passed to the hypervisor, # 9pfs is used instead to pass the rootfs. disable_block_device_use = false # Shared file system type: # - virtio-9p (default) # - virtio-fs shared_fs = "virtio-9p" # Path to vhost-user-fs daemon. virtio_fs_daemon = "/usr/bin/virtiofsd" # Default size of DAX cache in MiB virtio_fs_cache_size = 1024 # Extra args for virtiofsd daemon # # Format example: # ["-o", "arg1=xxx,arg2", "-o", "hello world", "--arg3=yyy"] # # see `virtiofsd -h` for possible options. virtio_fs_extra_args = [] # Cache mode: # # - none # Metadata, data, and pathname lookup are not cached in guest. They are # always fetched from host and any changes are immediately pushed to host. # # - auto # Metadata and pathname lookup cache expires after a configured amount of # time (default is 1 second). Data is cached while the file is open (close # to open consistency). # # - always # Metadata, data, and pathname lookup are cached in guest and never expire. virtio_fs_cache = "always" # Block storage driver to be used for the hypervisor in case the container # rootfs is backed by a block device. This is virtio-scsi, virtio-blk # or nvdimm. block_device_driver = "virtio-scsi" # Specifies cache-related options will be set to block devices or not. # Default false #block_device_cache_set = true # Specifies cache-related options for block devices. # Denotes whether use of O_DIRECT (bypass the host page cache) is enabled. # Default false #block_device_cache_direct = true # Specifies cache-related options for block devices. # Denotes whether flush requests for the device are ignored. # Default false #block_device_cache_noflush = true # Enable iothreads (data-plane) to be used. This causes IO to be # handled in a separate IO thread. This is currently only implemented # for SCSI. # enable_iothreads = false # Enable pre allocation of VM RAM, default false # Enabling this will result in lower container density # as all of the memory will be allocated and locked # This is useful when you want to reserve all the memory # upfront or in the cases where you want memory latencies # to be very predictable # Default false #enable_mem_prealloc = true # Enable huge pages for VM RAM, default false # Enabling this will result in the VM memory # being allocated using huge pages. # This is useful when you want to use vhost-user network # stacks within the container. This will automatically # result in memory pre allocation #enable_hugepages = true # Enable vhost-user storage device, default false # Enabling this will result in some Linux reserved block type # major range 240-254 being chosen to represent vhost-user devices. enable_vhost_user_store = false # The base directory specifically used for vhost-user devices. # Its sub-path "block" is used for block devices; "block/sockets" is # where we expect vhost-user sockets to live; "block/devices" is where # simulated block device nodes for vhost-user devices to live. vhost_user_store_path = "/var/run/kata-containers/vhost-user" # Enable file based guest memory support. The default is an empty string which # will disable this feature. In the case of virtio-fs, this is enabled # automatically and '/dev/shm' is used as the backing folder. # This option will be ignored if VM templating is enabled. #file_mem_backend = "" # Enable swap of vm memory. Default false. # The behaviour is undefined if mem_prealloc is also set to true #enable_swap = true # This option changes the default hypervisor and kernel parameters # to enable debug output where available. This extra output is added # to the proxy logs, but only when proxy debug is also enabled. # # Default false enable_debug = true # Disable the customizations done in the runtime when it detects # that it is running on top a VMM. This will result in the runtime # behaving as it would when running on bare metal. # #disable_nesting_checks = true # This is the msize used for 9p shares. It is the number of bytes # used for 9p packet payload. #msize_9p = 8192 # If true and vsocks are supported, use vsocks to communicate directly # with the agent and no proxy is started, otherwise use unix # sockets and start a proxy to communicate with the agent. # Default false #use_vsock = true # If false and nvdimm is supported, use nvdimm device to plug guest image. # Otherwise virtio-block device is used. # Default is false #disable_image_nvdimm = true # VFIO devices are hotplugged on a bridge by default. # Enable hotplugging on root bus. This may be required for devices with # a large PCI bar, as this is a current limitation with hotplugging on # a bridge. This value is valid for "pc" machine type. # Default false #hotplug_vfio_on_root_bus = true # Before hot plugging a PCIe device, you need to add a pcie_root_port device. # Use this parameter when using some large PCI bar devices, such as Nvidia GPU # The value means the number of pcie_root_port # This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35" # Default 0 #pcie_root_port = 2 # If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off # security (vhost-net runs ring0) for network I/O performance. #disable_vhost_net = true # # Default entropy source. # The path to a host source of entropy (including a real hardware RNG) # /dev/urandom and /dev/random are two main options. # Be aware that /dev/random is a blocking source of entropy. If the host # runs out of entropy, the VMs boot time will increase leading to get startup # timeouts. # The source of entropy /dev/urandom is non-blocking and provides a # generally acceptable source of entropy. It should work well for pretty much # all practical purposes. #entropy_source= "/dev/urandom" # Path to OCI hook binaries in the *guest rootfs*. # This does not affect host-side hooks which must instead be added to # the OCI spec passed to the runtime. # # You can create a rootfs with hooks by customizing the osbuilder scripts: # https://github.com/kata-containers/osbuilder # # Hooks must be stored in a subdirectory of guest_hook_path according to their # hook type, i.e. "guest_hook_path/{prestart,postart,poststop}". # The agent will scan these directories for executable files and add them, in # lexicographical order, to the lifecycle of the guest container. # Hooks are executed in the runtime namespace of the guest. See the official documentation: # https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks # Warnings will be logged if any error is encountered will scanning for hooks, # but it will not abort container execution. #guest_hook_path = "/usr/share/oci/hooks" [factory] # VM templating support. Once enabled, new VMs are created from template # using vm cloning. They will share the same initial kernel, initramfs and # agent memory by mapping it readonly. It helps speeding up new container # creation and saves a lot of memory if there are many kata containers running # on the same host. # # When disabled, new VMs are created from scratch. # # Note: Requires "initrd=" to be set ("image=" is not supported). # # Default false #enable_template = true # Specifies the path of template. # # Default "/run/vc/vm/template" #template_path = "/run/vc/vm/template" # The number of caches of VMCache: # unspecified or == 0 --> VMCache is disabled # > 0 --> will be set to the specified number # # VMCache is a function that creates VMs as caches before using it. # It helps speed up new container creation. # The function consists of a server and some clients communicating # through Unix socket. The protocol is gRPC in protocols/cache/cache.proto. # The VMCache server will create some VMs and cache them by factory cache. # It will convert the VM to gRPC format and transport it when gets # requestion from clients. # Factory grpccache is the VMCache client. It will request gRPC format # VM and convert it back to a VM. If VMCache function is enabled, # kata-runtime will request VM from factory grpccache when it creates # a new sandbox. # # Default 0 #vm_cache_number = 0 # Specify the address of the Unix socket that is used by VMCache. # # Default /var/run/kata-containers/cache.sock #vm_cache_endpoint = "/var/run/kata-containers/cache.sock" [proxy.kata] path = "/usr/libexec/kata-containers/kata-proxy" # If enabled, proxy messages will be sent to the system log # (default: disabled) enable_debug = true [shim.kata] path = "/usr/libexec/kata-containers/kata-shim" # If enabled, shim messages will be sent to the system log # (default: disabled) enable_debug = true # If enabled, the shim will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # # Note: By default, the shim runs in a separate network namespace. Therefore, # to allow it to send trace details to the Jaeger agent running on the host, # it is necessary to set 'disable_new_netns=true' so that it runs in the host # network namespace. # # (default: disabled) #enable_tracing = true [agent.kata] # If enabled, make the agent display debug-level messages. # (default: disabled) enable_debug = true # Enable agent tracing. # # If enabled, the default trace mode is "dynamic" and the # default trace type is "isolated". The trace mode and type are set # explicity with the `trace_type=` and `trace_mode=` options. # # Notes: # # - Tracing is ONLY enabled when `enable_tracing` is set: explicitly # setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing` # will NOT activate agent tracing. # # - See https://github.com/kata-containers/agent/blob/master/TRACING.md for # full details. # # (default: disabled) #enable_tracing = true # #trace_mode = "dynamic" #trace_type = "isolated" # Comma separated list of kernel modules and their parameters. # These modules will be loaded in the guest kernel using modprobe(8). # The following example can be used to load two kernel modules with parameters # - kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915 enable_ppgtt=0"] # The first word is considered as the module name and the rest as its parameters. # Container will not be started when: # * A kernel module is specified and the modprobe command is not installed in the guest # or it fails loading the module. # * The module is not available in the guest or it doesn't met the guest kernel # requirements, like architecture and version. # kernel_modules=[] [netmon] # If enabled, the network monitoring process gets started when the # sandbox is created. This allows for the detection of some additional # network being added to the existing network namespace, after the # sandbox has been created. # (default: disabled) #enable_netmon = true # Specify the path to the netmon binary. path = "/usr/libexec/kata-containers/kata-netmon" # If enabled, netmon messages will be sent to the system log # (default: disabled) enable_debug = true [runtime] # If enabled, the runtime will log additional debug messages to the # system log # (default: disabled) enable_debug = true # # Internetworking model # Determines how the VM should be connected to the # the container network interface # Options: # # - macvtap # Used when the Container network interface can be bridged using # macvtap. # # - none # Used when customize network. Only creates a tap device. No veth pair. # # - tcfilter # Uses tc filter rules to redirect traffic from the network interface # provided by plugin to a tap interface connected to the VM. # internetworking_model="tcfilter" # disable guest seccomp # Determines whether container seccomp profiles are passed to the virtual # machine and applied by the kata agent. If set to true, seccomp is not applied # within the guest # (default: true) disable_guest_seccomp=true # If enabled, the runtime will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # (default: disabled) #enable_tracing = true # If enabled, the runtime will not create a network namespace for shim and hypervisor processes. # This option may have some potential impacts to your host. It should only be used when you know what you're doing. # `disable_new_netns` conflicts with `enable_netmon` # `disable_new_netns` conflicts with `internetworking_model=tcfilter` and `internetworking_model=macvtap`. It works only # with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge # (like OVS) directly. # If you are using docker, `disable_new_netns` only works with `docker run --net=none` # (default: false) #disable_new_netns = true # if enabled, the runtime will add all the kata processes inside one dedicated cgroup. # The container cgroups in the host are not created, just one single cgroup per sandbox. # The runtime caller is free to restrict or collect cgroup stats of the overall Kata sandbox. # The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation. # The sandbox cgroup is constrained if there is no container type annotation. # See: https://godoc.org/github.com/kata-containers/runtime/virtcontainers#ContainerType sandbox_cgroup_only=false # Enabled experimental feature list, format: ["a", "b"]. # Experimental features are features not stable enough for production, # they may break compatibility, and are prepared for a big version bump. # Supported experimental features: # (default: []) experimental=[] ``` Output of "`cat "/usr/share/defaults/kata-containers/configuration.toml"`": ```toml # Copyright (c) 2017-2019 Intel Corporation # # SPDX-License-Identifier: Apache-2.0 # # XXX: WARNING: this file is auto-generated. # XXX: # XXX: Source file: "cli/config/configuration-qemu.toml.in" # XXX: Project: # XXX: Name: Kata Containers # XXX: Type: kata [hypervisor.qemu] path = "/usr/bin/qemu-vanilla-system-x86_64" kernel = "/usr/share/kata-containers/vmlinuz.container" image = "/usr/share/kata-containers/kata-containers.img" machine_type = "pc" # Optional space-separated list of options to pass to the guest kernel. # For example, use `kernel_params = "vsyscall=emulate"` if you are having # trouble running pre-2.15 glibc. # # WARNING: - any parameter specified here will take priority over the default # parameter value of the same name used to start the virtual machine. # Do not set values here unless you understand the impact of doing so as you # may stop the virtual machine from booting. # To see the list of default parameters, enable hypervisor debug, create a # container and look for 'default-kernel-parameters' log entries. kernel_params = "" # Path to the firmware. # If you want that qemu uses the default firmware leave this option empty firmware = "" # Machine accelerators # comma-separated list of machine accelerators to pass to the hypervisor. # For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"` machine_accelerators="" # Default number of vCPUs per SB/VM: # unspecified or 0 --> will be set to 1 # < 0 --> will be set to the actual number of physical cores # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores default_vcpus = 1 # Default maximum number of vCPUs per SB/VM: # unspecified or == 0 --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when # the actual number of physical cores is greater than it. # WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU # the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs # can be added to a SB/VM, but the memory footprint will be big. Another example, with # `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of # vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable, # unless you know what are you doing. default_maxvcpus = 0 # Bridges can be used to hot plug devices. # Limitations: # * Currently only pci bridges are supported # * Until 30 devices per bridge can be hot plugged. # * Until 5 PCI bridges can be cold plugged per VM. # This limitation could be a bug in qemu or in the kernel # Default number of bridges per SB/VM: # unspecified or 0 --> will be set to 1 # > 1 <= 5 --> will be set to the specified number # > 5 --> will be set to 5 default_bridges = 1 # Default memory size in MiB for SB/VM. # If unspecified then it will be set 2048 MiB. default_memory = 2048 # # Default memory slots per SB/VM. # If unspecified then it will be set 10. # This is will determine the times that memory will be hotadded to sandbox/VM. #memory_slots = 10 # The size in MiB will be plused to max memory of hypervisor. # It is the memory address space for the NVDIMM devie. # If set block storage driver (block_device_driver) to "nvdimm", # should set memory_offset to the size of block device. # Default 0 #memory_offset = 0 # Specifies virtio-mem will be enabled or not. # Please note that this option should be used with the command # "echo 1 > /proc/sys/vm/overcommit_memory". # Default false #enable_virtio_mem = true # Disable block device from being used for a container's rootfs. # In case of a storage driver like devicemapper where a container's # root file system is backed by a block device, the block device is passed # directly to the hypervisor for performance reasons. # This flag prevents the block device from being passed to the hypervisor, # 9pfs is used instead to pass the rootfs. disable_block_device_use = false # Shared file system type: # - virtio-9p (default) # - virtio-fs shared_fs = "virtio-9p" # Path to vhost-user-fs daemon. virtio_fs_daemon = "/usr/bin/virtiofsd" # Default size of DAX cache in MiB virtio_fs_cache_size = 1024 # Extra args for virtiofsd daemon # # Format example: # ["-o", "arg1=xxx,arg2", "-o", "hello world", "--arg3=yyy"] # # see `virtiofsd -h` for possible options. virtio_fs_extra_args = [] # Cache mode: # # - none # Metadata, data, and pathname lookup are not cached in guest. They are # always fetched from host and any changes are immediately pushed to host. # # - auto # Metadata and pathname lookup cache expires after a configured amount of # time (default is 1 second). Data is cached while the file is open (close # to open consistency). # # - always # Metadata, data, and pathname lookup are cached in guest and never expire. virtio_fs_cache = "always" # Block storage driver to be used for the hypervisor in case the container # rootfs is backed by a block device. This is virtio-scsi, virtio-blk # or nvdimm. block_device_driver = "virtio-scsi" # Specifies cache-related options will be set to block devices or not. # Default false #block_device_cache_set = true # Specifies cache-related options for block devices. # Denotes whether use of O_DIRECT (bypass the host page cache) is enabled. # Default false #block_device_cache_direct = true # Specifies cache-related options for block devices. # Denotes whether flush requests for the device are ignored. # Default false #block_device_cache_noflush = true # Enable iothreads (data-plane) to be used. This causes IO to be # handled in a separate IO thread. This is currently only implemented # for SCSI. # enable_iothreads = false # Enable pre allocation of VM RAM, default false # Enabling this will result in lower container density # as all of the memory will be allocated and locked # This is useful when you want to reserve all the memory # upfront or in the cases where you want memory latencies # to be very predictable # Default false #enable_mem_prealloc = true # Enable huge pages for VM RAM, default false # Enabling this will result in the VM memory # being allocated using huge pages. # This is useful when you want to use vhost-user network # stacks within the container. This will automatically # result in memory pre allocation #enable_hugepages = true # Enable vhost-user storage device, default false # Enabling this will result in some Linux reserved block type # major range 240-254 being chosen to represent vhost-user devices. enable_vhost_user_store = false # The base directory specifically used for vhost-user devices. # Its sub-path "block" is used for block devices; "block/sockets" is # where we expect vhost-user sockets to live; "block/devices" is where # simulated block device nodes for vhost-user devices to live. vhost_user_store_path = "/var/run/kata-containers/vhost-user" # Enable file based guest memory support. The default is an empty string which # will disable this feature. In the case of virtio-fs, this is enabled # automatically and '/dev/shm' is used as the backing folder. # This option will be ignored if VM templating is enabled. #file_mem_backend = "" # Enable swap of vm memory. Default false. # The behaviour is undefined if mem_prealloc is also set to true #enable_swap = true # This option changes the default hypervisor and kernel parameters # to enable debug output where available. This extra output is added # to the proxy logs, but only when proxy debug is also enabled. # # Default false #enable_debug = true # Disable the customizations done in the runtime when it detects # that it is running on top a VMM. This will result in the runtime # behaving as it would when running on bare metal. # #disable_nesting_checks = true # This is the msize used for 9p shares. It is the number of bytes # used for 9p packet payload. #msize_9p = 8192 # If true and vsocks are supported, use vsocks to communicate directly # with the agent and no proxy is started, otherwise use unix # sockets and start a proxy to communicate with the agent. # Default false #use_vsock = true # If false and nvdimm is supported, use nvdimm device to plug guest image. # Otherwise virtio-block device is used. # Default is false #disable_image_nvdimm = true # VFIO devices are hotplugged on a bridge by default. # Enable hotplugging on root bus. This may be required for devices with # a large PCI bar, as this is a current limitation with hotplugging on # a bridge. This value is valid for "pc" machine type. # Default false #hotplug_vfio_on_root_bus = true # Before hot plugging a PCIe device, you need to add a pcie_root_port device. # Use this parameter when using some large PCI bar devices, such as Nvidia GPU # The value means the number of pcie_root_port # This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35" # Default 0 #pcie_root_port = 2 # If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off # security (vhost-net runs ring0) for network I/O performance. #disable_vhost_net = true # # Default entropy source. # The path to a host source of entropy (including a real hardware RNG) # /dev/urandom and /dev/random are two main options. # Be aware that /dev/random is a blocking source of entropy. If the host # runs out of entropy, the VMs boot time will increase leading to get startup # timeouts. # The source of entropy /dev/urandom is non-blocking and provides a # generally acceptable source of entropy. It should work well for pretty much # all practical purposes. #entropy_source= "/dev/urandom" # Path to OCI hook binaries in the *guest rootfs*. # This does not affect host-side hooks which must instead be added to # the OCI spec passed to the runtime. # # You can create a rootfs with hooks by customizing the osbuilder scripts: # https://github.com/kata-containers/osbuilder # # Hooks must be stored in a subdirectory of guest_hook_path according to their # hook type, i.e. "guest_hook_path/{prestart,postart,poststop}". # The agent will scan these directories for executable files and add them, in # lexicographical order, to the lifecycle of the guest container. # Hooks are executed in the runtime namespace of the guest. See the official documentation: # https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks # Warnings will be logged if any error is encountered will scanning for hooks, # but it will not abort container execution. #guest_hook_path = "/usr/share/oci/hooks" [factory] # VM templating support. Once enabled, new VMs are created from template # using vm cloning. They will share the same initial kernel, initramfs and # agent memory by mapping it readonly. It helps speeding up new container # creation and saves a lot of memory if there are many kata containers running # on the same host. # # When disabled, new VMs are created from scratch. # # Note: Requires "initrd=" to be set ("image=" is not supported). # # Default false #enable_template = true # Specifies the path of template. # # Default "/run/vc/vm/template" #template_path = "/run/vc/vm/template" # The number of caches of VMCache: # unspecified or == 0 --> VMCache is disabled # > 0 --> will be set to the specified number # # VMCache is a function that creates VMs as caches before using it. # It helps speed up new container creation. # The function consists of a server and some clients communicating # through Unix socket. The protocol is gRPC in protocols/cache/cache.proto. # The VMCache server will create some VMs and cache them by factory cache. # It will convert the VM to gRPC format and transport it when gets # requestion from clients. # Factory grpccache is the VMCache client. It will request gRPC format # VM and convert it back to a VM. If VMCache function is enabled, # kata-runtime will request VM from factory grpccache when it creates # a new sandbox. # # Default 0 #vm_cache_number = 0 # Specify the address of the Unix socket that is used by VMCache. # # Default /var/run/kata-containers/cache.sock #vm_cache_endpoint = "/var/run/kata-containers/cache.sock" [proxy.kata] path = "/usr/libexec/kata-containers/kata-proxy" # If enabled, proxy messages will be sent to the system log # (default: disabled) #enable_debug = true [shim.kata] path = "/usr/libexec/kata-containers/kata-shim" # If enabled, shim messages will be sent to the system log # (default: disabled) #enable_debug = true # If enabled, the shim will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # # Note: By default, the shim runs in a separate network namespace. Therefore, # to allow it to send trace details to the Jaeger agent running on the host, # it is necessary to set 'disable_new_netns=true' so that it runs in the host # network namespace. # # (default: disabled) #enable_tracing = true [agent.kata] # If enabled, make the agent display debug-level messages. # (default: disabled) #enable_debug = true # Enable agent tracing. # # If enabled, the default trace mode is "dynamic" and the # default trace type is "isolated". The trace mode and type are set # explicity with the `trace_type=` and `trace_mode=` options. # # Notes: # # - Tracing is ONLY enabled when `enable_tracing` is set: explicitly # setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing` # will NOT activate agent tracing. # # - See https://github.com/kata-containers/agent/blob/master/TRACING.md for # full details. # # (default: disabled) #enable_tracing = true # #trace_mode = "dynamic" #trace_type = "isolated" # Comma separated list of kernel modules and their parameters. # These modules will be loaded in the guest kernel using modprobe(8). # The following example can be used to load two kernel modules with parameters # - kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915 enable_ppgtt=0"] # The first word is considered as the module name and the rest as its parameters. # Container will not be started when: # * A kernel module is specified and the modprobe command is not installed in the guest # or it fails loading the module. # * The module is not available in the guest or it doesn't met the guest kernel # requirements, like architecture and version. # kernel_modules=[] [netmon] # If enabled, the network monitoring process gets started when the # sandbox is created. This allows for the detection of some additional # network being added to the existing network namespace, after the # sandbox has been created. # (default: disabled) #enable_netmon = true # Specify the path to the netmon binary. path = "/usr/libexec/kata-containers/kata-netmon" # If enabled, netmon messages will be sent to the system log # (default: disabled) #enable_debug = true [runtime] # If enabled, the runtime will log additional debug messages to the # system log # (default: disabled) #enable_debug = true # # Internetworking model # Determines how the VM should be connected to the # the container network interface # Options: # # - macvtap # Used when the Container network interface can be bridged using # macvtap. # # - none # Used when customize network. Only creates a tap device. No veth pair. # # - tcfilter # Uses tc filter rules to redirect traffic from the network interface # provided by plugin to a tap interface connected to the VM. # internetworking_model="tcfilter" # disable guest seccomp # Determines whether container seccomp profiles are passed to the virtual # machine and applied by the kata agent. If set to true, seccomp is not applied # within the guest # (default: true) disable_guest_seccomp=true # If enabled, the runtime will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # (default: disabled) #enable_tracing = true # If enabled, the runtime will not create a network namespace for shim and hypervisor processes. # This option may have some potential impacts to your host. It should only be used when you know what you're doing. # `disable_new_netns` conflicts with `enable_netmon` # `disable_new_netns` conflicts with `internetworking_model=tcfilter` and `internetworking_model=macvtap`. It works only # with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge # (like OVS) directly. # If you are using docker, `disable_new_netns` only works with `docker run --net=none` # (default: false) #disable_new_netns = true # if enabled, the runtime will add all the kata processes inside one dedicated cgroup. # The container cgroups in the host are not created, just one single cgroup per sandbox. # The runtime caller is free to restrict or collect cgroup stats of the overall Kata sandbox. # The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation. # The sandbox cgroup is constrained if there is no container type annotation. # See: https://godoc.org/github.com/kata-containers/runtime/virtcontainers#ContainerType sandbox_cgroup_only=false # Enabled experimental feature list, format: ["a", "b"]. # Experimental features are features not stable enough for production, # they may break compatibility, and are prepared for a big version bump. # Supported experimental features: # (default: []) experimental=[] ``` --- # KSM throttler ## version Output of "`/usr/libexec/kata-ksm-throttler/kata-ksm-throttler --version`": ``` kata-ksm-throttler version 1.11.1-60526f8 ``` Output of "`/usr/lib/systemd/system/kata-ksm-throttler.service --version`": ``` ./kata-collect-data.sh: line 178: /usr/lib/systemd/system/kata-ksm-throttler.service: Permission denied ``` ## systemd service # Image details ```yaml --- osbuilder: url: "https://github.com/kata-containers/osbuilder" version: "unknown" rootfs-creation-time: "2020-06-09T05:36:02.623014927+0000Z" description: "osbuilder rootfs" file-format-version: "0.0.2" architecture: "x86_64" base-distro: name: "Clear" version: "33320" packages: default: - "chrony" - "iptables-bin" - "kmod-bin" - "libudev0-shim" - "systemd" - "util-linux-bin" extra: agent: url: "https://github.com/kata-containers/agent" name: "kata-agent" version: "1.11.1-34a000d0cf10c2e6bf1b46cd2bbfaa911de62679" agent-is-init-daemon: "no" ``` --- # Initrd details No initrd --- # Logfiles ## Runtime logs Recent runtime problems found in system journal: ``` time="2020-06-29T05:29:27.57784593Z" level=error msg="Invalid command \"kata-collect-data\"" arch=amd64 name=kata-runtime pid=3759572 source=runtime ``` ## Proxy logs No recent proxy problems found in system journal. ## Shim logs No recent shim problems found in system journal. ## Throttler logs No recent throttler problems found in system journal. --- # Container manager details No `docker` Have `kubectl` ## Kubernetes Output of "`kubectl version`": ``` Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"archive", BuildDate:"2020-01-21T20:41:54Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"} The connection to the server 127.0.0.1:9999 was refused - did you specify the right host or port? ``` Output of "`kubectl config view`": ``` apiVersion: v1 clusters: [] contexts: [] current-context: "" kind: Config preferences: {} users: [] ``` Output of "`systemctl show kubelet`": ``` Type=simple Restart=on-failure NotifyAccess=none RestartUSec=100ms TimeoutStartUSec=1min 30s TimeoutStopUSec=1min 30s WatchdogUSec=0 WatchdogTimestamp=Mon 2020-06-29 19:40:47 UTC WatchdogTimestampMonotonic=5426351159947 StartLimitInterval=10000000 StartLimitBurst=5 StartLimitAction=none FailureAction=none PermissionsStartOnly=no RootDirectoryStartOnly=no RemainAfterExit=no GuessMainPID=yes MainPID=347646 ControlPID=0 FileDescriptorStoreMax=0 StatusErrno=0 Result=success ExecMainStartTimestamp=Mon 2020-06-29 19:40:47 UTC ExecMainStartTimestampMonotonic=5426351159899 ExecMainExitTimestampMonotonic=0 ExecMainPID=347646 ExecMainCode=0 ExecMainStatus=0 ExecStart={ path=/usr/bin/kubelet ; argv[]=/usr/bin/kubelet $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL $KUBELET_ADDRESS $KUBELET_PORT $KUBELET_HOSTNAME $KUBE_ALLOW_PRIV $KUBELET_ARGS ; ignore_errors=no ; start_time=[Mon 2020-06-29 19:40:47 UTC] ; stop_time=[n/a] ; pid=347646 ; code=(null) ; status=0/0 } Slice=system.slice ControlGroup=/system.slice/kubelet.service MemoryCurrent=56438784 TasksCurrent=59 Delegate=no CPUAccounting=no CPUShares=18446744073709551615 StartupCPUShares=18446744073709551615 CPUQuotaPerSecUSec=infinity BlockIOAccounting=no BlockIOWeight=18446744073709551615 StartupBlockIOWeight=18446744073709551615 MemoryAccounting=no MemoryLimit=18446744073709551615 DevicePolicy=auto TasksAccounting=no TasksMax=18446744073709551615 EnvironmentFile=/etc/sysconfig/kubelet (ignore_errors=yes) UMask=0022 LimitCPU=18446744073709551615 LimitFSIZE=18446744073709551615 LimitDATA=18446744073709551615 LimitSTACK=18446744073709551615 LimitCORE=18446744073709551615 LimitRSS=18446744073709551615 LimitNOFILE=65536 LimitAS=18446744073709551615 LimitNPROC=767294 LimitMEMLOCK=65536 LimitLOCKS=18446744073709551615 LimitSIGPENDING=767294 LimitMSGQUEUE=819200 LimitNICE=0 LimitRTPRIO=0 LimitRTTIME=18446744073709551615 WorkingDirectory=/var/lib/kubelet OOMScoreAdjust=0 Nice=0 IOScheduling=0 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SecureBits=0 CapabilityBoundingSet=18446744073709551615 AmbientCapabilities=0 MountFlags=0 PrivateTmp=no PrivateNetwork=no PrivateDevices=no ProtectHome=no ProtectSystem=no SameProcessGroup=no IgnoreSIGPIPE=yes NoNewPrivileges=no SystemCallErrorNumber=0 RuntimeDirectoryMode=0755 KillMode=control-group KillSignal=15 SendSIGKILL=yes SendSIGHUP=no Id=kubelet.service Names=kubelet.service Requires=kube-proxy.service basic.target -.mount system.slice containerd.service WantedBy=multi-user.target Conflicts=shutdown.target Before=shutdown.target multi-user.target After=systemd-journald.socket basic.target -.mount system.slice containerd.service RequiresMountsFor=/var/lib/kubelet Documentation=https://github.com/kubernetes/kubernetes Description=Kubernetes Kubelet Server LoadState=loaded ActiveState=active SubState=running FragmentPath=/usr/lib/systemd/system/kubelet.service UnitFileState=enabled UnitFilePreset=disabled InactiveExitTimestamp=Mon 2020-06-29 19:40:47 UTC InactiveExitTimestampMonotonic=5426351159966 ActiveEnterTimestamp=Mon 2020-06-29 19:40:47 UTC ActiveEnterTimestampMonotonic=5426351159966 ActiveExitTimestamp=Mon 2020-06-29 19:40:47 UTC ActiveExitTimestampMonotonic=5426351081702 InactiveEnterTimestamp=Mon 2020-06-29 19:40:47 UTC InactiveEnterTimestampMonotonic=5426351090272 CanStart=yes CanStop=yes CanReload=no CanIsolate=no StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnFailureJobMode=replace IgnoreOnIsolate=no IgnoreOnSnapshot=no NeedDaemonReload=no JobTimeoutUSec=0 JobTimeoutAction=none ConditionResult=yes AssertResult=yes ConditionTimestamp=Mon 2020-06-29 19:40:47 UTC ConditionTimestampMonotonic=5426351141054 AssertTimestamp=Mon 2020-06-29 19:40:47 UTC AssertTimestampMonotonic=5426351141054 Transient=no ``` No `crio` Have `containerd` ## containerd Output of "`containerd --version`": ``` containerd github.com/containerd/containerd v1.3.3 d76c121f76a5fc8a462dc64594aea72fe18e1178 ``` Output of "`systemctl show containerd`": ``` Type=simple Restart=always NotifyAccess=none RestartUSec=5s TimeoutStartUSec=1min 30s TimeoutStopUSec=1min 30s WatchdogUSec=0 WatchdogTimestamp=Mon 2020-06-29 19:40:47 UTC WatchdogTimestampMonotonic=5426351139349 StartLimitInterval=10000000 StartLimitBurst=5 StartLimitAction=none FailureAction=none PermissionsStartOnly=no RootDirectoryStartOnly=no RemainAfterExit=no GuessMainPID=yes MainPID=347640 ControlPID=0 FileDescriptorStoreMax=0 StatusErrno=0 Result=success ExecMainStartTimestamp=Mon 2020-06-29 19:40:47 UTC ExecMainStartTimestampMonotonic=5426351139320 ExecMainExitTimestampMonotonic=0 ExecMainPID=347640 ExecMainCode=0 ExecMainStatus=0 ExecStartPre={ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; ignore_errors=no ; start_time=[Mon 2020-06-29 19:40:47 UTC] ; stop_time=[Mon 2020-06-29 19:40:47 UTC] ; pid=347638 ; code=exited ; status=0 } ExecStart={ path=/usr/local/bin/containerd ; argv[]=/usr/local/bin/containerd ; ignore_errors=no ; start_time=[Mon 2020-06-29 19:40:47 UTC] ; stop_time=[n/a] ; pid=347640 ; code=(null) ; status=0/0 } Slice=system.slice ControlGroup=/system.slice/containerd.service MemoryCurrent=58553622528 TasksCurrent=355 Delegate=yes CPUAccounting=no CPUShares=18446744073709551615 StartupCPUShares=18446744073709551615 CPUQuotaPerSecUSec=infinity BlockIOAccounting=no BlockIOWeight=18446744073709551615 StartupBlockIOWeight=18446744073709551615 MemoryAccounting=no MemoryLimit=18446744073709551615 DevicePolicy=auto TasksAccounting=no TasksMax=18446744073709551615 UMask=0022 LimitCPU=18446744073709551615 LimitFSIZE=18446744073709551615 LimitDATA=18446744073709551615 LimitSTACK=18446744073709551615 LimitCORE=18446744073709551615 LimitRSS=18446744073709551615 LimitNOFILE=1048576 LimitAS=18446744073709551615 LimitNPROC=18446744073709551615 LimitMEMLOCK=65536 LimitLOCKS=18446744073709551615 LimitSIGPENDING=767294 LimitMSGQUEUE=819200 LimitNICE=0 LimitRTPRIO=0 LimitRTTIME=18446744073709551615 OOMScoreAdjust=-999 Nice=0 IOScheduling=0 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SecureBits=0 CapabilityBoundingSet=18446744073709551615 AmbientCapabilities=0 MountFlags=0 PrivateTmp=no PrivateNetwork=no PrivateDevices=no ProtectHome=no ProtectSystem=no SameProcessGroup=no IgnoreSIGPIPE=yes NoNewPrivileges=no SystemCallErrorNumber=0 RuntimeDirectoryMode=0755 KillMode=process KillSignal=15 SendSIGKILL=yes SendSIGHUP=no Id=containerd.service Names=containerd.service Requires=basic.target system.slice RequiredBy=kubelet.service WantedBy=multi-user.target Conflicts=shutdown.target Before=shutdown.target kubelet.service multi-user.target After=systemd-journald.socket basic.target system.slice network.target Documentation=https://containerd.io Description=containerd container runtime LoadState=loaded ActiveState=active SubState=running FragmentPath=/etc/systemd/system/containerd.service UnitFileState=enabled UnitFilePreset=disabled InactiveExitTimestamp=Mon 2020-06-29 19:40:47 UTC InactiveExitTimestampMonotonic=5426351126999 ActiveEnterTimestamp=Mon 2020-06-29 19:40:47 UTC ActiveEnterTimestampMonotonic=5426351139392 ActiveExitTimestamp=Mon 2020-06-29 19:40:47 UTC ActiveExitTimestampMonotonic=5426351111933 InactiveEnterTimestamp=Mon 2020-06-29 19:40:47 UTC InactiveEnterTimestampMonotonic=5426351125544 CanStart=yes CanStop=yes CanReload=no CanIsolate=no StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnFailureJobMode=replace IgnoreOnIsolate=no IgnoreOnSnapshot=no NeedDaemonReload=no JobTimeoutUSec=0 JobTimeoutAction=none ConditionResult=yes AssertResult=yes ConditionTimestamp=Mon 2020-06-29 19:40:47 UTC ConditionTimestampMonotonic=5426351126565 AssertTimestamp=Mon 2020-06-29 19:40:47 UTC AssertTimestampMonotonic=5426351126565 Transient=no ``` Output of "`cat /etc/containerd/config.toml`": ``` # Originally version 1 was referred from https://github.com/kata-containers/documentation/blob/master/how-to/containerd-kata.md#configure-containerd-to-use-kata-containers # Used containerd config dump to generate version 2 config # Details of this config is in https://github.com/containerd/cri/blob/master/docs/config.md version = 2 root = "/var/lib/containerd" state = "/run/containerd" plugin_dir = "" disabled_plugins = [] required_plugins = [] # The service unit file already have -999, keep it here for redundant oom_score = -999 [grpc] address = "/run/containerd/containerd.sock" tcp_address = "" tcp_tls_cert = "" tcp_tls_key = "" uid = 0 gid = 0 max_recv_message_size = 16777216 max_send_message_size = 16777216 [ttrpc] address = "" uid = 0 gid = 0 [debug] address = "" uid = 0 gid = 0 level = "info" [metrics] address = "0.0.0.0:1338" grpc_histogram = false [cgroup] path = "" [timeouts] "io.containerd.timeout.shim.cleanup" = "5s" "io.containerd.timeout.shim.load" = "5s" "io.containerd.timeout.shim.shutdown" = "3s" "io.containerd.timeout.task.state" = "2s" [plugins] [plugins."io.containerd.gc.v1.scheduler"] pause_threshold = 0.02 deletion_threshold = 0 mutation_threshold = 100 schedule_delay = "0s" startup_delay = "100ms" [plugins."io.containerd.grpc.v1.cri"] disable_tcp_service = true stream_server_address = "127.0.0.1" stream_server_port = "0" stream_idle_timeout = "4h0m0s" enable_selinux = false sandbox_image = "docker.ouroath.com:4443/yahoo-cloud/k8s.gcr.io/pause:3.1" stats_collect_period = 10 enable_tls_streaming = false max_container_log_line_size = 16384 disable_cgroup = false disable_apparmor = false restrict_oom_score_adj = false max_concurrent_downloads = 3 disable_proc_mount = false [plugins."io.containerd.grpc.v1.cri".containerd] snapshotter = "devmapper" default_runtime_name = "runc" no_pivot = false [plugins."io.containerd.grpc.v1.cri".containerd.runtimes] [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata] runtime_type = "io.containerd.kata.v2" privileged_without_host_devices = true [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options] ConfigPath = "/etc/kata-containers/configuration.toml" [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc] runtime_type = "io.containerd.runc.v2" privileged_without_host_devices = true [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] NoPivotRoot = false NoNewKeyring = false ShimCgroup = "" IoUid = 0 IoGid = 0 BinaryName = "" Root = "" CriuPath = "" SystemdCgroup = true CriuImagePath = "" CriuWorkPath = "" [plugins."io.containerd.grpc.v1.cri".cni] bin_dir = "/opt/cni/bin" conf_dir = "/etc/cni/net.d" max_conf_num = 1 conf_template = "" [plugins."io.containerd.grpc.v1.cri".registry] [plugins."io.containerd.grpc.v1.cri".registry.mirrors] [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"] endpoint = ["https://registry-1.docker.io"] [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming] tls_cert_file = "" tls_key_file = "" [plugins."io.containerd.internal.v1.opt"] path = "/opt/containerd" [plugins."io.containerd.internal.v1.restart"] interval = "10s" [plugins."io.containerd.metadata.v1.bolt"] content_sharing_policy = "shared" [plugins."io.containerd.monitor.v1.cgroups"] no_prometheus = false [plugins."io.containerd.runtime.v1.linux"] shim = "containerd-shim" runtime = "runc" runtime_root = "" no_shim = false shim_debug = false systemd_cgroup = true [plugins."io.containerd.runtime.v2.task"] platforms = ["linux/amd64"] [plugins."io.containerd.service.v1.diff-service"] default = ["walking"] [plugins."io.containerd.snapshotter.v1.devmapper"] root_path = "" pool_name = "sys-docker--pool" base_image_size = "100GB" ``` --- # Packages No `dpkg` Have `rpm` Output of "`rpm -qa|egrep "(cc-oci-runtimecc-runtimerunv|kata-proxy|kata-runtime|kata-shim|kata-ksm-throttler|kata-containers-image|linux-container|qemu-)"`": ``` qemu-vanilla-data-4.1.1+git.99c5874a9b-5.1.x86_64 qemu-vanilla-bin-4.1.1+git.99c5874a9b-5.1.x86_64 kata-proxy-1.11.1-5.1.x86_64 kata-linux-container-debug-5.4.32.73-5.1.x86_64 kata-shim-bin-1.11.1-5.1.x86_64 qemu-vanilla-4.1.1+git.99c5874a9b-5.1.x86_64 kata-proxy-bin-1.11.1-5.1.x86_64 kata-containers-image-1.11.1-5.1.x86_64 kata-runtime-1.11.1-5.1.x86_64 kata-shim-1.11.1-5.1.x86_64 kata-linux-container-5.4.32.73-5.1.x86_64 kata-ksm-throttler-1.11.1-5.1.x86_64 ``` ---

devimc commented 4 years ago

are you running kata-shimv2 ?

parthasl commented 4 years ago

yes

devimc commented 4 years ago

uhmm ok, that explains why I couldn't reproduce it with the CLI @lifupan any thoughts ?

lifupan commented 4 years ago

uhmm ok, that explains why I couldn't reproduce it with the CLI @lifupan any thoughts ?

It seemed I couldn't reproduce it. I had tried to launch a pod with 4 cpus and then run stress with "--cpu 4", all of the cpus are fulled used as below: htop

devimc commented 4 years ago

@lifupan thanks. @parthasl could you elaborate on this?

lifupan commented 4 years ago

@parthasl

Have you set cpuset to your stress program? it seemed that all of the stress thread running on a single cpu.

parthasl commented 4 years ago

@devimc

Are you using pod overhead? No

k8s version:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"clean", BuildDate:"2020-01-18T23:33:14Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"archive", BuildDate:"2020-01-21T20:41:54Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

k8s runtime class:

apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata
handler: kata

k8s yaml:

apiVersion: v1
kind: Pod
metadata:
  name: testcpu
  namespace: kata
spec:
  runtimeClassName: kata
  serviceAccount: sd
  terminationGracePeriodSeconds: 30
  containers:
  - name: "testcpu"
    image: centos:7
    imagePullPolicy: IfNotPresent
    command: ['sh', '-c', 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 10; done']
    resources:
      requests:
        memory: "2Gi"
        cpu: "4"
      limits:
        memory: "2Gi"
        cpu: "4"
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /workspace
      name: workspace
  volumes:
    - name: workspace
      emptyDir: {}
  restartPolicy: Always
  nodeSelector:
    kubernetes.io/hostname: node
  tolerations:
  - key: dedicated
    operator: Equal
    value: sd-kata
    effect: NoSchedule

@lifupan No, used this stress --cpu 4 --timeout 60s. In your test default_vcpus is set to 1?

parthasl commented 4 years ago

@devimc @lifupan inside the container when I set the cpuset.cpus 0-3 /sys/fs/cgroup/cpuset/cpuset.cpus, it uses CPUs properly. Earlier the value was 0 and it was using 1 CPU. May I know, how do I set it from k8s pod?

image

lifupan commented 4 years ago

@parthasl

It's weird, the cpuset.cpus shouldn't be 0, and it should be set 0-3 by default.

lifupan commented 4 years ago

@lifupan No, used this stress --cpu 4 --timeout 60s. In your test default_vcpus is set to 1?

Yes, default_vcpus is set to 1 and my pod yaml as below:

apiVersion: v1
kind: Pod
metadata:
  name: fupan-kata
spec:
  runtimeClassName: kata
  containers:
  - name: fupancount
    image: ubuntu
    resources:
        limits:
            cpu: "4"
        requests:
            cpu: "4"
parthasl commented 4 years ago

Thanks @lifupan, am I missing any configuration? https://github.com/kata-containers/runtime/issues/2809#issuecomment-651327965