kata-containers / kata-containers

Kata Containers is an open source project and community working to build a standard implementation of lightweight Virtual Machines (VMs) that feel and perform like containers, but provide the workload isolation and security advantages of VMs. https://katacontainers.io/

vhost-fs.sock: Connection refused #7107


anhtuanle00 commented 1 year ago

RHEL 8 + RKE2 with containerd and Kata Containers: I can't seem to get the test QEMU pod running. The relevant log lines:

```
level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=2788236 sandbox=050ca4232f21457d6053f9db27df929e95ff5e101fb2157cbd2295ddf81654dd source=cgroups

virtiofsd[2807080]: Error entering sandbox: Fork(Os { code: 38, kind: Unsupported, message: "Function not implemented" })

level=error msg="qemu-system-x86_64: -chardev socket,id=char-7173591710037d13,path=/run/vc/vm/050ca4232f21457d6053f9db27df929e95ff5e101fb2157cbd2295ddf81654dd/vhost-fs.sock: Failed to connect to '/run/vc/vm/050ca4232f21457d6053f9db27df929e95ff5e101fb2157cbd2295ddf81654dd/vhost-fs.sock': Connection refused" name=containerd-shim-v2 pid=2788236 qemuPid=2788249 sandbox=050ca4232f21457d6053f9db27df929e95ff5e101fb2157cbd2295ddf81654dd source=virtcontainers/hypervisor subsystem=qemu
```
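For context on how these three lines relate: os error 38 is ENOSYS ("Function not implemented"), raised while virtiofsd forks into its sandbox, so the daemon exits before it ever listens on `vhost-fs.sock`; QEMU's "Connection refused" is the downstream symptom, not the root cause. A quick way to check is to run virtiofsd outside Kata. This is a sketch: the daemon path matches the config dumps further down, while the test paths and the `--socket-path`/`--shared-dir` options are standard Rust virtiofsd flags assumed to be present on this build (verify with `virtiofsd --help`):

```bash
# Sketch: reproduce the virtiofsd failure standalone, outside Kata.
mkdir -p /tmp/vfsd-test
sudo /usr/libexec/virtiofsd \
    --socket-path=/tmp/vfsd-test/vhost-fs.sock \
    --shared-dir=/tmp/vfsd-test \
    --log-level=debug
# If this exits with the same "Error entering sandbox ... Function not
# implemented", the problem is host-side (kernel or seccomp), not Kata/QEMU.
```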

Not sure which config files need to be shown; this is basically stock RKE2 and Kata configuration, with the RHEL 8 virt module installed.
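A few host-side checks usually help narrow an ENOSYS like this one down (hedged suggestions, not a confirmed diagnosis): what configuration the runtime actually resolved, and whether the kernel permits the user-namespace setup that virtiofsd's default sandbox relies on.

```bash
# Hedged host-side checks; kata-runtime kata-env is the same command used
# in the data dump below, the sysctl/grep lines are standard Linux tooling.
kata-runtime kata-env                             # effective runtime/hypervisor config
sysctl user.max_user_namespaces                   # 0 = unprivileged user namespaces disabled
grep CONFIG_USER_NS= /boot/config-"$(uname -r)"   # kernel built with user-namespace support?
```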

danishprakash commented 11 months ago

Facing this issue while running kata-containers with containerd. Here's the kata-collect-data.sh dump if it helps.


# Meta details

Running `kata-collect-data.sh` version `3.2.0-rc0 (commit 108db0a7210b392e8aec2781043dfbd8297f84b9)` at `2023-10-12.14:35:38.465625011+0200`.

---

# Runtime

Runtime is `/usr/local/bin/kata-runtime`.

# `kata-env`

/usr/local/bin/kata-runtime kata-env

```toml
[Kernel]
  Path = "/usr/share/kata-containers/vmlinux-5.19.2-116-nvidia-gpu"
  Parameters = "systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket scsi_mod.scan=none agent.log=debug initcall_debug"

[Meta]
  Version = "1.0.26"

[Image]
  Path = "/usr/share/kata-containers/kata-containers-2023-10-11-09:44:09.287428231+0200-108db0a72"

[Initrd]
  Path = ""

[Hypervisor]
  MachineType = "q35"
  Version = "QEMU emulator version 8.1.0 (openSUSE Tumbleweed)\nCopyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  EntropySource = "/dev/urandom"
  SharedFS = "virtio-fs"
  VirtioFSDaemon = "/usr/libexec/virtiofsd"
  SocketPath = ""
  Msize9p = 8192
  MemorySlots = 10
  HotPlugVFIO = "no-port"
  ColdPlugVFIO = "no-port"
  Debug = true

[Runtime]
  Path = "/usr/local/bin/kata-runtime"
  GuestSeLinuxLabel = ""
  Debug = false
  Trace = false
  DisableGuestSeccomp = true
  DisableNewNetNs = false
  SandboxCgroupOnly = false

[Runtime.Config]
  Path = "/etc/kata-containers/configuration.toml"

[Runtime.Version]
  OCI = "1.1.0-rc.1"

[Runtime.Version.Version]
  Semver = "3.2.0-rc0"
  Commit = "108db0a7210b392e8aec2781043dfbd8297f84b9"
  Major = 3
  Minor = 2
  Patch = 0

[Host]
  AvailableGuestProtections = ["snp"]
  Kernel = "6.1.0-rc4-snp-v8-svsm-host+"
  Architecture = "amd64"
  VMContainerCapable = true
  SupportVSocks = true

[Host.Distro]
  Name = "openSUSE Tumbleweed"
  Version = "20230215"

[Host.CPU]
  Vendor = "AuthenticAMD"
  Model = "AMD EPYC 7713 64-Core Processor"
  CPUs = 256

[Host.Memory]
  Total = 262812068
  Free = 174409020
  Available = 253017328

[Agent]
  Debug = false
  Trace = false
```

---

# Runtime config files

## Runtime default config files

```
/etc/kata-containers/configuration.toml
/usr/share/defaults/kata-containers/configuration.toml
```

## Runtime config file contents
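Kata loads the first file in the list above that exists, so the `/etc/kata-containers` copy shown next takes precedence over the packaged defaults. The loaded path is also reported by `kata-env` under `[Runtime.Config]`, which gives a quick cross-check:

```bash
# Cross-check which configuration file the runtime actually loaded; the
# [Runtime.Config] section appears in the kata-env dump above.
kata-runtime kata-env | grep -A 1 '\[Runtime.Config\]'
```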

cat "/etc/kata-containers/configuration.toml"

```toml # Copyright (c) 2017-2019 Intel Corporation # Copyright (c) 2021 Adobe Inc. # # SPDX-License-Identifier: Apache-2.0 # # XXX: WARNING: this file is auto-generated. # XXX: # XXX: Source file: "config/configuration-qemu.toml.in" # XXX: Project: # XXX: Name: Kata Containers # XXX: Type: kata [hypervisor.qemu] path = "/usr/bin/qemu-system-x86_64" kernel = "/usr/share/kata-containers/vmlinux-nvidia-gpu.container" image = "/usr/share/kata-containers/kata-containers.img" # initrd = "/usr/share/kata-containers/kata-containers-initrd.img" machine_type = "q35" # rootfs filesystem type: # - ext4 (default) # - xfs # - erofs rootfs_type="ext4" # Enable confidential guest support. # Toggling that setting may trigger different hardware features, ranging # from memory encryption to both memory and CPU-state encryption and integrity. # The Kata Containers runtime dynamically detects the available feature set and # aims at enabling the largest possible one, returning an error if none is # available, or none is supported by the hypervisor. # # Known limitations: # * Does not work by design: # - CPU Hotplug # - Memory Hotplug # - NVDIMM devices # # Default false # confidential_guest = true # Choose AMD SEV-SNP confidential guests # In case of using confidential guests on AMD hardware that supports both SEV # and SEV-SNP, the following enables SEV-SNP guests. SEV guests are default. # Default false # sev_snp_guest = true # Enable running QEMU VMM as a non-root user. # By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as # a non-root random user. See documentation for the limitations of this mode. # rootless = true # List of valid annotation names for the hypervisor # Each member of the list is a regular expression, which is the base name # of the annotation, e.g. "path" for io.katacontainers.config.hypervisor.path" enable_annotations = ["enable_iommu", "virtio_fs_extra_args", "kernel_params"] # List of valid annotations values for the hypervisor # Each member of the list is a path pattern as described by glob(3). # The default if not set is empty (all annotations rejected.) # Your distribution recommends: ["/usr/bin/qemu-system-x86_64"] valid_hypervisor_paths = ["/usr/bin/qemu-system-x86_64"] # Optional space-separated list of options to pass to the guest kernel. # For example, use `kernel_params = "vsyscall=emulate"` if you are having # trouble running pre-2.15 glibc. # # WARNING: - any parameter specified here will take priority over the default # parameter value of the same name used to start the virtual machine. # Do not set values here unless you understand the impact of doing so as you # may stop the virtual machine from booting. # To see the list of default parameters, enable hypervisor debug, create a # container and look for 'default-kernel-parameters' log entries. kernel_params = " agent.log=debug initcall_debug" # Path to the firmware. # If you want that qemu uses the default firmware leave this option empty firmware = "" # Path to the firmware volume. # firmware TDVF or OVMF can be split into FIRMWARE_VARS.fd (UEFI variables # as configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI variables # can be customized per each user while UEFI code is kept same. firmware_volume = "" # Machine accelerators # comma-separated list of machine accelerators to pass to the hypervisor. 
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"` machine_accelerators="" # Qemu seccomp sandbox feature # comma-separated list of seccomp sandbox features to control the syscall access. # For example, `seccompsandbox= "on,obsolete=deny,spawn=deny,resourcecontrol=deny"` # Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox # Another note: enabling this feature may reduce performance, you may enable # /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html #seccompsandbox="on,obsolete=deny,spawn=deny,resourcecontrol=deny" # CPU features # comma-separated list of cpu features to pass to the cpu # For example, `cpu_features = "pmu=off,vmx=off" cpu_features="pmu=off" # Default number of vCPUs per SB/VM: # unspecified or 0 --> will be set to 1 # < 0 --> will be set to the actual number of physical cores # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores default_vcpus = 2 # Default maximum number of vCPUs per SB/VM: # unspecified or == 0 --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when # the actual number of physical cores is greater than it. # WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU # the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs # can be added to a SB/VM, but the memory footprint will be big. Another example, with # `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of # vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable, # unless you know what are you doing. # NOTICE: on arm platform with gicv2 interrupt controller, set it to 8. default_maxvcpus = 0 # Bridges can be used to hot plug devices. # Limitations: # * Currently only pci bridges are supported # * Until 30 devices per bridge can be hot plugged. # * Until 5 PCI bridges can be cold plugged per VM. # This limitation could be a bug in qemu or in the kernel # Default number of bridges per SB/VM: # unspecified or 0 --> will be set to 1 # > 1 <= 5 --> will be set to the specified number # > 5 --> will be set to 5 default_bridges = 1 # Default memory size in MiB for SB/VM. # If unspecified then it will be set 2048 MiB. default_memory = 2048 # # Default memory slots per SB/VM. # If unspecified then it will be set 10. # This is will determine the times that memory will be hotadded to sandbox/VM. #memory_slots = 10 # Default maximum memory in MiB per SB / VM # unspecified or == 0 --> will be set to the actual amount of physical RAM # > 0 <= amount of physical RAM --> will be set to the specified number # > amount of physical RAM --> will be set to the actual amount of physical RAM default_maxmemory = 0 # The size in MiB will be plused to max memory of hypervisor. # It is the memory address space for the NVDIMM devie. 
# If set block storage driver (block_device_driver) to "nvdimm", # should set memory_offset to the size of block device. # Default 0 #memory_offset = 0 # Specifies virtio-mem will be enabled or not. # Please note that this option should be used with the command # "echo 1 > /proc/sys/vm/overcommit_memory". # Default false #enable_virtio_mem = true # Disable block device from being used for a container's rootfs. # In case of a storage driver like devicemapper where a container's # root file system is backed by a block device, the block device is passed # directly to the hypervisor for performance reasons. # This flag prevents the block device from being passed to the hypervisor, # virtio-fs is used instead to pass the rootfs. disable_block_device_use = false # Shared file system type: # - virtio-fs (default) # - virtio-9p # - virtio-fs-nydus # - none shared_fs = "virtio-fs" # Path to vhost-user-fs daemon. virtio_fs_daemon = "/usr/libexec/virtiofsd" # List of valid annotations values for the virtiofs daemon # The default if not set is empty (all annotations rejected.) # Your distribution recommends: ["/usr/libexec/virtiofsd"] valid_virtio_fs_daemon_paths = ["/usr/libexec/virtiofsd"] # Default size of DAX cache in MiB virtio_fs_cache_size = 0 # Default size of virtqueues virtio_fs_queue_size = 1024 # Extra args for virtiofsd daemon # # Format example: # ["--arg1=xxx", "--arg2=yyy"] # Examples: # Set virtiofsd log level to debug : ["--log-level=debug"] # # see `virtiofsd -h` for possible options. virtio_fs_extra_args = ["--thread-pool-size=1", "--announce-submounts"] # Cache mode: # # - never # Metadata, data, and pathname lookup are not cached in guest. They are # always fetched from host and any changes are immediately pushed to host. # # - auto # Metadata and pathname lookup cache expires after a configured amount of # time (default is 1 second). Data is cached while the file is open (close # to open consistency). # # - always # Metadata, data, and pathname lookup are cached in guest and never expire. virtio_fs_cache = "auto" # Block storage driver to be used for the hypervisor in case the container # rootfs is backed by a block device. This is virtio-scsi, virtio-blk # or nvdimm. block_device_driver = "virtio-scsi" # aio is the I/O mechanism used by qemu # Options: # # - threads # Pthread based disk I/O. # # - native # Native Linux I/O. # # - io_uring # Linux io_uring API. This provides the fastest I/O operations on Linux, requires kernel>5.1 and # qemu >=5.0. block_device_aio = "io_uring" # Specifies cache-related options will be set to block devices or not. # Default false #block_device_cache_set = true # Specifies cache-related options for block devices. # Denotes whether use of O_DIRECT (bypass the host page cache) is enabled. # Default false #block_device_cache_direct = true # Specifies cache-related options for block devices. # Denotes whether flush requests for the device are ignored. # Default false #block_device_cache_noflush = true # Enable iothreads (data-plane) to be used. This causes IO to be # handled in a separate IO thread. This is currently only implemented # for SCSI. 
# enable_iothreads = false # Enable pre allocation of VM RAM, default false # Enabling this will result in lower container density # as all of the memory will be allocated and locked # This is useful when you want to reserve all the memory # upfront or in the cases where you want memory latencies # to be very predictable # Default false #enable_mem_prealloc = true # Enable huge pages for VM RAM, default false # Enabling this will result in the VM memory # being allocated using huge pages. # This is useful when you want to use vhost-user network # stacks within the container. This will automatically # result in memory pre allocation #enable_hugepages = true # Enable vhost-user storage device, default false # Enabling this will result in some Linux reserved block type # major range 240-254 being chosen to represent vhost-user devices. enable_vhost_user_store = false # The base directory specifically used for vhost-user devices. # Its sub-path "block" is used for block devices; "block/sockets" is # where we expect vhost-user sockets to live; "block/devices" is where # simulated block device nodes for vhost-user devices to live. vhost_user_store_path = "/var/run/kata-containers/vhost-user" # Enable vIOMMU, default false # Enabling this will result in the VM having a vIOMMU device # This will also add the following options to the kernel's # command line: intel_iommu=on,iommu=pt #enable_iommu = true # Enable IOMMU_PLATFORM, default false # Enabling this will result in the VM device having iommu_platform=on set #enable_iommu_platform = true # List of valid annotations values for the vhost user store path # The default if not set is empty (all annotations rejected.) # Your distribution recommends: ["/var/run/kata-containers/vhost-user"] valid_vhost_user_store_paths = ["/var/run/kata-containers/vhost-user"] # The timeout for reconnecting on non-server spdk sockets when the remote end goes away. # qemu will delay this many seconds and then attempt to reconnect. # Zero disables reconnecting, and the default is zero. vhost_user_reconnect_timeout_sec = 5 # Enable file based guest memory support. The default is an empty string which # will disable this feature. In the case of virtio-fs, this is enabled # automatically and '/dev/shm' is used as the backing folder. # This option will be ignored if VM templating is enabled. #file_mem_backend = "" # List of valid annotations values for the file_mem_backend annotation # The default if not set is empty (all annotations rejected.) # Your distribution recommends: [""] valid_file_mem_backends = [""] # -pflash can add image file to VM. The arguments of it should be in format # of ["/path/to/flash0.img", "/path/to/flash1.img"] pflashes = [] # This option changes the default hypervisor and kernel parameters # to enable debug output where available. # # Default false enable_debug = true # This option allows to add an extra HMP or QMP socket when `enable_debug = true` # # WARNING: Anyone with access to the extra socket can take full control of # Qemu. This is for debugging purpose only and must *NEVER* be used in # production. # # Valid values are : # - "hmp" # - "qmp" # - "qmp-pretty" (same as "qmp" with pretty json formatting) # # If set to the empty string "", no extra monitor socket is added. This is # the default. #extra_monitor_socket = hmp # Disable the customizations done in the runtime when it detects # that it is running on top a VMM. This will result in the runtime # behaving as it would when running on bare metal. 
# #disable_nesting_checks = true # This is the msize used for 9p shares. It is the number of bytes # used for 9p packet payload. #msize_9p = 8192 # If false and nvdimm is supported, use nvdimm device to plug guest image. # Otherwise virtio-block device is used. # # nvdimm is not supported when `confidential_guest = true`. # # Default is false #disable_image_nvdimm = true # VFIO devices are hotplugged on a bridge by default. # Enable hotplugging on root bus. This may be required for devices with # a large PCI bar, as this is a current limitation with hotplugging on # a bridge. # Default false #hotplug_vfio_on_root_bus = true # Enable hot-plugging of VFIO devices to a bridge-port, # root-port or switch-port. # The default setting is "no-port" #hot_plug_vfio = "root-port" # In a confidential compute environment hot-plugging can compromise # security. # Enable cold-plugging of VFIO devices to a bridge-port, # root-port or switch-port. # The default setting is "no-port", which means disabled. #cold_plug_vfio = "root-port" # Before hot plugging a PCIe device, you need to add a pcie_root_port device. # Use this parameter when using some large PCI bar devices, such as Nvidia GPU # The value means the number of pcie_root_port # This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35" # Default 0 #pcie_root_port = 2 # If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off # security (vhost-net runs ring0) for network I/O performance. #disable_vhost_net = true # # Default entropy source. # The path to a host source of entropy (including a real hardware RNG) # /dev/urandom and /dev/random are two main options. # Be aware that /dev/random is a blocking source of entropy. If the host # runs out of entropy, the VMs boot time will increase leading to get startup # timeouts. # The source of entropy /dev/urandom is non-blocking and provides a # generally acceptable source of entropy. It should work well for pretty much # all practical purposes. #entropy_source= "/dev/urandom" # List of valid annotations values for entropy_source # The default if not set is empty (all annotations rejected.) # Your distribution recommends: ["/dev/urandom","/dev/random",""] valid_entropy_sources = ["/dev/urandom","/dev/random",""] # Path to OCI hook binaries in the *guest rootfs*. # This does not affect host-side hooks which must instead be added to # the OCI spec passed to the runtime. # # You can create a rootfs with hooks by customizing the osbuilder scripts: # https://github.com/kata-containers/kata-containers/tree/main/tools/osbuilder # # Hooks must be stored in a subdirectory of guest_hook_path according to their # hook type, i.e. "guest_hook_path/{prestart,poststart,poststop}". # The agent will scan these directories for executable files and add them, in # lexicographical order, to the lifecycle of the guest container. # Hooks are executed in the runtime namespace of the guest. See the official documentation: # https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks # Warnings will be logged if any error is encountered while scanning for hooks, # but it will not abort container execution. #guest_hook_path = "/usr/share/oci/hooks" # # Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM). # In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic. # Default 0-sized value means unlimited rate. 
#rx_rate_limiter_max_rate = 0 # Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM). # In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block) # to discipline traffic. # Default 0-sized value means unlimited rate. #tx_rate_limiter_max_rate = 0 # Set where to save the guest memory dump file. # If set, when GUEST_PANICKED event occurred, # guest memeory will be dumped to host filesystem under guest_memory_dump_path, # This directory will be created automatically if it does not exist. # # The dumped file(also called vmcore) can be processed with crash or gdb. # # WARNING: # Dump guest’s memory can take very long depending on the amount of guest memory # and use much disk space. #guest_memory_dump_path="/var/crash/kata" # If enable paging. # Basically, if you want to use "gdb" rather than "crash", # or need the guest-virtual addresses in the ELF vmcore, # then you should enable paging. # # See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details #guest_memory_dump_paging=false # Enable swap in the guest. Default false. # When enable_guest_swap is enabled, insert a raw file to the guest as the swap device # if the swappiness of a container (set by annotation "io.katacontainers.container.resource.swappiness") # is bigger than 0. # The size of the swap device should be # swap_in_bytes (set by annotation "io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes. # If swap_in_bytes is not set, the size should be memory_limit_in_bytes. # If swap_in_bytes and memory_limit_in_bytes is not set, the size should # be default_memory. #enable_guest_swap = true # use legacy serial for guest console if available and implemented for architecture. Default false #use_legacy_serial = true # disable applying SELinux on the VMM process (default false) disable_selinux=false # disable applying SELinux on the container process # If set to false, the type `container_t` is applied to the container process by default. # Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built # with `SELINUX=yes`. # (default: true) disable_guest_selinux=true [factory] # VM templating support. Once enabled, new VMs are created from template # using vm cloning. They will share the same initial kernel, initramfs and # agent memory by mapping it readonly. It helps speeding up new container # creation and saves a lot of memory if there are many kata containers running # on the same host. # # When disabled, new VMs are created from scratch. # # Note: Requires "initrd=" to be set ("image=" is not supported). # # Default false #enable_template = true # Specifies the path of template. # # Default "/run/vc/vm/template" #template_path = "/run/vc/vm/template" # The number of caches of VMCache: # unspecified or == 0 --> VMCache is disabled # > 0 --> will be set to the specified number # # VMCache is a function that creates VMs as caches before using it. # It helps speed up new container creation. # The function consists of a server and some clients communicating # through Unix socket. The protocol is gRPC in protocols/cache/cache.proto. # The VMCache server will create some VMs and cache them by factory cache. # It will convert the VM to gRPC format and transport it when gets # requestion from clients. # Factory grpccache is the VMCache client. It will request gRPC format # VM and convert it back to a VM. 
If VMCache function is enabled, # kata-runtime will request VM from factory grpccache when it creates # a new sandbox. # # Default 0 #vm_cache_number = 0 # Specify the address of the Unix socket that is used by VMCache. # # Default /var/run/kata-containers/cache.sock #vm_cache_endpoint = "/var/run/kata-containers/cache.sock" [agent.kata] # If enabled, make the agent display debug-level messages. # (default: disabled) #enable_debug = true # Enable agent tracing. # # If enabled, the agent will generate OpenTelemetry trace spans. # # Notes: # # - If the runtime also has tracing enabled, the agent spans will be # associated with the appropriate runtime parent span. # - If enabled, the runtime will wait for the container to shutdown, # increasing the container shutdown time slightly. # # (default: disabled) #enable_tracing = true # Comma separated list of kernel modules and their parameters. # These modules will be loaded in the guest kernel using modprobe(8). # The following example can be used to load two kernel modules with parameters # - kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915 enable_ppgtt=0"] # The first word is considered as the module name and the rest as its parameters. # Container will not be started when: # * A kernel module is specified and the modprobe command is not installed in the guest # or it fails loading the module. # * The module is not available in the guest or it doesn't met the guest kernel # requirements, like architecture and version. # kernel_modules=[] # Enable debug console. # If enabled, user can connect guest OS running inside hypervisor # through "kata-runtime exec " command #debug_console_enabled = true # Agent connection dialing timeout value in seconds # (default: 45) dial_timeout = 45 [runtime] # If enabled, the runtime will log additional debug messages to the # system log # (default: disabled) #enable_debug = true # # Internetworking model # Determines how the VM should be connected to the # the container network interface # Options: # # - macvtap # Used when the Container network interface can be bridged using # macvtap. # # - none # Used when customize network. Only creates a tap device. No veth pair. # # - tcfilter # Uses tc filter rules to redirect traffic from the network interface # provided by plugin to a tap interface connected to the VM. # internetworking_model="tcfilter" # disable guest seccomp # Determines whether container seccomp profiles are passed to the virtual # machine and applied by the kata agent. If set to true, seccomp is not applied # within the guest # (default: true) disable_guest_seccomp=true # vCPUs pinning settings # if enabled, each vCPU thread will be scheduled to a fixed CPU # qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet) # enable_vcpus_pinning = false # Apply a custom SELinux security policy to the container process inside the VM. # This is used when you want to apply a type other than the default `container_t`, # so general users should not uncomment and apply it. # (format: "user:role:type") # Note: You cannot specify MCS policy with the label because the sensitivity levels and # categories are determined automatically by high-level container runtimes such as containerd. #guest_selinux_label="system_u:system_r:container_t" # If enabled, the runtime will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # (default: disabled) #enable_tracing = true # Set the full url to the Jaeger HTTP Thrift collector. 
# The default if not set will be "http://localhost:14268/api/traces" #jaeger_endpoint = "" # Sets the username to be used if basic auth is required for Jaeger. #jaeger_user = "" # Sets the password to be used if basic auth is required for Jaeger. #jaeger_password = "" # If enabled, the runtime will not create a network namespace for shim and hypervisor processes. # This option may have some potential impacts to your host. It should only be used when you know what you're doing. # `disable_new_netns` conflicts with `internetworking_model=tcfilter` and `internetworking_model=macvtap`. It works only # with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge # (like OVS) directly. # (default: false) #disable_new_netns = true # if enabled, the runtime will add all the kata processes inside one dedicated cgroup. # The container cgroups in the host are not created, just one single cgroup per sandbox. # The runtime caller is free to restrict or collect cgroup stats of the overall Kata sandbox. # The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation. # The sandbox cgroup is constrained if there is no container type annotation. # See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType sandbox_cgroup_only=false # If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In # this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful # when a hardware architecture or hypervisor solutions is utilized which does not support CPU and/or memory hotplug. # Compatibility for determining appropriate sandbox (VM) size: # - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O # does not yet support sandbox sizing annotations. # - When running single containers using a tool like ctr, container sizing information will be available. static_sandbox_resource_mgmt=false # If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path. # This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory. # If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts` # These will not be exposed to the container workloads, and are only provided for potential guest services. sandbox_bind_mounts=[] # VFIO Mode # Determines how VFIO devices should be be presented to the container. # Options: # # - vfio # Matches behaviour of OCI runtimes (e.g. runc) as much as # possible. VFIO devices will appear in the container as VFIO # character devices under /dev/vfio. The exact names may differ # from the host (they need to match the VM's IOMMU group numbers # rather than the host's) # # - guest-kernel # This is a Kata-specific behaviour that's useful in certain cases. # The VFIO device is managed by whatever driver in the VM kernel # claims it. This means it will appear as one or more device nodes # or network interfaces depending on the nature of the device. # Using this mode requires specially built workloads that know how # to locate the relevant device interfaces within the VM. # vfio_mode="guest-kernel" # If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. 
Instead, emptyDir mounts will # be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest. disable_guest_empty_dir=false # Enabled experimental feature list, format: ["a", "b"]. # Experimental features are features not stable enough for production, # they may break compatibility, and are prepared for a big version bump. # Supported experimental features: # (default: []) experimental=[] # If enabled, user can run pprof tools with shim v2 process through kata-monitor. # (default: false) # enable_pprof = true # WARNING: All the options in the following section have not been implemented yet. # This section was added as a placeholder. DO NOT USE IT! [image] # Container image service. # # Offload the CRI image management service to the Kata agent. # (default: false) #service_offload = true # Container image decryption keys provisioning. # Applies only if service_offload is true. # Keys can be provisioned locally (e.g. through a special command or # a local file) or remotely (usually after the guest is remotely attested). # The provision setting is a complete URL that lets the Kata agent decide # which method to use in order to fetch the keys. # # Keys can be stored in a local file, in a measured and attested initrd: #provision=data:///local/key/file # # Keys could be fetched through a special command or binary from the # initrd (guest) image, e.g. a firmware call: #provision=file:///path/to/bin/fetcher/in/guest # # Keys can be remotely provisioned. The Kata agent fetches them from e.g. # a HTTPS URL: #provision=https://my-key-broker.foo/tenant/ ```
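Since the failure is in virtiofsd itself, one low-risk tweak while debugging is to raise the daemon's log level via `virtio_fs_extra_args`, exactly as the comments in this file suggest (`--log-level=debug`). A minimal sketch of the edit to `/etc/kata-containers/configuration.toml`:

```toml
[hypervisor.qemu]
# Keep the existing args and add virtiofsd debug logging, per the
# "--log-level=debug" example documented in this file.
virtio_fs_extra_args = ["--thread-pool-size=1", "--announce-submounts", "--log-level=debug"]
```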

cat "/usr/share/defaults/kata-containers/configuration.toml"

```toml # Copyright (c) 2017-2019 Intel Corporation # Copyright (c) 2021 Adobe Inc. # # SPDX-License-Identifier: Apache-2.0 # # XXX: WARNING: this file is auto-generated. # XXX: # XXX: Source file: "config/configuration-qemu.toml.in" # XXX: Project: # XXX: Name: Kata Containers # XXX: Type: kata [hypervisor.qemu] path = "/usr/bin/qemu-system-x86_64" kernel = "/usr/share/kata-containers/vmlinux.container" image = "/usr/share/kata-containers/kata-containers.img" # initrd = "/usr/share/kata-containers/kata-containers-initrd.img" machine_type = "q35" # rootfs filesystem type: # - ext4 (default) # - xfs # - erofs rootfs_type="ext4" # Enable confidential guest support. # Toggling that setting may trigger different hardware features, ranging # from memory encryption to both memory and CPU-state encryption and integrity. # The Kata Containers runtime dynamically detects the available feature set and # aims at enabling the largest possible one, returning an error if none is # available, or none is supported by the hypervisor. # # Known limitations: # * Does not work by design: # - CPU Hotplug # - Memory Hotplug # - NVDIMM devices # # Default false # confidential_guest = true # Choose AMD SEV-SNP confidential guests # In case of using confidential guests on AMD hardware that supports both SEV # and SEV-SNP, the following enables SEV-SNP guests. SEV guests are default. # Default false # sev_snp_guest = true # Enable running QEMU VMM as a non-root user. # By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as # a non-root random user. See documentation for the limitations of this mode. # rootless = true # List of valid annotation names for the hypervisor # Each member of the list is a regular expression, which is the base name # of the annotation, e.g. "path" for io.katacontainers.config.hypervisor.path" enable_annotations = ["enable_iommu", "virtio_fs_extra_args", "kernel_params"] # List of valid annotations values for the hypervisor # Each member of the list is a path pattern as described by glob(3). # The default if not set is empty (all annotations rejected.) # Your distribution recommends: ["/usr/bin/qemu-system-x86_64"] valid_hypervisor_paths = ["/usr/bin/qemu-system-x86_64"] # Optional space-separated list of options to pass to the guest kernel. # For example, use `kernel_params = "vsyscall=emulate"` if you are having # trouble running pre-2.15 glibc. # # WARNING: - any parameter specified here will take priority over the default # parameter value of the same name used to start the virtual machine. # Do not set values here unless you understand the impact of doing so as you # may stop the virtual machine from booting. # To see the list of default parameters, enable hypervisor debug, create a # container and look for 'default-kernel-parameters' log entries. kernel_params = "" # Path to the firmware. # If you want that qemu uses the default firmware leave this option empty firmware = "" # Path to the firmware volume. # firmware TDVF or OVMF can be split into FIRMWARE_VARS.fd (UEFI variables # as configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI variables # can be customized per each user while UEFI code is kept same. firmware_volume = "" # Machine accelerators # comma-separated list of machine accelerators to pass to the hypervisor. # For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"` machine_accelerators="" # Qemu seccomp sandbox feature # comma-separated list of seccomp sandbox features to control the syscall access. 
# For example, `seccompsandbox= "on,obsolete=deny,spawn=deny,resourcecontrol=deny"` # Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox # Another note: enabling this feature may reduce performance, you may enable # /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html #seccompsandbox="on,obsolete=deny,spawn=deny,resourcecontrol=deny" # CPU features # comma-separated list of cpu features to pass to the cpu # For example, `cpu_features = "pmu=off,vmx=off" cpu_features="pmu=off" # Default number of vCPUs per SB/VM: # unspecified or 0 --> will be set to 1 # < 0 --> will be set to the actual number of physical cores # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores default_vcpus = 1 # Default maximum number of vCPUs per SB/VM: # unspecified or == 0 --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when # the actual number of physical cores is greater than it. # WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU # the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs # can be added to a SB/VM, but the memory footprint will be big. Another example, with # `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of # vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable, # unless you know what are you doing. # NOTICE: on arm platform with gicv2 interrupt controller, set it to 8. default_maxvcpus = 0 # Bridges can be used to hot plug devices. # Limitations: # * Currently only pci bridges are supported # * Until 30 devices per bridge can be hot plugged. # * Until 5 PCI bridges can be cold plugged per VM. # This limitation could be a bug in qemu or in the kernel # Default number of bridges per SB/VM: # unspecified or 0 --> will be set to 1 # > 1 <= 5 --> will be set to the specified number # > 5 --> will be set to 5 default_bridges = 1 # Default memory size in MiB for SB/VM. # If unspecified then it will be set 2048 MiB. default_memory = 2048 # # Default memory slots per SB/VM. # If unspecified then it will be set 10. # This is will determine the times that memory will be hotadded to sandbox/VM. #memory_slots = 10 # Default maximum memory in MiB per SB / VM # unspecified or == 0 --> will be set to the actual amount of physical RAM # > 0 <= amount of physical RAM --> will be set to the specified number # > amount of physical RAM --> will be set to the actual amount of physical RAM default_maxmemory = 0 # The size in MiB will be plused to max memory of hypervisor. # It is the memory address space for the NVDIMM devie. # If set block storage driver (block_device_driver) to "nvdimm", # should set memory_offset to the size of block device. # Default 0 #memory_offset = 0 # Specifies virtio-mem will be enabled or not. # Please note that this option should be used with the command # "echo 1 > /proc/sys/vm/overcommit_memory". 
# Default false #enable_virtio_mem = true # Disable block device from being used for a container's rootfs. # In case of a storage driver like devicemapper where a container's # root file system is backed by a block device, the block device is passed # directly to the hypervisor for performance reasons. # This flag prevents the block device from being passed to the hypervisor, # virtio-fs is used instead to pass the rootfs. disable_block_device_use = false # Shared file system type: # - virtio-fs (default) # - virtio-9p # - virtio-fs-nydus # - none shared_fs = "virtio-fs" # Path to vhost-user-fs daemon. virtio_fs_daemon = "/usr/libexec/virtiofsd" # List of valid annotations values for the virtiofs daemon # The default if not set is empty (all annotations rejected.) # Your distribution recommends: ["/usr/libexec/virtiofsd"] valid_virtio_fs_daemon_paths = ["/usr/libexec/virtiofsd"] # Default size of DAX cache in MiB virtio_fs_cache_size = 0 # Default size of virtqueues virtio_fs_queue_size = 1024 # Extra args for virtiofsd daemon # # Format example: # ["--arg1=xxx", "--arg2=yyy"] # Examples: # Set virtiofsd log level to debug : ["--log-level=debug"] # # see `virtiofsd -h` for possible options. virtio_fs_extra_args = ["--thread-pool-size=1", "--announce-submounts"] # Cache mode: # # - never # Metadata, data, and pathname lookup are not cached in guest. They are # always fetched from host and any changes are immediately pushed to host. # # - auto # Metadata and pathname lookup cache expires after a configured amount of # time (default is 1 second). Data is cached while the file is open (close # to open consistency). # # - always # Metadata, data, and pathname lookup are cached in guest and never expire. virtio_fs_cache = "auto" # Block storage driver to be used for the hypervisor in case the container # rootfs is backed by a block device. This is virtio-scsi, virtio-blk # or nvdimm. block_device_driver = "virtio-scsi" # aio is the I/O mechanism used by qemu # Options: # # - threads # Pthread based disk I/O. # # - native # Native Linux I/O. # # - io_uring # Linux io_uring API. This provides the fastest I/O operations on Linux, requires kernel>5.1 and # qemu >=5.0. block_device_aio = "io_uring" # Specifies cache-related options will be set to block devices or not. # Default false #block_device_cache_set = true # Specifies cache-related options for block devices. # Denotes whether use of O_DIRECT (bypass the host page cache) is enabled. # Default false #block_device_cache_direct = true # Specifies cache-related options for block devices. # Denotes whether flush requests for the device are ignored. # Default false #block_device_cache_noflush = true # Enable iothreads (data-plane) to be used. This causes IO to be # handled in a separate IO thread. This is currently only implemented # for SCSI. # enable_iothreads = false # Enable pre allocation of VM RAM, default false # Enabling this will result in lower container density # as all of the memory will be allocated and locked # This is useful when you want to reserve all the memory # upfront or in the cases where you want memory latencies # to be very predictable # Default false #enable_mem_prealloc = true # Enable huge pages for VM RAM, default false # Enabling this will result in the VM memory # being allocated using huge pages. # This is useful when you want to use vhost-user network # stacks within the container. 
This will automatically # result in memory pre allocation #enable_hugepages = true # Enable vhost-user storage device, default false # Enabling this will result in some Linux reserved block type # major range 240-254 being chosen to represent vhost-user devices. enable_vhost_user_store = false # The base directory specifically used for vhost-user devices. # Its sub-path "block" is used for block devices; "block/sockets" is # where we expect vhost-user sockets to live; "block/devices" is where # simulated block device nodes for vhost-user devices to live. vhost_user_store_path = "/var/run/kata-containers/vhost-user" # Enable vIOMMU, default false # Enabling this will result in the VM having a vIOMMU device # This will also add the following options to the kernel's # command line: intel_iommu=on,iommu=pt #enable_iommu = true # Enable IOMMU_PLATFORM, default false # Enabling this will result in the VM device having iommu_platform=on set #enable_iommu_platform = true # List of valid annotations values for the vhost user store path # The default if not set is empty (all annotations rejected.) # Your distribution recommends: ["/var/run/kata-containers/vhost-user"] valid_vhost_user_store_paths = ["/var/run/kata-containers/vhost-user"] # The timeout for reconnecting on non-server spdk sockets when the remote end goes away. # qemu will delay this many seconds and then attempt to reconnect. # Zero disables reconnecting, and the default is zero. vhost_user_reconnect_timeout_sec = 0 # Enable file based guest memory support. The default is an empty string which # will disable this feature. In the case of virtio-fs, this is enabled # automatically and '/dev/shm' is used as the backing folder. # This option will be ignored if VM templating is enabled. #file_mem_backend = "" # List of valid annotations values for the file_mem_backend annotation # The default if not set is empty (all annotations rejected.) # Your distribution recommends: [""] valid_file_mem_backends = [""] # -pflash can add image file to VM. The arguments of it should be in format # of ["/path/to/flash0.img", "/path/to/flash1.img"] pflashes = [] # This option changes the default hypervisor and kernel parameters # to enable debug output where available. # # Default false #enable_debug = true # This option allows to add an extra HMP or QMP socket when `enable_debug = true` # # WARNING: Anyone with access to the extra socket can take full control of # Qemu. This is for debugging purpose only and must *NEVER* be used in # production. # # Valid values are : # - "hmp" # - "qmp" # - "qmp-pretty" (same as "qmp" with pretty json formatting) # # If set to the empty string "", no extra monitor socket is added. This is # the default. #extra_monitor_socket = hmp # Disable the customizations done in the runtime when it detects # that it is running on top a VMM. This will result in the runtime # behaving as it would when running on bare metal. # #disable_nesting_checks = true # This is the msize used for 9p shares. It is the number of bytes # used for 9p packet payload. #msize_9p = 8192 # If false and nvdimm is supported, use nvdimm device to plug guest image. # Otherwise virtio-block device is used. # # nvdimm is not supported when `confidential_guest = true`. # # Default is false #disable_image_nvdimm = true # VFIO devices are hotplugged on a bridge by default. # Enable hotplugging on root bus. This may be required for devices with # a large PCI bar, as this is a current limitation with hotplugging on # a bridge. 
# Default false #hotplug_vfio_on_root_bus = true # Enable hot-plugging of VFIO devices to a bridge-port, # root-port or switch-port. # The default setting is "no-port" #hot_plug_vfio = "root-port" # In a confidential compute environment hot-plugging can compromise # security. # Enable cold-plugging of VFIO devices to a bridge-port, # root-port or switch-port. # The default setting is "no-port", which means disabled. #cold_plug_vfio = "root-port" # Before hot plugging a PCIe device, you need to add a pcie_root_port device. # Use this parameter when using some large PCI bar devices, such as Nvidia GPU # The value means the number of pcie_root_port # This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35" # Default 0 #pcie_root_port = 2 # If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off # security (vhost-net runs ring0) for network I/O performance. #disable_vhost_net = true # # Default entropy source. # The path to a host source of entropy (including a real hardware RNG) # /dev/urandom and /dev/random are two main options. # Be aware that /dev/random is a blocking source of entropy. If the host # runs out of entropy, the VMs boot time will increase leading to get startup # timeouts. # The source of entropy /dev/urandom is non-blocking and provides a # generally acceptable source of entropy. It should work well for pretty much # all practical purposes. #entropy_source= "/dev/urandom" # List of valid annotations values for entropy_source # The default if not set is empty (all annotations rejected.) # Your distribution recommends: ["/dev/urandom","/dev/random",""] valid_entropy_sources = ["/dev/urandom","/dev/random",""] # Path to OCI hook binaries in the *guest rootfs*. # This does not affect host-side hooks which must instead be added to # the OCI spec passed to the runtime. # # You can create a rootfs with hooks by customizing the osbuilder scripts: # https://github.com/kata-containers/kata-containers/tree/main/tools/osbuilder # # Hooks must be stored in a subdirectory of guest_hook_path according to their # hook type, i.e. "guest_hook_path/{prestart,poststart,poststop}". # The agent will scan these directories for executable files and add them, in # lexicographical order, to the lifecycle of the guest container. # Hooks are executed in the runtime namespace of the guest. See the official documentation: # https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks # Warnings will be logged if any error is encountered while scanning for hooks, # but it will not abort container execution. #guest_hook_path = "/usr/share/oci/hooks" # # Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM). # In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic. # Default 0-sized value means unlimited rate. #rx_rate_limiter_max_rate = 0 # Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM). # In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block) # to discipline traffic. # Default 0-sized value means unlimited rate. #tx_rate_limiter_max_rate = 0 # Set where to save the guest memory dump file. # If set, when GUEST_PANICKED event occurred, # guest memeory will be dumped to host filesystem under guest_memory_dump_path, # This directory will be created automatically if it does not exist. # # The dumped file(also called vmcore) can be processed with crash or gdb. 
#
# WARNING:
#   Dumping the guest's memory can take a very long time depending on the amount
#   of guest memory, and can use a lot of disk space.
#guest_memory_dump_path="/var/crash/kata"

# Whether to enable paging.
# Basically, if you want to use "gdb" rather than "crash",
# or need the guest-virtual addresses in the ELF vmcore,
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false

# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, a raw file is inserted into the guest as the swap device
# if the swappiness of a container (set by the annotation "io.katacontainers.container.resource.swappiness")
# is bigger than 0.
# The size of the swap device should be
# swap_in_bytes (set by the annotation "io.katacontainers.container.resource.swap_in_bytes") - memory_limit_in_bytes.
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If neither swap_in_bytes nor memory_limit_in_bytes is set, the size should
# be default_memory.
#enable_guest_swap = true

# Use legacy serial for the guest console if available and implemented for the architecture. Default false.
#use_legacy_serial = true

# Disable applying SELinux on the VMM process (default false).
disable_selinux=false

# Disable applying SELinux on the container process.
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=true

[factory]
# VM templating support. Once enabled, new VMs are created from a template
# using VM cloning. They will share the same initial kernel, initramfs and
# agent memory by mapping it read-only. This helps speed up new container
# creation and saves a lot of memory if there are many Kata Containers running
# on the same host.
#
# When disabled, new VMs are created from scratch.
#
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true

# Specifies the path of the template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"

# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
# > 0                 --> will be set to the specified number
#
# VMCache is a function that creates VMs as caches before they are used.
# It helps speed up new container creation.
# The function consists of a server and some clients communicating
# through a Unix socket. The protocol is gRPC in protocols/cache/cache.proto.
# The VMCache server will create some VMs and cache them by factory cache.
# It will convert a VM to gRPC format and transport it when it gets
# a request from a client.
# Factory grpccache is the VMCache client. It will request a VM in gRPC
# format and convert it back to a VM. If the VMCache function is enabled,
# kata-runtime will request a VM from factory grpccache when it creates
# a new sandbox.
#
# Default 0
#vm_cache_number = 0

# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"

[agent.kata]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true

# Enable agent tracing.
#
# If enabled, the agent will generate OpenTelemetry trace spans.
#
# Notes:
#
# - If the runtime also has tracing enabled, the agent spans will be
#   associated with the appropriate runtime parent span.
# - If enabled, the runtime will wait for the container to shut down,
#   increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true

# Comma-separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
# The following example can be used to load two kernel modules with parameters:
#  - kernel_modules=["e1000e InterruptThrottleRate=3000,3000,3000 EEE=1", "i915 enable_ppgtt=0"]
# The first word is considered the module name and the rest its parameters.
# The container will not be started when:
# * A kernel module is specified and the modprobe command is not installed in the guest
#   or it fails loading the module.
# * The module is not available in the guest or it doesn't meet the guest kernel
#   requirements, like architecture and version.
#
kernel_modules=[]

# Enable the debug console.
# If enabled, the user can connect to the guest OS running inside the hypervisor
# through the "kata-runtime exec <sandbox-id>" command.
#debug_console_enabled = true

# Agent connection dialing timeout value in seconds
# (default: 45)
dial_timeout = 45

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
# Internetworking model
# Determines how the VM should be connected to
# the container network interface
# Options:
#
#   - macvtap
#     Used when the container network interface can be bridged using
#     macvtap.
#
#   - none
#     Used when the network is customized. Only creates a tap device. No veth pair.
#
#   - tcfilter
#     Uses tc filter rules to redirect traffic from the network interface
#     provided by the plugin to a tap interface connected to the VM.
#
internetworking_model="tcfilter"

# Disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest.
# (default: true)
disable_guest_seccomp=true

# vCPU pinning settings
# If enabled, each vCPU thread will be scheduled to a fixed CPU.
# Qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false

# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
# so general users should not uncomment and apply it.
# (format: "user:role:type")
# Note: You cannot specify an MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="system_u:system_r:container_t"

# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true

# Set the full URL to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""

# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""

# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""

# If enabled, the runtime will not create a network namespace for the shim and hypervisor processes.
# This option may have some potential impact on your host. It should only be used when you know what you're doing.
# `disable_new_netns` conflicts with `internetworking_model=tcfilter` and `internetworking_model=macvtap`. It works only
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true

# If enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
# The runtime caller is free to restrict or collect cgroup stats of the overall Kata sandbox.
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=false

# If enabled, the runtime will attempt to determine the appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
# when a hardware architecture or hypervisor solution is utilized which does not support CPU and/or memory hotplug.
# Compatibility for determining appropriate sandbox (VM) size:
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
#   does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=false

# If specified, sandbox_bind_mounts identifies host paths to be mounted (ro) into the sandbox's shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bind-mounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`.
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=[]

# VFIO Mode
# Determines how VFIO devices should be presented to the container.
# Options:
#
#  - vfio
#    Matches the behaviour of OCI runtimes (e.g. runc) as much as
#    possible. VFIO devices will appear in the container as VFIO
#    character devices under /dev/vfio. The exact names may differ
#    from the host (they need to match the VM's IOMMU group numbers
#    rather than the host's).
#
#  - guest-kernel
#    This is a Kata-specific behaviour that's useful in certain cases.
#    The VFIO device is managed by whatever driver in the VM kernel
#    claims it. This means it will appear as one or more device nodes
#    or network interfaces depending on the nature of the device.
#    Using this mode requires specially built workloads that know how
#    to locate the relevant device interfaces within the VM.
#
vfio_mode="guest-kernel"

# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=false

# List of enabled experimental features, format: ["a", "b"].
# Experimental features are features not stable enough for production;
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=[]

# If enabled, the user can run pprof tools with the shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true

# WARNING: All the options in the following section have not been implemented yet.
# This section was added as a placeholder. DO NOT USE IT!
[image]
# Container image service.
#
# Offload the CRI image management service to the Kata agent.
# (default: false)
#service_offload = true

# Container image decryption keys provisioning.
# Applies only if service_offload is true.
# Keys can be provisioned locally (e.g. through a special command or
# a local file) or remotely (usually after the guest is remotely attested).
# The provision setting is a complete URL that lets the Kata agent decide
# which method to use in order to fetch the keys.
#
# Keys can be stored in a local file, in a measured and attested initrd:
#provision=data:///local/key/file
#
# Keys could be fetched through a special command or binary from the
# initrd (guest) image, e.g. a firmware call:
#provision=file:///path/to/bin/fetcher/in/guest
#
# Keys can be remotely provisioned. The Kata agent fetches them from e.g.
# an HTTPS URL:
#provision=https://my-key-broker.foo/tenant/
```
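Side note: since sandboxes here die before doing anything useful, the `debug_console_enabled` option documented above is the usual way to get a look inside the guest. A minimal sketch, assuming the runtime reads the `/etc/kata-containers/configuration.toml` referenced in the containerd config below (the sandbox ID placeholder is whatever your `ctr`/CRI invocation used):

```
# Uncomment debug_console_enabled in the [agent.kata] section, then
# re-create the sandbox.
sudo sed -i 's/^#debug_console_enabled = true/debug_console_enabled = true/' \
    /etc/kata-containers/configuration.toml

# Attach to the guest console of a running sandbox:
sudo kata-runtime exec <sandbox-id>
```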

---

# Containerd shim v2

Containerd shim v2 is `/usr/local/bin/containerd-shim-kata-v2`.

containerd-shim-kata-v2 --version

```
Kata Containers containerd shim (Golang): id: "io.containerd.kata.v2", version: 3.2.0-rc0, commit: 108db0a7210b392e8aec2781043dfbd8297f84b9
```

---


# KSM throttler

## version

## systemd service


# Image details

```yaml
---
osbuilder:
  url: "https://github.com/kata-containers/kata-containers/tools/osbuilder"
  version: "unknown"
rootfs-creation-time: "2023-10-11T07:48:31.970401484+0000Z"
description: "osbuilder rootfs"
file-format-version: "0.0.2"
architecture: "x86_64"
base-distro:
  name: "ubuntu"
  version: "focal"
  packages:
    default:
      - "chrony"
      - "dbus"
      - "init"
      - "iptables"
      - "libseccomp2"
    extra:
agent:
  url: "https://github.com/kata-containers/kata-containers"
  name: "kata-agent"
  version: "3.2.0-rc0"
  agent-is-init-daemon: "no"
```

---


# Initrd details

No initrd

---


# Logfiles

## Runtime logs


No recent runtime problems found in system journal.

## Throttler logs

No recent throttler problems found in system journal.

## Kata Containerd Shim v2 logs

Recent problems found in system journal:

```
time="2023-10-09T14:15:36.839906481+02:00" level=error msg="trace called before context set" name=Delete pid=172071 sandbox=test-kata source=containerd-kata-shim-v2 type=bug
time="2023-10-09T14:15:36.840330467+02:00" level=error msg="trace called before context set" name=Shutdown pid=172071 sandbox=test-kata source=containerd-kata-shim-v2 type=bug
time="2023-10-11T09:58:32.427171775+02:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=40650 sandbox=test-kata source=cgroups
time="2023-10-11T09:58:32.524088575+02:00" level=error msg="qemu-system-x86_64: -chardev socket,id=char-1787fda16a6a92e0,path=/run/vc/vm/test-kata/vhost-fs.sock: Failed to connect to '/run/vc/vm/test-kata/vhost-fs.sock': Connection refused" name=containerd-shim-v2 pid=40650 qemuPid=40661 sandbox=test-kata source=virtcontainers/hypervisor subsystem=qemu
time="2023-10-12T08:23:05.422338158+02:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=42655 sandbox=test-kata2 source=cgroups
time="2023-10-12T08:23:05.536401632+02:00" level=error msg="qemu-system-x86_64: -chardev socket,id=char-51a63738cd95d711,path=/run/vc/vm/test-kata2/vhost-fs.sock: Failed to connect to '/run/vc/vm/test-kata2/vhost-fs.sock': Connection refused" name=containerd-shim-v2 pid=42655 qemuPid=42666 sandbox=test-kata2 source=virtcontainers/hypervisor subsystem=qemu
time="2023-10-12T09:33:11.576825671+02:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=45138 sandbox=test-kata3 source=cgroups
time="2023-10-12T09:33:11.698552399+02:00" level=error msg="qemu-system-x86_64: -chardev socket,id=char-03445fb326ed8277,path=/run/vc/vm/test-kata3/vhost-fs.sock: Failed to connect to '/run/vc/vm/test-kata3/vhost-fs.sock': Connection refused" name=containerd-shim-v2 pid=45138 qemuPid=45149 sandbox=test-kata3 source=virtcontainers/hypervisor subsystem=qemu
time="2023-10-12T11:55:51.447969838+02:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=47813 sandbox=hello source=cgroups
time="2023-10-12T11:55:51.58536773+02:00" level=error msg="qemu-system-x86_64: -chardev socket,id=char-bfd6d81e788b5d47,path=/run/vc/vm/hello/vhost-fs.sock: Failed to connect to '/run/vc/vm/hello/vhost-fs.sock': Connection refused" name=containerd-shim-v2 pid=47813 qemuPid=47824 sandbox=hello source=virtcontainers/hypervisor subsystem=qemu
time="2023-10-12T12:16:39.228913447+02:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=48152 sandbox=hello1 source=cgroups
time="2023-10-12T12:16:39.33479009+02:00" level=error msg="qemu-system-x86_64: -chardev socket,id=char-6730d53ce69a9be9,path=/run/vc/vm/hello1/vhost-fs.sock: Failed to connect to '/run/vc/vm/hello1/vhost-fs.sock': Connection refused" name=containerd-shim-v2 pid=48152 qemuPid=48163 sandbox=hello1 source=virtcontainers/hypervisor subsystem=qemu
time="2023-10-12T12:31:10.381006605+02:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=48501 sandbox=hello2 source=cgroups
time="2023-10-12T12:31:10.489841551+02:00" level=error msg="qemu-system-x86_64: -chardev socket,id=char-0fc297e0bb185bb5,path=/run/vc/vm/hello2/vhost-fs.sock: Failed to connect to '/run/vc/vm/hello2/vhost-fs.sock': Connection refused" name=containerd-shim-v2 pid=48501 qemuPid=48512 sandbox=hello2 source=virtcontainers/hypervisor subsystem=qemu
time="2023-10-12T12:34:16.561362828+02:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=48725 sandbox=hell source=cgroups
time="2023-10-12T12:34:16.698263329+02:00" level=error msg="qemu-system-x86_64: -chardev socket,id=char-e068a1a0253b890d,path=/run/vc/vm/hell/vhost-fs.sock: Failed to connect to '/run/vc/vm/hell/vhost-fs.sock': Connection refused" name=containerd-shim-v2 pid=48725 qemuPid=48736 sandbox=hell source=virtcontainers/hypervisor subsystem=qemu
time="2023-10-12T14:26:49.084447829+02:00" level=warning msg="Could not add /dev/mshv to the devices cgroup" name=containerd-shim-v2 pid=51724 sandbox=test1 source=cgroups
time="2023-10-12T14:26:49.17497036+02:00" level=error msg="qemu-system-x86_64: -chardev socket,id=char-271eac1646f0483b,path=/run/vc/vm/test1/vhost-fs.sock: Failed to connect to '/run/vc/vm/test1/vhost-fs.sock': Connection refused" name=containerd-shim-v2 pid=51724 qemuPid=51735 sandbox=test1 source=virtcontainers/hypervisor subsystem=qemu
```
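Every error above is QEMU being refused on the virtiofsd vhost-user socket, i.e. nothing was listening on `vhost-fs.sock` by the time QEMU dialed it, which suggests the virtiofsd process exited right after being spawned. One way to sanity-check the daemon outside of Kata, as a sketch: both the binary path and the flags below assume the Rust virtiofsd installed at `/usr/libexec/virtiofsd`, and the shared directory is arbitrary:

```
mkdir -p /tmp/test-shared

# --sandbox none skips the namespace/fork sandboxing step, which is a
# useful A/B test when the default sandbox setup fails on a given kernel.
sudo /usr/libexec/virtiofsd \
    --socket-path=/tmp/test-vhost-fs.sock \
    --shared-dir=/tmp/test-shared \
    --sandbox none
```

If it stays up here but dies under Kata, the difference is in how the shim launches it (sandboxing mode, cgroup, seccomp).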

---


# Container manager details


## Docker

docker version

```
Client:
 Version:           20.10.23-ce
 API version:       1.41
 Go version:        go1.18.10
 Git commit:        6051f1429
 Built:             Wed Feb 1 00:00:00 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
```

docker info

```
Client:
 Context:    default
 Debug Mode: false

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
```

systemctl show docker

``` Type=notify ExitType=main Restart=on-failure NotifyAccess=main RestartUSec=100ms TimeoutStartUSec=1min 30s TimeoutStopUSec=1min 30s TimeoutAbortUSec=1min 30s TimeoutStartFailureMode=terminate TimeoutStopFailureMode=terminate RuntimeMaxUSec=infinity RuntimeRandomizedExtraUSec=0 WatchdogUSec=0 WatchdogTimestampMonotonic=0 RootDirectoryStartOnly=no RemainAfterExit=no GuessMainPID=yes MainPID=0 ControlPID=0 FileDescriptorStoreMax=0 NFileDescriptorStore=0 StatusErrno=0 Result=success ReloadResult=success CleanResult=success UID=[not set] GID=[not set] NRestarts=0 OOMPolicy=continue ExecMainStartTimestamp=Wed 2023-10-11 09:25:06 CEST ExecMainStartTimestampMonotonic=18207881669064 ExecMainExitTimestamp=Wed 2023-10-11 09:58:16 CEST ExecMainExitTimestampMonotonic=18209871588496 ExecMainPID=232835 ExecMainCode=1 ExecMainStatus=0 ExecStart={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd --add-runtime oci=/usr/sbin/docker-runc $DOCKER_NETWORK_OPTIONS $DOCKER_OPTS ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 } ExecStartEx={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd --add-runtime oci=/usr/sbin/docker-runc $DOCKER_NETWORK_OPTIONS $DOCKER_OPTS ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 } ExecReload={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 } ExecReloadEx={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 } Slice=system.slice ControlGroupId=0 MemoryCurrent=[not set] MemoryAvailable=infinity CPUUsageNSec=214125149000 TasksCurrent=[not set] IPIngressBytes=[no data] IPIngressPackets=[no data] IPEgressBytes=[no data] IPEgressPackets=[no data] IOReadBytes=18446744073709551615 IOReadOperations=18446744073709551615 IOWriteBytes=18446744073709551615 IOWriteOperations=18446744073709551615 Delegate=yes DelegateControllers=cpu cpuacct cpuset io blkio memory devices pids bpf-firewall bpf-devices bpf-foreign bpf-socket-bind bpf-restrict-network-interfaces CPUAccounting=yes CPUWeight=[not set] StartupCPUWeight=[not set] CPUShares=[not set] StartupCPUShares=[not set] CPUQuotaPerSecUSec=infinity CPUQuotaPeriodUSec=infinity IOAccounting=no IOWeight=[not set] StartupIOWeight=[not set] BlockIOAccounting=no BlockIOWeight=[not set] StartupBlockIOWeight=[not set] MemoryAccounting=no DefaultMemoryLow=0 DefaultMemoryMin=0 MemoryMin=0 MemoryLow=0 MemoryHigh=infinity MemoryMax=infinity MemorySwapMax=infinity MemoryLimit=infinity DevicePolicy=auto TasksAccounting=yes TasksMax=infinity IPAccounting=no ManagedOOMSwap=auto ManagedOOMMemoryPressure=auto ManagedOOMMemoryPressureLimit=0 ManagedOOMPreference=none EnvironmentFiles=/etc/sysconfig/docker (ignore_errors=no) UMask=0022 LimitCPU=infinity LimitCPUSoft=infinity LimitFSIZE=infinity LimitFSIZESoft=infinity LimitDATA=infinity LimitDATASoft=infinity LimitSTACK=infinity LimitSTACKSoft=8388608 LimitCORE=infinity LimitCORESoft=infinity LimitRSS=infinity LimitRSSSoft=infinity LimitNOFILE=1048576 LimitNOFILESoft=1048576 LimitAS=infinity LimitASSoft=infinity LimitNPROC=infinity LimitNPROCSoft=infinity LimitMEMLOCK=8388608 LimitMEMLOCKSoft=8388608 LimitLOCKS=infinity LimitLOCKSSoft=infinity LimitSIGPENDING=1026496 LimitSIGPENDINGSoft=1026496 LimitMSGQUEUE=819200 LimitMSGQUEUESoft=819200 LimitNICE=0 LimitNICESoft=0 LimitRTPRIO=0 LimitRTPRIOSoft=0 LimitRTTIME=infinity LimitRTTIMESoft=infinity OOMScoreAdjust=0 
CoredumpFilter=0x33 Nice=0 IOSchedulingClass=2 IOSchedulingPriority=4 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 CPUAffinityFromNUMA=no NUMAPolicy=n/a TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SyslogLevel=6 SyslogFacility=3 LogLevelMax=-1 LogRateLimitIntervalUSec=0 LogRateLimitBurst=0 SecureBits=0 CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf cap_checkpoint_restore DynamicUser=no RemoveIPC=no PrivateTmp=no PrivateDevices=no ProtectClock=no ProtectKernelTunables=no ProtectKernelModules=no ProtectKernelLogs=no ProtectControlGroups=no PrivateNetwork=no PrivateUsers=no PrivateMounts=no PrivateIPC=no ProtectHome=no ProtectSystem=no SameProcessGroup=no UtmpMode=init IgnoreSIGPIPE=yes NoNewPrivileges=no SystemCallErrorNumber=2147483646 LockPersonality=no RuntimeDirectoryPreserve=no RuntimeDirectoryMode=0755 StateDirectoryMode=0755 CacheDirectoryMode=0755 LogsDirectoryMode=0755 ConfigurationDirectoryMode=0755 TimeoutCleanUSec=infinity MemoryDenyWriteExecute=no RestrictRealtime=no RestrictSUIDSGID=no RestrictNamespaces=no MountAPIVFS=no KeyringMode=private ProtectProc=default ProcSubset=all ProtectHostname=no KillMode=process KillSignal=15 RestartKillSignal=15 FinalKillSignal=9 SendSIGKILL=yes SendSIGHUP=no WatchdogSignal=6 Id=docker.service Names=docker.service Requires=sysinit.target system.slice Conflicts=shutdown.target ConflictedBy=containerd.service Before=shutdown.target After=basic.target system.slice firewalld.service network.target lvm2-monitor.service systemd-journald.socket sysinit.target Documentation=http://docs.docker.com Description=Docker Application Container Engine LoadState=loaded ActiveState=inactive FreezerState=running SubState=dead FragmentPath=/usr/lib/systemd/system/docker.service UnitFileState=disabled UnitFilePreset=disabled StateChangeTimestamp=Wed 2023-10-11 09:58:16 CEST StateChangeTimestampMonotonic=18209871588613 InactiveExitTimestamp=Wed 2023-10-11 09:25:06 CEST InactiveExitTimestampMonotonic=18207881669499 ActiveEnterTimestamp=Wed 2023-10-11 09:25:06 CEST ActiveEnterTimestampMonotonic=18207882372558 ActiveExitTimestamp=Wed 2023-10-11 09:58:15 CEST ActiveExitTimestampMonotonic=18209870569879 InactiveEnterTimestamp=Wed 2023-10-11 09:58:16 CEST InactiveEnterTimestampMonotonic=18209871588613 CanStart=yes CanStop=yes CanReload=yes CanIsolate=no CanFreeze=yes StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnSuccessJobMode=fail OnFailureJobMode=replace IgnoreOnIsolate=no NeedDaemonReload=no JobTimeoutUSec=infinity JobRunningTimeoutUSec=infinity JobTimeoutAction=none ConditionResult=yes AssertResult=yes ConditionTimestamp=Wed 2023-10-11 09:25:06 CEST ConditionTimestampMonotonic=18207881575858 AssertTimestamp=Wed 2023-10-11 09:25:06 CEST AssertTimestampMonotonic=18207881575861 Transient=no Perpetual=no 
StartLimitIntervalUSec=1min StartLimitBurst=3 StartLimitAction=none FailureAction=none SuccessAction=none InvocationID=fb9ec2bee91c462fb1ba94a3598f8373 CollectMode=inactive ```


## containerd

containerd --version

```
containerd github.com/containerd/containerd v1.7.6 091922f03c2762540fd057fba91260237ff86acb
```

systemctl show containerd

``` Type=notify ExitType=main Restart=always NotifyAccess=main RestartUSec=5s TimeoutStartUSec=1min 30s TimeoutStopUSec=1min 30s TimeoutAbortUSec=1min 30s TimeoutStartFailureMode=terminate TimeoutStopFailureMode=terminate RuntimeMaxUSec=infinity RuntimeRandomizedExtraUSec=0 WatchdogUSec=0 WatchdogTimestampMonotonic=0 RootDirectoryStartOnly=no RemainAfterExit=no GuessMainPID=yes MainPID=51763 ControlPID=0 FileDescriptorStoreMax=0 NFileDescriptorStore=0 StatusErrno=0 Result=success ReloadResult=success CleanResult=success UID=[not set] GID=[not set] NRestarts=6 OOMPolicy=continue ExecMainStartTimestamp=Thu 2023-10-12 14:28:23 CEST ExecMainStartTimestampMonotonic=18312478900134 ExecMainExitTimestampMonotonic=0 ExecMainPID=51763 ExecMainCode=0 ExecMainStatus=0 ExecStartPre={ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; ignore_errors=yes ; start_time=[Thu 2023-10-12 14:28:23 CEST] ; stop_time=[Thu 2023-10-12 14:28:23 CEST] ; pid=51762 ; code=exited ; status=0 } ExecStartPreEx={ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; flags=ignore-failure ; start_time=[Thu 2023-10-12 14:28:23 CEST] ; stop_time=[Thu 2023-10-12 14:28:23 CEST] ; pid=51762 ; code=exited ; status=0 } ExecStart={ path=/usr/sbin/containerd ; argv[]=/usr/sbin/containerd ; ignore_errors=no ; start_time=[Thu 2023-10-12 14:28:23 CEST] ; stop_time=[n/a] ; pid=51763 ; code=(null) ; status=0/0 } ExecStartEx={ path=/usr/sbin/containerd ; argv[]=/usr/sbin/containerd ; flags= ; start_time=[Thu 2023-10-12 14:28:23 CEST] ; stop_time=[n/a] ; pid=51763 ; code=(null) ; status=0/0 } Slice=system.slice ControlGroup=/system.slice/containerd.service ControlGroupId=477625 MemoryCurrent=[not set] MemoryAvailable=infinity CPUUsageNSec=700043000 EffectiveCPUs=0-255 EffectiveMemoryNodes=0-1 TasksCurrent=26 IPIngressBytes=[no data] IPIngressPackets=[no data] IPEgressBytes=[no data] IPEgressPackets=[no data] IOReadBytes=18446744073709551615 IOReadOperations=18446744073709551615 IOWriteBytes=18446744073709551615 IOWriteOperations=18446744073709551615 Delegate=yes DelegateControllers=cpu cpuacct cpuset io blkio memory devices pids bpf-firewall bpf-devices bpf-foreign bpf-socket-bind bpf-restrict-network-interfaces CPUAccounting=yes CPUWeight=[not set] StartupCPUWeight=[not set] CPUShares=[not set] StartupCPUShares=[not set] CPUQuotaPerSecUSec=infinity CPUQuotaPeriodUSec=infinity IOAccounting=no IOWeight=[not set] StartupIOWeight=[not set] BlockIOAccounting=no BlockIOWeight=[not set] StartupBlockIOWeight=[not set] MemoryAccounting=no DefaultMemoryLow=0 DefaultMemoryMin=0 MemoryMin=0 MemoryLow=0 MemoryHigh=infinity MemoryMax=infinity MemorySwapMax=infinity MemoryLimit=infinity DevicePolicy=auto TasksAccounting=yes TasksMax=infinity IPAccounting=no ManagedOOMSwap=auto ManagedOOMMemoryPressure=auto ManagedOOMMemoryPressureLimit=0 ManagedOOMPreference=none UMask=0022 LimitCPU=infinity LimitCPUSoft=infinity LimitFSIZE=infinity LimitFSIZESoft=infinity LimitDATA=infinity LimitDATASoft=infinity LimitSTACK=infinity LimitSTACKSoft=8388608 LimitCORE=infinity LimitCORESoft=infinity LimitRSS=infinity LimitRSSSoft=infinity LimitNOFILE=1048576 LimitNOFILESoft=1048576 LimitAS=infinity LimitASSoft=infinity LimitNPROC=infinity LimitNPROCSoft=infinity LimitMEMLOCK=8388608 LimitMEMLOCKSoft=8388608 LimitLOCKS=infinity LimitLOCKSSoft=infinity LimitSIGPENDING=1026496 LimitSIGPENDINGSoft=1026496 LimitMSGQUEUE=819200 LimitMSGQUEUESoft=819200 LimitNICE=0 LimitNICESoft=0 LimitRTPRIO=0 LimitRTPRIOSoft=0 LimitRTTIME=infinity LimitRTTIMESoft=infinity OOMScoreAdjust=-999 
CoredumpFilter=0x33 Nice=0 IOSchedulingClass=2 IOSchedulingPriority=4 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 CPUAffinityFromNUMA=no NUMAPolicy=n/a TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SyslogLevel=6 SyslogFacility=3 LogLevelMax=-1 LogRateLimitIntervalUSec=0 LogRateLimitBurst=0 SecureBits=0 CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf cap_checkpoint_restore DynamicUser=no RemoveIPC=no PrivateTmp=no PrivateDevices=no ProtectClock=no ProtectKernelTunables=no ProtectKernelModules=no ProtectKernelLogs=no ProtectControlGroups=no PrivateNetwork=no PrivateUsers=no PrivateMounts=no PrivateIPC=no ProtectHome=no ProtectSystem=no SameProcessGroup=no UtmpMode=init IgnoreSIGPIPE=yes NoNewPrivileges=no SystemCallErrorNumber=2147483646 LockPersonality=no RuntimeDirectoryPreserve=no RuntimeDirectoryMode=0755 StateDirectoryMode=0755 CacheDirectoryMode=0755 LogsDirectoryMode=0755 ConfigurationDirectoryMode=0755 TimeoutCleanUSec=infinity MemoryDenyWriteExecute=no RestrictRealtime=no RestrictSUIDSGID=no RestrictNamespaces=no MountAPIVFS=no KeyringMode=private ProtectProc=default ProcSubset=all ProtectHostname=no KillMode=process KillSignal=15 RestartKillSignal=15 FinalKillSignal=9 SendSIGKILL=yes SendSIGHUP=no WatchdogSignal=6 Id=containerd.service Names=containerd.service Requires=sysinit.target system.slice Conflicts=docker.service shutdown.target Before=shutdown.target After=basic.target local-fs.target sysinit.target system.slice systemd-journald.socket network.target Documentation=https://containerd.io Description=containerd container runtime LoadState=loaded ActiveState=active FreezerState=running SubState=running FragmentPath=/usr/lib/systemd/system/containerd.service UnitFileState=disabled UnitFilePreset=disabled StateChangeTimestamp=Thu 2023-10-12 14:28:23 CEST StateChangeTimestampMonotonic=18312478995183 InactiveExitTimestamp=Thu 2023-10-12 14:28:23 CEST InactiveExitTimestampMonotonic=18312478893337 ActiveEnterTimestamp=Thu 2023-10-12 14:28:23 CEST ActiveEnterTimestampMonotonic=18312478995183 ActiveExitTimestamp=Thu 2023-10-12 14:28:18 CEST ActiveExitTimestampMonotonic=18312473754025 InactiveEnterTimestamp=Thu 2023-10-12 14:28:23 CEST InactiveEnterTimestampMonotonic=18312478844630 CanStart=yes CanStop=yes CanReload=no CanIsolate=no CanFreeze=yes StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnSuccessJobMode=fail OnFailureJobMode=replace IgnoreOnIsolate=no NeedDaemonReload=no JobTimeoutUSec=infinity JobRunningTimeoutUSec=infinity JobTimeoutAction=none ConditionResult=yes AssertResult=yes ConditionTimestamp=Thu 2023-10-12 14:28:23 CEST ConditionTimestampMonotonic=18312478844754 AssertTimestamp=Thu 2023-10-12 14:28:23 CEST AssertTimestampMonotonic=18312478844759 Transient=no Perpetual=no StartLimitIntervalUSec=10s StartLimitBurst=5 
StartLimitAction=none FailureAction=none SuccessAction=none InvocationID=fb55c00832d04e06875e6daa0932ce5f CollectMode=inactive ```

cat /etc/containerd/config.toml

```toml
# See containerd-config.toml(5) for documentation.
[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  default_runtime_name = "runc"
  no_pivot = false
  [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
    runtime_type = ""
    runtime_engine = ""
    runtime_root = ""
    privileged_without_host_devices = false
  [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
    runtime_type = ""
    runtime_engine = ""
    runtime_root = ""
    privileged_without_host_devices = false
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
      runtime_engine = ""
      runtime_root = ""
      privileged_without_host_devices = false
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        BinaryName = ""
        CriuImagePath = ""
        CriuPath = ""
        CriuWorkPath = ""
        IoGid = 0
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
      runtime_type = "io.containerd.kata.v2"
      pod_annotations = ["io.kata-containers.*"]
      privileged_without_host_devices = true
      container_annotations = ["io.katacontainers.*"]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
        ConfigPath = "/etc/kata-containers/configuration.toml"
[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/opt/cni/bin"
  conf_dir = "/etc/cni/net.d"
  max_conf_num = 1
  conf_template = ""
```
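The `kata` runtime registered above can be exercised without Kubernetes; the single-word sandbox names in the shim log (`test-kata`, `hello`, ...) are consistent with direct `ctr` runs. A sketch to reproduce (image and container name arbitrary):

```
sudo ctr image pull docker.io/library/busybox:latest
sudo ctr run --rm --runtime io.containerd.kata.v2 \
    docker.io/library/busybox:latest test-kata uname -r
```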


## Podman

podman --version

```
podman version 4.3.1
```

podman system info

```
host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-2.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: unknown'
  cpuUtilization:
    idlePercent: 99.96
    systemPercent: 0.01
    userPercent: 0.03
  cpus: 256
  distribution:
    distribution: '"opensuse-tumbleweed"'
    version: "20230215"
  eventLogger: journald
  hostname: milan.arch.suse.cz
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.1.0-rc4-snp-v8-svsm-host+
  linkmode: dynamic
  logDriver: journald
  memFree: 178591383552
  memTotal: 269119557632
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.1.4-2.1.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.4
      commit: v1.1.4-0-ga916309fff0f
      spec: 1.0.2-dev
      go: go1.18.6
      libseccomp: 2.5.4
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-1.1.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: unknown
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.4
  swapFree: 2147811328
  swapTotal: 2147811328
  uptime: 5086h 55m 14.00s (Approximately 211.92 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.opensuse.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: btrfs
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 253374758912
  graphRootUsed: 249936683008
  graphStatus:
    Build Version: Btrfs v6.1.2
    Library Version: "102"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1673913600
  BuiltTime: Tue Jan 17 01:00:00 2023
  GitCommit: ""
  GoVersion: go1.17.13
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1
```

cat /etc/containers/mounts.conf

```
# This configuration file specifies the default mounts for each container of the
# tools adhering to this file (e.g., CRI-O, Podman, Buildah). The format of the
# config is /SRC:/DST, one mount per line.
```

cat /etc/containers/registries.conf

``` # For more information on this configuration file, see containers-registries.conf(5). # # NOTE: RISK OF USING UNQUALIFIED IMAGE NAMES # We recommend always using fully qualified image names including the registry # server (full dns name), namespace, image name, and tag # (e.g., registry.opensuse.org/opensuse/tumbleweed:latest). Pulling by digest (i.e., # registry.opensuse.org/project/name@digest) further eliminates the ambiguity of tags. # When using short names, there is always an inherent risk that the image being # pulled could be spoofed. For example, a user wants to pull an image named # `foobar` from a registry and expects it to come from myregistry.com. If # myregistry.com is not first in the search list, an attacker could place a # different `foobar` image at a registry earlier in the search list. The user # would accidentally pull and run the attacker's image and code rather than the # intended content. We recommend only adding registries which are completely # trusted (i.e., registries which don't allow unknown or anonymous users to # create accounts with arbitrary names). This will prevent an image from being # spoofed, squatted or otherwise made insecure. If it is necessary to use one # of these registries, it should be added at the end of the list. # # # An array of host[:port] registries to try when pulling an unqualified image, in order. unqualified-search-registries = ["registry.opensuse.org", "docker.io"] # # [[registry]] # # The "prefix" field is used to choose the relevant [[registry]] TOML table; # # (only) the TOML table with the longest match for the input image name # # (taking into account namespace/repo/tag/digest separators) is used. # # # # The prefix can also be of the form: *.example.com for wildcard subdomain # # matching. # # # # If the prefix field is missing, it defaults to be the same as the "location" field. # prefix = "example.com/foo" # # # If true, unencrypted HTTP as well as TLS connections with untrusted # # certificates are allowed. # insecure = false # # # If true, pulling images with matching names is forbidden. # blocked = false # # # The physical location of the "prefix"-rooted namespace. # # # # By default, this is equal to "prefix" (in which case "prefix" can be omitted # # and the [[registry]] TOML table can only specify "location"). # # # # Example: Given # # prefix = "example.com/foo" # # location = "internal-registry-for-example.net/bar" # # requests for the image example.com/foo/myimage:latest will actually work with the # # internal-registry-for-example.net/bar/myimage:latest image. # # # The location can be empty iff prefix is in a # # wildcarded format: "*.example.com". In this case, the input reference will # # be used as-is without any rewrite. # location = internal-registry-for-example.com/bar" # # # (Possibly-partial) mirrors for the "prefix"-rooted namespace. # # # # The mirrors are attempted in the specified order; the first one that can be # # contacted and contains the image will be used (and if none of the mirrors contains the image, # # the primary location specified by the "registry.location" field, or using the unmodified # # user-specified reference, is tried last). 
# # # # Each TOML table in the "mirror" array can contain the following fields, with the same semantics # # as if specified in the [[registry]] TOML table directly: # # - location # # - insecure # [[registry.mirror]] # location = "example-mirror-0.local/mirror-for-foo" # [[registry.mirror]] # location = "example-mirror-1.local/mirrors/foo" # insecure = true # # Given the above, a pull of example.com/foo/image:latest will try: # # 1. example-mirror-0.local/mirror-for-foo/image:latest # # 2. example-mirror-1.local/mirrors/foo/image:latest # # 3. internal-registry-for-example.net/bar/image:latest # # in order, and use the first one that exists. ```

cat /etc/containers/storage.conf

``` # This file is is the configuration file for all tools # that use the containers/storage library. # See man 5 containers-storage.conf for more information # The "container storage" table contains all of the server options. [storage] # Default Storage Driver driver = "btrfs" # Temporary storage location runroot = "/var/run/containers/storage" # Primary Read/Write location of container storage graphroot = "/var/lib/containers/storage" [storage.options] # Storage options to be passed to underlying storage drivers # AdditionalImageStores is used to pass paths to additional Read/Only image stores # Must be comma separated list. additionalimagestores = [ ] # Size is used to set a maximum size of the container image. Only supported by # certain container storage drivers. size = "" # Path to an helper program to use for mounting the file system instead of mounting it # directly. #mount_program = "/usr/bin/fuse-overlayfs" # OverrideKernelCheck tells the driver to ignore kernel checks based on kernel version # override_kernel_check = "false" # mountopt specifies comma separated list of extra mount options # mountopt = "nodev" # Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of # a container, to UIDs/GIDs as they should appear outside of the container, and # the length of the range of UIDs/GIDs. Additional mapped sets can be listed # and will be heeded by libraries, but there are limits to the number of # mappings which the kernel will allow when you later attempt to run a # container. # # remap-uids = 0:1668442479:65536 # remap-gids = 0:1668442479:65536 # Remap-User/Group is a name which can be used to look up one or more UID/GID # ranges in the /etc/subuid or /etc/subgid file. Mappings are set up starting # with an in-container ID of 0 and the a host-level ID taken from the lowest # range that matches the specified name, and using the length of that range. # Additional ranges are then assigned, using the ranges which specify the # lowest host-level IDs first, to the lowest not-yet-mapped container-level ID, # until all of the entries have been used for maps. # # remap-user = "storage" # remap-group = "storage" # If specified, use OSTree to deduplicate files with the overlay backend # ######ostree_repo = "" # Set to skip a PRIVATE bind mount on the storage home directory. Only supported by # certain container storage drivers # skip_mount_home = "false" [storage.options.thinpool] # Storage Options for thinpool # autoextend_percent determines the amount by which pool needs to be # grown. This is specified in terms of % of pool size. So a value of 20 means # that when threshold is hit, pool will be grown by 20% of existing # pool size. # autoextend_percent = "20" # autoextend_threshold determines the pool extension threshold in terms # of percentage of pool size. For example, if threshold is 60, that means when # pool is 60% full, threshold has been hit. # autoextend_threshold = "80" # basesize specifies the size to use when creating the base device, which # limits the size of images and containers. # basesize = "10G" # blocksize specifies a custom blocksize to use for the thin pool. # blocksize="64k" # directlvm_device specifies a custom block storage device to use for the # thin pool. Required if you setup devicemapper. # directlvm_device = "" # directlvm_device_force wipes device even if device already has a filesystem. # directlvm_device_force = "True" # fs specifies the filesystem type to use for the base device. 
# fs="xfs" # log_level sets the log level of devicemapper. # 0: LogLevelSuppress 0 (Default) # 2: LogLevelFatal # 3: LogLevelErr # 4: LogLevelWarn # 5: LogLevelNotice # 6: LogLevelInfo # 7: LogLevelDebug # log_level = "7" # min_free_space specifies the min free space percent in a thin pool require for # new device creation to succeed. Valid values are from 0% - 99%. # Value 0% disables # min_free_space = "10%" # mkfsarg specifies extra mkfs arguments to be used when creating the base. # device. # mkfsarg = "" # use_deferred_removal marks devicemapper block device for deferred removal. # If the thinpool is in use when the driver attempts to remove it, the driver # tells the kernel to remove it as soon as possible. Note this does not free # up the disk space, use deferred deletion to fully remove the thinpool. # use_deferred_removal = "True" # use_deferred_deletion marks thinpool device for deferred deletion. # If the device is busy when the driver attempts to delete it, the driver # will attempt to delete device every 30 seconds until successful. # If the program using the driver exits, the driver will continue attempting # to cleanup the next time the driver is used. Deferred deletion permanently # deletes the device and all data stored in device will be lost. # use_deferred_deletion = "True" # xfs_nospace_max_retries specifies the maximum number of retries XFS should # attempt to complete IO when ENOSPC (no space) error is returned by # underlying storage device. # xfs_nospace_max_retries = "0" ```

cat /etc/containers/policy.json

```
{
    "default": [
        {
            "type": "insecureAcceptAnything"
        }
    ],
    "transports": {
        "docker-daemon": {
            "": [{"type":"insecureAcceptAnything"}]
        }
    }
}
```

cat /etc/containers/seccomp.json

``` { "defaultAction": "SCMP_ACT_ERRNO", "defaultErrnoRet": 38, "defaultErrno": "ENOSYS", "archMap": [ { "architecture": "SCMP_ARCH_X86_64", "subArchitectures": [ "SCMP_ARCH_X86", "SCMP_ARCH_X32" ] }, { "architecture": "SCMP_ARCH_AARCH64", "subArchitectures": [ "SCMP_ARCH_ARM" ] }, { "architecture": "SCMP_ARCH_MIPS64", "subArchitectures": [ "SCMP_ARCH_MIPS", "SCMP_ARCH_MIPS64N32" ] }, { "architecture": "SCMP_ARCH_MIPS64N32", "subArchitectures": [ "SCMP_ARCH_MIPS", "SCMP_ARCH_MIPS64" ] }, { "architecture": "SCMP_ARCH_MIPSEL64", "subArchitectures": [ "SCMP_ARCH_MIPSEL", "SCMP_ARCH_MIPSEL64N32" ] }, { "architecture": "SCMP_ARCH_MIPSEL64N32", "subArchitectures": [ "SCMP_ARCH_MIPSEL", "SCMP_ARCH_MIPSEL64" ] }, { "architecture": "SCMP_ARCH_S390X", "subArchitectures": [ "SCMP_ARCH_S390" ] } ], "syscalls": [ { "names": [ "bdflush", "io_pgetevents", "kexec_file_load", "kexec_load", "migrate_pages", "move_pages", "nfsservctl", "nice", "oldfstat", "oldlstat", "oldolduname", "oldstat", "olduname", "pciconfig_iobase", "pciconfig_read", "pciconfig_write", "sgetmask", "ssetmask", "swapcontext", "swapoff", "swapon", "sysfs", "uselib", "userfaultfd", "ustat", "vm86", "vm86old", "vmsplice" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": {}, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "_llseek", "_newselect", "accept", "accept4", "access", "adjtimex", "alarm", "bind", "brk", "capget", "capset", "chdir", "chmod", "chown", "chown32", "clock_adjtime", "clock_adjtime64", "clock_getres", "clock_getres_time64", "clock_gettime", "clock_gettime64", "clock_nanosleep", "clock_nanosleep_time64", "clone", "clone3", "close", "close_range", "connect", "copy_file_range", "creat", "dup", "dup2", "dup3", "epoll_create", "epoll_create1", "epoll_ctl", "epoll_ctl_old", "epoll_pwait", "epoll_pwait2", "epoll_wait", "epoll_wait_old", "eventfd", "eventfd2", "execve", "execveat", "exit", "exit_group", "faccessat", "faccessat2", "fadvise64", "fadvise64_64", "fallocate", "fanotify_mark", "fchdir", "fchmod", "fchmodat", "fchown", "fchown32", "fchownat", "fcntl", "fcntl64", "fdatasync", "fgetxattr", "flistxattr", "flock", "fork", "fremovexattr", "fsconfig", "fsetxattr", "fsmount", "fsopen", "fspick", "fstat", "fstat64", "fstatat64", "fstatfs", "fstatfs64", "fsync", "ftruncate", "ftruncate64", "futex", "futex_time64", "futimesat", "get_mempolicy", "get_robust_list", "get_thread_area", "getcpu", "getcwd", "getdents", "getdents64", "getegid", "getegid32", "geteuid", "geteuid32", "getgid", "getgid32", "getgroups", "getgroups32", "getitimer", "getpeername", "getpgid", "getpgrp", "getpid", "getppid", "getpriority", "getrandom", "getresgid", "getresgid32", "getresuid", "getresuid32", "getrlimit", "getrusage", "getsid", "getsockname", "getsockopt", "gettid", "gettimeofday", "getuid", "getuid32", "getxattr", "inotify_add_watch", "inotify_init", "inotify_init1", "inotify_rm_watch", "io_cancel", "io_destroy", "io_getevents", "io_setup", "io_submit", "ioctl", "ioprio_get", "ioprio_set", "ipc", "keyctl", "kill", "landlock_add_rule", "landlock_create_ruleset", "landlock_restrict_self", "lchown", "lchown32", "lgetxattr", "link", "linkat", "listen", "listxattr", "llistxattr", "lremovexattr", "lseek", "lsetxattr", "lstat", "lstat64", "madvise", "mbind", "membarrier", "memfd_create", "memfd_secret", "mincore", "mkdir", "mkdirat", "mknod", "mknodat", "mlock", "mlock2", "mlockall", "mmap", "mmap2", "mount", "mount_setattr", "move_mount", "mprotect", "mq_getsetattr", "mq_notify", "mq_open", "mq_timedreceive", 
"mq_timedreceive_time64", "mq_timedsend", "mq_timedsend_time64", "mq_unlink", "mremap", "msgctl", "msgget", "msgrcv", "msgsnd", "msync", "munlock", "munlockall", "munmap", "name_to_handle_at", "nanosleep", "newfstatat", "open", "open_tree", "openat", "openat2", "pause", "pidfd_getfd", "pidfd_open", "pidfd_send_signal", "pipe", "pipe2", "pivot_root", "pkey_alloc", "pkey_free", "pkey_mprotect", "poll", "ppoll", "ppoll_time64", "prctl", "pread64", "preadv", "preadv2", "prlimit64", "process_mrelease", "process_vm_readv", "process_vm_writev", "pselect6", "pselect6_time64", "ptrace", "pwrite64", "pwritev", "pwritev2", "read", "readahead", "readdir", "readlink", "readlinkat", "readv", "reboot", "recv", "recvfrom", "recvmmsg", "recvmmsg_time64", "recvmsg", "remap_file_pages", "removexattr", "rename", "renameat", "renameat2", "restart_syscall", "rmdir", "rseq", "rt_sigaction", "rt_sigpending", "rt_sigprocmask", "rt_sigqueueinfo", "rt_sigreturn", "rt_sigsuspend", "rt_sigtimedwait", "rt_sigtimedwait_time64", "rt_tgsigqueueinfo", "sched_get_priority_max", "sched_get_priority_min", "sched_getaffinity", "sched_getattr", "sched_getparam", "sched_getscheduler", "sched_rr_get_interval", "sched_rr_get_interval_time64", "sched_setaffinity", "sched_setattr", "sched_setparam", "sched_setscheduler", "sched_yield", "seccomp", "select", "semctl", "semget", "semop", "semtimedop", "semtimedop_time64", "send", "sendfile", "sendfile64", "sendmmsg", "sendmsg", "sendto", "set_mempolicy", "set_robust_list", "set_thread_area", "set_tid_address", "setfsgid", "setfsgid32", "setfsuid", "setfsuid32", "setgid", "setgid32", "setgroups", "setgroups32", "setitimer", "setns", "setpgid", "setpriority", "setregid", "setregid32", "setresgid", "setresgid32", "setresuid", "setresuid32", "setreuid", "setreuid32", "setrlimit", "setsid", "setsockopt", "setuid", "setuid32", "setxattr", "shmat", "shmctl", "shmdt", "shmget", "shutdown", "sigaction", "sigaltstack", "signal", "signalfd", "signalfd4", "sigpending", "sigprocmask", "sigreturn", "sigsuspend", "socketcall", "socketpair", "splice", "stat", "stat64", "statfs", "statfs64", "statx", "symlink", "symlinkat", "sync", "sync_file_range", "syncfs", "syscall", "sysinfo", "syslog", "tee", "tgkill", "time", "timer_create", "timer_delete", "timer_getoverrun", "timer_gettime", "timer_gettime64", "timer_settime", "timer_settime64", "timerfd", "timerfd_create", "timerfd_gettime", "timerfd_gettime64", "timerfd_settime", "timerfd_settime64", "times", "tkill", "truncate", "truncate64", "ugetrlimit", "umask", "umount", "umount2", "uname", "unlink", "unlinkat", "unshare", "utime", "utimensat", "utimensat_time64", "utimes", "vfork", "wait4", "waitid", "waitpid", "write", "writev" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 0, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 8, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131072, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131080, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { 
"names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 4294967295, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "sync_file_range2" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "ppc64le" ] }, "excludes": {} }, { "names": [ "arm_fadvise64_64", "arm_sync_file_range", "breakpoint", "cacheflush", "set_tls", "sync_file_range2" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "arm", "arm64" ] }, "excludes": {} }, { "names": [ "arch_prctl" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "amd64", "x32" ] }, "excludes": {} }, { "names": [ "modify_ldt" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "amd64", "x32", "x86" ] }, "excludes": {} }, { "names": [ "s390_pci_mmio_read", "s390_pci_mmio_write", "s390_runtime_instr" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "s390", "s390x" ] }, "excludes": {} }, { "names": [ "open_by_handle_at" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_DAC_READ_SEARCH" ] }, "excludes": {} }, { "names": [ "open_by_handle_at" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_DAC_READ_SEARCH" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "bpf", "fanotify_init", "lookup_dcookie", "perf_event_open", "quotactl", "setdomainname", "sethostname", "setns" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_ADMIN" ] }, "excludes": {} }, { "names": [ "bpf", "fanotify_init", "lookup_dcookie", "perf_event_open", "quotactl", "setdomainname", "sethostname", "setns" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_ADMIN" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "chroot" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_CHROOT" ] }, "excludes": {} }, { "names": [ "chroot" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_CHROOT" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "delete_module", "finit_module", "init_module", "query_module" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_MODULE" ] }, "excludes": {} }, { "names": [ "delete_module", "finit_module", "init_module", "query_module" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_MODULE" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "acct" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_PACCT" ] }, "excludes": {} }, { "names": [ "acct" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_PACCT" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "kcmp", "process_madvise" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_PTRACE" ] }, "excludes": {} }, { "names": [ "kcmp", "process_madvise" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_PTRACE" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "ioperm", "iopl" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_RAWIO" ] }, "excludes": {} }, { "names": [ "ioperm", 
"iopl" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_RAWIO" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "clock_settime", "clock_settime64", "settimeofday", "stime" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_TIME" ] }, "excludes": {} }, { "names": [ "clock_settime", "clock_settime64", "settimeofday", "stime" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_TIME" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "vhangup" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_TTY_CONFIG" ] }, "excludes": {} }, { "names": [ "vhangup" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_TTY_CONFIG" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "socket" ], "action": "SCMP_ACT_ERRNO", "args": [ { "index": 0, "value": 16, "valueTwo": 0, "op": "SCMP_CMP_EQ" }, { "index": 2, "value": 9, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] }, "errnoRet": 22, "errno": "EINVAL" }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 2, "value": 9, "valueTwo": 0, "op": "SCMP_CMP_NE" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] } }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 16, "valueTwo": 0, "op": "SCMP_CMP_NE" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] } }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 2, "value": 9, "valueTwo": 0, "op": "SCMP_CMP_NE" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] } }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": null, "comment": "", "includes": { "caps": [ "CAP_AUDIT_WRITE" ] }, "excludes": {} } ] } ```

cat /usr/share/containers/containers.conf

``` # The containers configuration file specifies all of the available configuration # command-line options/flags for container engine tools like Podman & Buildah, # but in a TOML format that can be easily modified and versioned. # Please refer to containers.conf(5) for details of all configuration options. # Not all container engines implement all of the options. # All of the options have hard coded defaults and these options will override # the built in defaults. Users can then override these options via the command # line. Container engines will read containers.conf files in up to three # locations in the following order: # 1. /usr/share/containers/containers.conf # 2. /etc/containers/containers.conf # 3. $HOME/.config/containers/containers.conf (Rootless containers ONLY) # Items specified in the latter containers.conf, if they exist, override the # previous containers.conf settings, or the default settings. [containers] # List of annotation. Specified as # "key = value" # If it is empty or commented out, no annotations will be added # #annotations = [] # Used to change the name of the default AppArmor profile of container engine. # #apparmor_profile = "container-default" # The hosts entries from the base hosts file are added to the containers hosts # file. This must be either an absolute path or as special values "image" which # uses the hosts file from the container image or "none" which means # no base hosts file is used. The default is "" which will use /etc/hosts. # #base_hosts_file = "" # Default way to to create a cgroup namespace for the container # Options are: # `private` Create private Cgroup Namespace for the container. # `host` Share host Cgroup Namespace with the container. # #cgroupns = "private" # Control container cgroup configuration # Determines whether the container will create CGroups. # Options are: # `enabled` Enable cgroup support within container # `disabled` Disable cgroup support, will inherit cgroups from parent # `no-conmon` Do not create a cgroup dedicated to conmon. # #cgroups = "enabled" # List of default capabilities for containers. If it is empty or commented out, # the default capabilities defined in the container engine will be added. # default_capabilities = [ "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP", "SETUID", "SYS_CHROOT" ] # A list of sysctls to be set in containers by default, # specified as "name=value", # for example:"net.ipv4.ping_group_range=0 0". # default_sysctls = [ "net.ipv4.ping_group_range=0 0", ] # A list of ulimits to be set in containers by default, specified as # "=:", for example: # "nofile=1024:2048" # See setrlimit(2) for a list of resource names. # Any limit not specified here will be inherited from the process launching the # container engine. # Ulimits has limits for non privileged container engines. # #default_ulimits = [ # "nofile=1280:2560", #] # List of devices. Specified as # "::", for example: # "/dev/sdc:/dev/xvdc:rwm". # If it is empty or commented out, only the default devices will be used # #devices = [] # List of default DNS options to be added to /etc/resolv.conf inside of the container. # #dns_options = [] # List of default DNS search domains to be added to /etc/resolv.conf inside of the container. # #dns_searches = [] # Set default DNS servers. # This option can be used to override the DNS configuration passed to the # container. The special value "none" can be specified to disable creation of # /etc/resolv.conf in the container. 
# The /etc/resolv.conf file in the image will be used without changes. # #dns_servers = [] # Environment variable list for the conmon process; used for passing necessary # environment variables to conmon or the runtime. # #env = [ # "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", # "TERM=xterm", #] # Pass all host environment variables into the container. # #env_host = false # Set the ip for the host.containers.internal entry in the containers /etc/hosts # file. This can be set to "none" to disable adding this entry. By default it # will automatically choose the host ip. # # NOTE: When using podman machine this entry will never be added to the containers # hosts file instead the gvproxy dns resolver will resolve this hostname. Therefore # it is not possible to disable the entry in this case. # #host_containers_internal_ip = "" # Default proxy environment variables passed into the container. # The environment variables passed in include: # http_proxy, https_proxy, ftp_proxy, no_proxy, and the upper case versions of # these. This option is needed when host system uses a proxy but container # should not use proxy. Proxy environment variables specified for the container # in any other way will override the values passed from the host. # #http_proxy = true # Run an init inside the container that forwards signals and reaps processes. # #init = false # Container init binary, if init=true, this is the init binary to be used for containers. # init_path = "/usr/bin/catatonit" # Default way to to create an IPC namespace (POSIX SysV IPC) for the container # Options are: # "host" Share host IPC Namespace with the container. # "none" Create shareable IPC Namespace for the container without a private /dev/shm. # "private" Create private IPC Namespace for the container, other containers are not allowed to share it. # "shareable" Create shareable IPC Namespace for the container. # #ipcns = "shareable" # keyring tells the container engine whether to create # a kernel keyring for use within the container. # #keyring = true # label tells the container engine whether to use container separation using # MAC(SELinux) labeling or not. # The label flag is ignored on label disabled systems. # #label = true # Logging driver for the container. Available options: k8s-file and journald. # log_driver = "journald" # Maximum size allowed for the container log file. Negative numbers indicate # that no size limit is imposed. If positive, it must be >= 8192 to match or # exceed conmon's read buffer. The file is truncated and re-opened so the # limit is never exceeded. # #log_size_max = -1 # Specifies default format tag for container log messages. # This is useful for creating a specific tag for container log messages. # Containers logs default to truncated container ID as a tag. # #log_tag = "" # Default way to to create a Network namespace for the container # Options are: # `private` Create private Network Namespace for the container. # `host` Share host Network Namespace with the container. # `none` Containers do not use the network # #netns = "private" # Create /etc/hosts for the container. By default, container engine manage # /etc/hosts, automatically adding the container's own IP address. # #no_hosts = false # Default way to to create a PID namespace for the container # Options are: # `private` Create private PID Namespace for the container. # `host` Share host PID Namespace with the container. # #pidns = "private" # Maximum number of processes allowed in a container. 
# #pids_limit = 2048 # Copy the content from the underlying image into the newly created volume # when the container is created instead of when it is started. If false, # the container engine will not copy the content until the container is started. # Setting it to true may have negative performance implications. # #prepare_volume_on_create = false # Path to the seccomp.json profile which is used as the default seccomp profile # for the runtime. # #seccomp_profile = "/usr/share/containers/seccomp.json" # Size of /dev/shm. Specified as . # Unit is optional, values: # b (bytes), k (kilobytes), m (megabytes), or g (gigabytes). # If the unit is omitted, the system uses bytes. # #shm_size = "65536k" # Set timezone in container. Takes IANA timezones as well as "local", # which sets the timezone in the container to match the host machine. # #tz = "" # Set umask inside the container # #umask = "0022" # Default way to to create a User namespace for the container # Options are: # `auto` Create unique User Namespace for the container. # `host` Share host User Namespace with the container. # #userns = "host" # Number of UIDs to allocate for the automatic container creation. # UIDs are allocated from the "container" UIDs listed in # /etc/subuid & /etc/subgid # #userns_size = 65536 # Default way to to create a UTS namespace for the container # Options are: # `private` Create private UTS Namespace for the container. # `host` Share host UTS Namespace with the container. # #utsns = "private" # List of volumes. Specified as # "::", for example: # "/db:/var/lib/db:ro". # If it is empty or commented out, no volumes will be added # #volumes = [] [secrets] #driver = "file" [secrets.opts] #root = "/example/directory" [network] # Network backend determines what network driver will be used to set up and tear down container networks. # Valid values are "cni" and "netavark". # The default value is empty which means that it will automatically choose CNI or netavark. If there are # already containers/images or CNI networks preset it will choose CNI. # # Before changing this value all containers must be stopped otherwise it is likely that # iptables rules and network interfaces might leak on the host. A reboot will fix this. # #network_backend = "" # Path to directory where CNI plugin binaries are located. # cni_plugin_dirs = ["/usr/libexec/cni"] # The network name of the default network to attach pods to. # #default_network = "podman" # The default subnet for the default network given in default_network. # If a network with that name does not exist, a new network using that name and # this subnet will be created. # Must be a valid IPv4 CIDR prefix. # #default_subnet = "10.88.0.0/16" # DefaultSubnetPools is a list of subnets and size which are used to # allocate subnets automatically for podman network create. # It will iterate through the list and will pick the first free subnet # with the given size. This is only used for ipv4 subnets, ipv6 subnets # are always assigned randomly. # #default_subnet_pools = [ # {"base" = "10.89.0.0/16", "size" = 24}, # {"base" = "10.90.0.0/15", "size" = 24}, # {"base" = "10.92.0.0/14", "size" = 24}, # {"base" = "10.96.0.0/11", "size" = 24}, # {"base" = "10.128.0.0/9", "size" = 24}, #] # Path to the directory where network configuration files are located. # For the CNI backend the default is "/etc/cni/net.d" as root # and "$HOME/.config/cni/net.d" as rootless. # For the netavark backend "/etc/containers/networks" is used as root # and "$graphroot/networks" as rootless. 
# #network_config_dir = "/etc/cni/net.d/" # Port to use for dns forwarding daemon with netavark in rootful bridge # mode and dns enabled. # Using an alternate port might be useful if other dns services should # run on the machine. # #dns_bind_port = 53 [engine] # Index to the active service # #active_service = production # The compression format to use when pushing an image. # Valid options are: `gzip`, `zstd` and `zstd:chunked`. # #compression_format = "gzip" # Cgroup management implementation used for the runtime. # Valid options "systemd" or "cgroupfs" # #cgroup_manager = "systemd" # Environment variables to pass into conmon # #conmon_env_vars = [ # "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" #] # Paths to look for the conmon container manager binary # #conmon_path = [ # "/usr/libexec/podman/conmon", # "/usr/local/libexec/podman/conmon", # "/usr/local/lib/podman/conmon", # "/usr/bin/conmon", # "/usr/sbin/conmon", # "/usr/local/bin/conmon", # "/usr/local/sbin/conmon" #] # Enforces using docker.io for completing short names in Podman's compatibility # REST API. Note that this will ignore unqualified-search-registries and # short-name aliases defined in containers-registries.conf(5). #compat_api_enforce_docker_hub = true # Specify the keys sequence used to detach a container. # Format is a single character [a-Z] or a comma separated sequence of # `ctrl-`, where `` is one of: # `a-z`, `@`, `^`, `[`, `\`, `]`, `^` or `_` # #detach_keys = "ctrl-p,ctrl-q" # Determines whether engine will reserve ports on the host when they are # forwarded to containers. When enabled, when ports are forwarded to containers, # ports are held open by as long as the container is running, ensuring that # they cannot be reused by other programs on the host. However, this can cause # significant memory usage if a container has many ports forwarded to it. # Disabling this can save memory. # #enable_port_reservation = true # Environment variables to be used when running the container engine (e.g., Podman, Buildah). # For example "http_proxy=internal.proxy.company.com". # Note these environment variables will not be used within the container. # Set the env section under [containers] table, if you want to set environment variables for the container. # #env = [] # Define where event logs will be stored, when events_logger is "file". #events_logfile_path="" # Sets the maximum size for events_logfile_path. # The size can be b (bytes), k (kilobytes), m (megabytes), or g (gigabytes). # The format for the size is ``, e.g., `1b` or `3g`. # If no unit is included then the size will be read in bytes. # When the limit is exceeded, the logfile will be rotated and the old one will be deleted. # If the maximum size is set to 0, then no limit will be applied, # and the logfile will not be rotated. #events_logfile_max_size = "1m" # Selects which logging mechanism to use for container engine events. # Valid values are `journald`, `file` and `none`. # #events_logger = "journald" # A is a list of directories which are used to search for helper binaries. # #helper_binaries_dir = [ # "/usr/local/libexec/podman", # "/usr/local/lib/podman", # "/usr/libexec/podman", # "/usr/lib/podman", #] # Path to OCI hooks directories for automatically executed hooks. # #hooks_dir = [ # "/usr/share/containers/oci/hooks.d", #] # Manifest Type (oci, v2s2, or v2s1) to use when pulling, pushing, building # container images. By default image pulled and pushed match the format of the # source image. Building/committing defaults to OCI. 
# #image_default_format = "" # Default transport method for pulling and pushing for images # #image_default_transport = "docker://" # Maximum number of image layers to be copied (pulled/pushed) simultaneously. # Not setting this field, or setting it to zero, will fall back to containers/image defaults. # #image_parallel_copies = 0 # Tells container engines how to handle the builtin image volumes. # * bind: An anonymous named volume will be created and mounted # into the container. # * tmpfs: The volume is mounted onto the container as a tmpfs, # which allows users to create content that disappears when # the container is stopped. # * ignore: All volumes are just ignored and no action is taken. # #image_volume_mode = "" # Default command to run the infra container # #infra_command = "/pause" # Infra (pause) container image name for pod infra containers. When running a # pod, we start a `pause` process in a container to hold open the namespaces # associated with the pod. This container does nothing other then sleep, # reserving the pods resources for the lifetime of the pod. By default container # engines run a builtin container using the pause executable. If you want override # specify an image to pull. # #infra_image = "" # Specify the locking mechanism to use; valid values are "shm" and "file". # Change the default only if you are sure of what you are doing, in general # "file" is useful only on platforms where cgo is not available for using the # faster "shm" lock type. You may need to run "podman system renumber" after # you change the lock type. # #lock_type** = "shm" # MultiImageArchive - if true, the container engine allows for storing archives # (e.g., of the docker-archive transport) with multiple images. By default, # Podman creates single-image archives. # #multi_image_archive = "false" # Default engine namespace # If engine is joined to a namespace, it will see only containers and pods # that were created in the same namespace, and will create new containers and # pods in that namespace. # The default namespace is "", which corresponds to no namespace. When no # namespace is set, all containers and pods are visible. # #namespace = "" # Path to the slirp4netns binary # #network_cmd_path = "" # Default options to pass to the slirp4netns binary. # Valid options values are: # # - allow_host_loopback=true|false: Allow the slirp4netns to reach the host loopback IP (`10.0.2.2`). # Default is false. # - mtu=MTU: Specify the MTU to use for this network. (Default is `65520`). # - cidr=CIDR: Specify ip range to use for this network. (Default is `10.0.2.0/24`). # - enable_ipv6=true|false: Enable IPv6. Default is true. (Required for `outbound_addr6`). # - outbound_addr=INTERFACE: Specify the outbound interface slirp should bind to (ipv4 traffic only). # - outbound_addr=IPv4: Specify the outbound ipv4 address slirp should bind to. # - outbound_addr6=INTERFACE: Specify the outbound interface slirp should bind to (ipv6 traffic only). # - outbound_addr6=IPv6: Specify the outbound ipv6 address slirp should bind to. # - port_handler=rootlesskit: Use rootlesskit for port forwarding. Default. # Note: Rootlesskit changes the source IP address of incoming packets to a IP address in the container # network namespace, usually `10.0.2.100`. If your application requires the real source IP address, # e.g. web server logs, use the slirp4netns port handler. The rootlesskit port handler is also used for # rootless containers when connected to user-defined networks. 
# - port_handler=slirp4netns: Use the slirp4netns port forwarding, it is slower than rootlesskit but # preserves the correct source IP address. This port handler cannot be used for user-defined networks. # #network_cmd_options = [] # Whether to use chroot instead of pivot_root in the runtime # #no_pivot_root = false # Number of locks available for containers and pods. # If this is changed, a lock renumber must be performed (e.g. with the # 'podman system renumber' command). # #num_locks = 2048 # Set the exit policy of the pod when the last container exits. #pod_exit_policy = "continue" # Whether to pull new image before running a container # #pull_policy = "missing" # Indicates whether the application should be running in remote mode. This flag modifies the # --remote option on container engines. Setting the flag to true will default # `podman --remote=true` for access to the remote Podman service. # #remote = false # Default OCI runtime # #runtime = "crun" # List of the OCI runtimes that support --format=json. When json is supported # engine will use it for reporting nicer errors. # #runtime_supports_json = ["crun", "runc", "kata", "runsc", "krun"] # List of the OCI runtimes that supports running containers with KVM Separation. # #runtime_supports_kvm = ["kata", "krun"] # List of the OCI runtimes that supports running containers without cgroups. # #runtime_supports_nocgroups = ["crun", "krun"] # Default location for storing temporary container image content. Can be overridden with the TMPDIR environment # variable. If you specify "storage", then the location of the # container/storage tmp directory will be used. # image_copy_tmp_dir="/var/tmp" # Number of seconds to wait without a connection # before the `podman system service` times out and exits # #service_timeout = 5 # Directory for persistent engine files (database, etc) # By default, this will be configured relative to where the containers/storage # stores containers # Uncomment to change location from this default # #static_dir = "/var/lib/containers/storage/libpod" # Number of seconds to wait for container to exit before sending kill signal. # #stop_timeout = 10 # Number of seconds to wait before exit command in API process is given to. # This mimics Docker's exec cleanup behaviour, where the default is 5 minutes (value is in seconds). # #exit_command_delay = 300 # map of service destinations # #[service_destinations] # [service_destinations.production] # URI to access the Podman service # Examples: # rootless "unix://run/user/$UID/podman/podman.sock" (Default) # rootful "unix://run/podman/podman.sock (Default) # remote rootless ssh://engineering.lab.company.com/run/user/1000/podman/podman.sock # remote rootful ssh://root@10.10.1.136:22/run/podman/podman.sock # # uri = "ssh://user@production.example.com/run/user/1001/podman/podman.sock" # Path to file containing ssh identity key # identity = "~/.ssh/id_rsa" # Directory for temporary files. Must be tmpfs (wiped after reboot) # #tmp_dir = "/run/libpod" # Directory for libpod named volumes. # By default, this will be configured relative to where containers/storage # stores containers. # Uncomment to change location from this default. 
# #volume_path = "/var/lib/containers/storage/volumes" # Paths to look for a valid OCI runtime (crun, runc, kata, runsc, krun, etc) [engine.runtimes] #crun = [ # "/usr/bin/crun", # "/usr/sbin/crun", # "/usr/local/bin/crun", # "/usr/local/sbin/crun", # "/sbin/crun", # "/bin/crun", # "/run/current-system/sw/bin/crun", #] #kata = [ # "/usr/bin/kata-runtime", # "/usr/sbin/kata-runtime", # "/usr/local/bin/kata-runtime", # "/usr/local/sbin/kata-runtime", # "/sbin/kata-runtime", # "/bin/kata-runtime", # "/usr/bin/kata-qemu", # "/usr/bin/kata-fc", #] #runc = [ # "/usr/bin/runc", # "/usr/sbin/runc", # "/usr/local/bin/runc", # "/usr/local/sbin/runc", # "/sbin/runc", # "/bin/runc", # "/usr/lib/cri-o-runc/sbin/runc", #] #runsc = [ # "/usr/bin/runsc", # "/usr/sbin/runsc", # "/usr/local/bin/runsc", # "/usr/local/sbin/runsc", # "/bin/runsc", # "/sbin/runsc", # "/run/current-system/sw/bin/runsc", #] #krun = [ # "/usr/bin/krun", # "/usr/local/bin/krun", #] [engine.volume_plugins] #testplugin = "/run/podman/plugins/test.sock" [machine] # Number of CPU's a machine is created with. # #cpus=1 # The size of the disk in GB created when init-ing a podman-machine VM. # #disk_size=10 # The image used when creating a podman-machine VM. # #image = "testing" # Memory in MB a machine is created with. # #memory=2048 # The username to use and create on the podman machine OS for rootless # container access. # #user = "core" # Host directories to be mounted as volumes into the VM by default. # Environment variables like $HOME as well as complete paths are supported for # the source and destination. An optional third field `:ro` can be used to # tell the container engines to mount the volume readonly. # # volumes = [ # "$HOME:$HOME", #] # The [machine] table MUST be the last entry in this file. # (Unless another table is added) # TOML does not provide a way to end a table other than a further table being # defined, so every key hereafter will be part of [machine] and not the # main config. ```

cat /usr/share/containers/mounts.conf

```
# This configuration file specifies the default mounts for each container of the
# tools adhering to this file (e.g., CRI-O, Podman, Buildah). The format of the
# config is /SRC:/DST, one mount per line.
```

cat /usr/share/containers/seccomp.json

``` { "defaultAction": "SCMP_ACT_ERRNO", "defaultErrnoRet": 38, "defaultErrno": "ENOSYS", "archMap": [ { "architecture": "SCMP_ARCH_X86_64", "subArchitectures": [ "SCMP_ARCH_X86", "SCMP_ARCH_X32" ] }, { "architecture": "SCMP_ARCH_AARCH64", "subArchitectures": [ "SCMP_ARCH_ARM" ] }, { "architecture": "SCMP_ARCH_MIPS64", "subArchitectures": [ "SCMP_ARCH_MIPS", "SCMP_ARCH_MIPS64N32" ] }, { "architecture": "SCMP_ARCH_MIPS64N32", "subArchitectures": [ "SCMP_ARCH_MIPS", "SCMP_ARCH_MIPS64" ] }, { "architecture": "SCMP_ARCH_MIPSEL64", "subArchitectures": [ "SCMP_ARCH_MIPSEL", "SCMP_ARCH_MIPSEL64N32" ] }, { "architecture": "SCMP_ARCH_MIPSEL64N32", "subArchitectures": [ "SCMP_ARCH_MIPSEL", "SCMP_ARCH_MIPSEL64" ] }, { "architecture": "SCMP_ARCH_S390X", "subArchitectures": [ "SCMP_ARCH_S390" ] } ], "syscalls": [ { "names": [ "bdflush", "io_pgetevents", "kexec_file_load", "kexec_load", "migrate_pages", "move_pages", "nfsservctl", "nice", "oldfstat", "oldlstat", "oldolduname", "oldstat", "olduname", "pciconfig_iobase", "pciconfig_read", "pciconfig_write", "sgetmask", "ssetmask", "swapcontext", "swapoff", "swapon", "sysfs", "uselib", "userfaultfd", "ustat", "vm86", "vm86old", "vmsplice" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": {}, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "_llseek", "_newselect", "accept", "accept4", "access", "adjtimex", "alarm", "bind", "brk", "capget", "capset", "chdir", "chmod", "chown", "chown32", "clock_adjtime", "clock_adjtime64", "clock_getres", "clock_getres_time64", "clock_gettime", "clock_gettime64", "clock_nanosleep", "clock_nanosleep_time64", "clone", "clone3", "close", "close_range", "connect", "copy_file_range", "creat", "dup", "dup2", "dup3", "epoll_create", "epoll_create1", "epoll_ctl", "epoll_ctl_old", "epoll_pwait", "epoll_pwait2", "epoll_wait", "epoll_wait_old", "eventfd", "eventfd2", "execve", "execveat", "exit", "exit_group", "faccessat", "faccessat2", "fadvise64", "fadvise64_64", "fallocate", "fanotify_mark", "fchdir", "fchmod", "fchmodat", "fchown", "fchown32", "fchownat", "fcntl", "fcntl64", "fdatasync", "fgetxattr", "flistxattr", "flock", "fork", "fremovexattr", "fsconfig", "fsetxattr", "fsmount", "fsopen", "fspick", "fstat", "fstat64", "fstatat64", "fstatfs", "fstatfs64", "fsync", "ftruncate", "ftruncate64", "futex", "futex_time64", "futimesat", "get_mempolicy", "get_robust_list", "get_thread_area", "getcpu", "getcwd", "getdents", "getdents64", "getegid", "getegid32", "geteuid", "geteuid32", "getgid", "getgid32", "getgroups", "getgroups32", "getitimer", "getpeername", "getpgid", "getpgrp", "getpid", "getppid", "getpriority", "getrandom", "getresgid", "getresgid32", "getresuid", "getresuid32", "getrlimit", "getrusage", "getsid", "getsockname", "getsockopt", "gettid", "gettimeofday", "getuid", "getuid32", "getxattr", "inotify_add_watch", "inotify_init", "inotify_init1", "inotify_rm_watch", "io_cancel", "io_destroy", "io_getevents", "io_setup", "io_submit", "ioctl", "ioprio_get", "ioprio_set", "ipc", "keyctl", "kill", "landlock_add_rule", "landlock_create_ruleset", "landlock_restrict_self", "lchown", "lchown32", "lgetxattr", "link", "linkat", "listen", "listxattr", "llistxattr", "lremovexattr", "lseek", "lsetxattr", "lstat", "lstat64", "madvise", "mbind", "membarrier", "memfd_create", "memfd_secret", "mincore", "mkdir", "mkdirat", "mknod", "mknodat", "mlock", "mlock2", "mlockall", "mmap", "mmap2", "mount", "mount_setattr", "move_mount", "mprotect", "mq_getsetattr", "mq_notify", "mq_open", "mq_timedreceive", 
"mq_timedreceive_time64", "mq_timedsend", "mq_timedsend_time64", "mq_unlink", "mremap", "msgctl", "msgget", "msgrcv", "msgsnd", "msync", "munlock", "munlockall", "munmap", "name_to_handle_at", "nanosleep", "newfstatat", "open", "open_tree", "openat", "openat2", "pause", "pidfd_getfd", "pidfd_open", "pidfd_send_signal", "pipe", "pipe2", "pivot_root", "pkey_alloc", "pkey_free", "pkey_mprotect", "poll", "ppoll", "ppoll_time64", "prctl", "pread64", "preadv", "preadv2", "prlimit64", "process_mrelease", "process_vm_readv", "process_vm_writev", "pselect6", "pselect6_time64", "ptrace", "pwrite64", "pwritev", "pwritev2", "read", "readahead", "readdir", "readlink", "readlinkat", "readv", "reboot", "recv", "recvfrom", "recvmmsg", "recvmmsg_time64", "recvmsg", "remap_file_pages", "removexattr", "rename", "renameat", "renameat2", "restart_syscall", "rmdir", "rseq", "rt_sigaction", "rt_sigpending", "rt_sigprocmask", "rt_sigqueueinfo", "rt_sigreturn", "rt_sigsuspend", "rt_sigtimedwait", "rt_sigtimedwait_time64", "rt_tgsigqueueinfo", "sched_get_priority_max", "sched_get_priority_min", "sched_getaffinity", "sched_getattr", "sched_getparam", "sched_getscheduler", "sched_rr_get_interval", "sched_rr_get_interval_time64", "sched_setaffinity", "sched_setattr", "sched_setparam", "sched_setscheduler", "sched_yield", "seccomp", "select", "semctl", "semget", "semop", "semtimedop", "semtimedop_time64", "send", "sendfile", "sendfile64", "sendmmsg", "sendmsg", "sendto", "set_mempolicy", "set_robust_list", "set_thread_area", "set_tid_address", "setfsgid", "setfsgid32", "setfsuid", "setfsuid32", "setgid", "setgid32", "setgroups", "setgroups32", "setitimer", "setns", "setpgid", "setpriority", "setregid", "setregid32", "setresgid", "setresgid32", "setresuid", "setresuid32", "setreuid", "setreuid32", "setrlimit", "setsid", "setsockopt", "setuid", "setuid32", "setxattr", "shmat", "shmctl", "shmdt", "shmget", "shutdown", "sigaction", "sigaltstack", "signal", "signalfd", "signalfd4", "sigpending", "sigprocmask", "sigreturn", "sigsuspend", "socketcall", "socketpair", "splice", "stat", "stat64", "statfs", "statfs64", "statx", "symlink", "symlinkat", "sync", "sync_file_range", "syncfs", "syscall", "sysinfo", "syslog", "tee", "tgkill", "time", "timer_create", "timer_delete", "timer_getoverrun", "timer_gettime", "timer_gettime64", "timer_settime", "timer_settime64", "timerfd", "timerfd_create", "timerfd_gettime", "timerfd_gettime64", "timerfd_settime", "timerfd_settime64", "times", "tkill", "truncate", "truncate64", "ugetrlimit", "umask", "umount", "umount2", "uname", "unlink", "unlinkat", "unshare", "utime", "utimensat", "utimensat_time64", "utimes", "vfork", "wait4", "waitid", "waitpid", "write", "writev" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 0, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 8, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131072, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131080, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { 
"names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 4294967295, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": {} }, { "names": [ "sync_file_range2" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "ppc64le" ] }, "excludes": {} }, { "names": [ "arm_fadvise64_64", "arm_sync_file_range", "breakpoint", "cacheflush", "set_tls", "sync_file_range2" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "arm", "arm64" ] }, "excludes": {} }, { "names": [ "arch_prctl" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "amd64", "x32" ] }, "excludes": {} }, { "names": [ "modify_ldt" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "amd64", "x32", "x86" ] }, "excludes": {} }, { "names": [ "s390_pci_mmio_read", "s390_pci_mmio_write", "s390_runtime_instr" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "arches": [ "s390", "s390x" ] }, "excludes": {} }, { "names": [ "open_by_handle_at" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_DAC_READ_SEARCH" ] }, "excludes": {} }, { "names": [ "open_by_handle_at" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_DAC_READ_SEARCH" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "bpf", "fanotify_init", "lookup_dcookie", "perf_event_open", "quotactl", "setdomainname", "sethostname", "setns" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_ADMIN" ] }, "excludes": {} }, { "names": [ "bpf", "fanotify_init", "lookup_dcookie", "perf_event_open", "quotactl", "setdomainname", "sethostname", "setns" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_ADMIN" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "chroot" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_CHROOT" ] }, "excludes": {} }, { "names": [ "chroot" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_CHROOT" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "delete_module", "finit_module", "init_module", "query_module" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_MODULE" ] }, "excludes": {} }, { "names": [ "delete_module", "finit_module", "init_module", "query_module" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_MODULE" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "acct" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_PACCT" ] }, "excludes": {} }, { "names": [ "acct" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_PACCT" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "kcmp", "process_madvise" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_PTRACE" ] }, "excludes": {} }, { "names": [ "kcmp", "process_madvise" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_PTRACE" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "ioperm", "iopl" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_RAWIO" ] }, "excludes": {} }, { "names": [ "ioperm", 
"iopl" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_RAWIO" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "clock_settime", "clock_settime64", "settimeofday", "stime" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_TIME" ] }, "excludes": {} }, { "names": [ "clock_settime", "clock_settime64", "settimeofday", "stime" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_TIME" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "vhangup" ], "action": "SCMP_ACT_ALLOW", "args": [], "comment": "", "includes": { "caps": [ "CAP_SYS_TTY_CONFIG" ] }, "excludes": {} }, { "names": [ "vhangup" ], "action": "SCMP_ACT_ERRNO", "args": [], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_SYS_TTY_CONFIG" ] }, "errnoRet": 1, "errno": "EPERM" }, { "names": [ "socket" ], "action": "SCMP_ACT_ERRNO", "args": [ { "index": 0, "value": 16, "valueTwo": 0, "op": "SCMP_CMP_EQ" }, { "index": 2, "value": 9, "valueTwo": 0, "op": "SCMP_CMP_EQ" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] }, "errnoRet": 22, "errno": "EINVAL" }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 2, "value": 9, "valueTwo": 0, "op": "SCMP_CMP_NE" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] } }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 16, "valueTwo": 0, "op": "SCMP_CMP_NE" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] } }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 2, "value": 9, "valueTwo": 0, "op": "SCMP_CMP_NE" } ], "comment": "", "includes": {}, "excludes": { "caps": [ "CAP_AUDIT_WRITE" ] } }, { "names": [ "socket" ], "action": "SCMP_ACT_ALLOW", "args": null, "comment": "", "includes": { "caps": [ "CAP_AUDIT_WRITE" ] }, "excludes": {} } ] } ```

---

Packages

No `dpkg`
Have `rpm`

rpm -qa|egrep "(cc-oci-runtime|cc-runtime|runv|kata-runtime|kata-ksm-throttler|kata-containers-image|linux-container|qemu-)"

```
qemu-8.1.0-2.2.x86_64
qemu-hw-display-qxl-8.1.0-2.2.x86_64
qemu-accel-qtest-8.1.0-2.2.x86_64
qemu-block-nfs-8.1.0-2.2.x86_64
qemu-x86-8.1.0-2.2.x86_64
qemu-ui-curses-8.1.0-2.2.x86_64
qemu-audio-spice-8.1.0-2.2.x86_64
qemu-ui-gtk-8.1.0-2.2.x86_64
qemu-microvm-7.1.0-11.1.noarch
qemu-chardev-spice-8.1.0-2.2.x86_64
qemu-ui-spice-app-8.1.0-2.2.x86_64
qemu-hw-usb-smartcard-8.1.0-2.2.x86_64
qemu-hw-usb-host-8.1.0-2.2.x86_64
qemu-hw-display-virtio-gpu-8.1.0-2.2.x86_64
qemu-hw-display-virtio-gpu-pci-8.1.0-2.2.x86_64
qemu-vgabios-1.16.0_0_gd239552-11.1.noarch
qemu-ui-spice-core-8.1.0-2.2.x86_64
qemu-tools-8.1.0-2.2.x86_64
qemu-ui-opengl-8.1.0-2.2.x86_64
qemu-ovmf-x86_64-202211-4.1.noarch
libvirt-daemon-qemu-9.0.0-2.1.x86_64
qemu-hw-usb-redirect-8.1.0-2.2.x86_64
qemu-accel-tcg-x86-8.1.0-2.2.x86_64
qemu-hw-display-virtio-vga-8.1.0-2.2.x86_64
system-user-qemu-20170617-24.11.noarch
libvirt-daemon-driver-qemu-9.0.0-2.1.x86_64
qemu-seabios-1.16.0_0_gd239552-11.1.noarch
qemu-block-curl-8.1.0-2.2.x86_64
qemu-uefi-aarch64-202302-1.1.noarch
qemu-kvm-8.1.0-2.2.x86_64
qemu-pr-helper-8.1.0-2.2.x86_64
qemu-img-8.1.0-2.2.x86_64
qemu-block-rbd-8.1.0-2.2.x86_64
qemu-arm-8.1.0-2.2.x86_64
qemu-ipxe-1.0.0+-11.1.noarch
libvirt-client-qemu-9.0.0-2.1.x86_64
qemu-ksm-7.1.0-11.1.x86_64
```

---

Kata Monitor

Kata Monitor `kata-monitor`.

kata-monitor --version

```
kata-monitor
 Version: 0.3.0
 Go version: go1.21.1
 Git commit: 108db0a7210b392e8aec2781043dfbd8297f84b9
 OS/Arch: linux/amd64
```

---

jinteng123 commented 9 months ago

"Has the problem been solved? How was it resolved?"

l8huang commented 4 months ago

I ran into the same issue:

virtiofsd[2807080]: Error entering sandbox: Fork(Os { code: 38, kind: Unsupported, message: "Function not implemented" })

In my case the root cause is that virtiofsd made a system call which is not implemented (I was using Rocky Linux release 8.8 with kernel 4.18.0):

pidfd_open(779737, 0) = -1 ENOSYS (Function not implemented)

pidfd_open() was added by https://github.com/torvalds/linux/commit/32fcb426ec001cb6d5a4a195091a8486ea77e2df and first shipped in kernel v5.3, so it is missing from the 4.18 kernel that Rocky Linux 8.8 ships.
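
A quick way to confirm whether a given host is affected is to issue the raw syscall directly. Below is a minimal probe, a sketch rather than anything shipped with Kata or virtiofsd; the file name is hypothetical and the fallback syscall number 434 is an assumption that holds on x86_64 (and on arches using the unified syscall table):

```c
/*
 * pidfd_probe.c -- minimal sketch: checks whether the running kernel
 * implements pidfd_open(2). On kernels older than v5.3 the raw syscall
 * fails with ENOSYS, matching the strace output above.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumed fallback; correct on x86_64 */
#endif

int main(void)
{
    /* Probe with our own PID and flags=0, mirroring virtiofsd's call. */
    long fd = syscall(SYS_pidfd_open, (long)getpid(), 0L);

    if (fd < 0) {
        if (errno == ENOSYS) {
            puts("pidfd_open: ENOSYS -- kernel too old (< v5.3)");
            return 1;
        }
        fprintf(stderr, "pidfd_open failed: %s\n", strerror(errno));
        return 2;
    }

    close((int)fd);
    puts("pidfd_open: supported");
    return 0;
}
```

Compile with `gcc pidfd_probe.c -o pidfd_probe` and run it on the host: an ENOSYS result reproduces the condition that makes virtiofsd fail to enter its sandbox.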

@jinteng123 you can try a newer kernel to see if this problem goes away.
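
If a kernel upgrade is not an option, it may also be worth checking whether your virtiofsd build accepts a less strict `--sandbox` mode (passed through `virtio_fs_extra_args` in Kata's configuration.toml); this is untested here and weakens isolation, so a newer kernel remains the better fix.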

similar issue: https://github.com/kata-containers/kata-containers/issues/7740