kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

Attaching Physical NICs to Kata-Containers #1876

Closed: Yuval-Ai closed this issue 4 years ago

Yuval-Ai commented 5 years ago

Description of problem

I need to route heavy network traffic through a container and wish to use Kata Containers. Currently I can't use any acceleration (such as SR-IOV, DPDK, VPP, etc.).

I can't manage to attach physical NICs to Kata Containers, and can't find any instructions on how to do so.

I tried various approaches, but came up empty-handed each time:

1. Using kata-runtime

a. Ran a kata-container

# docker run -itd --name my_con --runtime kata-runtime ubuntu bash

b. Created an interface JSON file (real info removed)

{
  "device": "mytapname",
  "name": "xxxxxxxx",
  "IPAddresses": [{"address": "xx.xx.xx.xx", "mask": "24"}],
  "mtu": 1500,
  "hwAddr": "XX:XX:XX:XX:XX:XX"
}

c. Added the interface to the container:

 # kata-runtime kata-network add-iface <container-ID> ./interface-file
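
(For reference, a hedged sketch of resolving the <container-ID> argument: with Docker, the full container ID is also the Kata sandbox ID, as the matching container/sandbox values in the runtime logs below suggest:)

 # ID=$(docker inspect -f '{{.Id}}' my_con)
 # kata-runtime kata-network add-iface "$ID" ./interface-file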

Expected result

To see the interface inside the container, for example when running:

 # ip link

Actual result

Received output:

null
PhysicalEndpoint does not support Hot attach

In this case, I would really appreciate even simple documentation, a "how to" guide, or a working example of using kata-network add-iface. That would be extremely helpful.

2. Using pipework

# pipework --direct-phys enp0s20f1 -i eth1 my_con 20.20.20.1/24

Actual result

No new interfaces appeared inside the Kata container.

3. Using docker network

# docker network create my_net
# docker network connect my_net my_con

Actual result

No new interfaces appeared inside the Kata container.

Also tried

Testing the three methods mentioned above while:


I would really appreciate any help in this matter. Thanks!!


output of kata-collect-data.sh:


# Meta details Running `kata-collect-data.sh` version `1.8.0-alpha2 (commit 2abe2eb303ec30a041e79b84bd30c0861e1e731c)` at `2019-07-10.18:29:07.648734297+0300`. --- Runtime is `/bin/kata-runtime`. # `kata-env` Output of "`/bin/kata-runtime kata-env`": ```toml [Meta] Version = "1.0.23" [Runtime] Debug = false Trace = false DisableGuestSeccomp = true DisableNewNetNs = false Path = "/usr/bin/kata-runtime" [Runtime.Version] Semver = "1.8.0-alpha2" Commit = "2abe2eb303ec30a041e79b84bd30c0861e1e731c" OCI = "1.0.1-dev" [Runtime.Config] Path = "/usr/share/defaults/kata-containers/configuration.toml" [Hypervisor] MachineType = "pc" Version = "QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers" Path = "/usr/bin/qemu-lite-system-x86_64" BlockDeviceDriver = "virtio-scsi" EntropySource = "/dev/urandom" Msize9p = 8192 MemorySlots = 10 Debug = false UseVSock = false SharedFS = "virtio-9p" [Image] Path = "/usr/share/kata-containers/kata-containers-image_clearlinux_1.8.0-alpha2_agent_e40d7749dc.img" [Kernel] Path = "/usr/share/kata-containers/vmlinuz-4.19.28.42-48.1.container" Parameters = "init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket" [Initrd] Path = "" [Proxy] Type = "kataProxy" Version = "kata-proxy version 1.8.0-alpha2-8d16905" Path = "/usr/libexec/kata-containers/kata-proxy" Debug = false [Shim] Type = "kataShim" Version = "kata-shim version 1.8.0-alpha2-9424be7" Path = "/usr/libexec/kata-containers/kata-shim" Debug = false [Agent] Type = "kata" Debug = false Trace = false TraceMode = "" TraceType = "" [Host] Kernel = "3.10.0-327.el7.x86_64" Architecture = "amd64" VMContainerCapable = true SupportVSocks = false [Host.Distro] Name = "CentOS Linux" Version = "7" [Host.CPU] Vendor = "GenuineIntel" Model = "Intel(R) Atom(TM) CPU C2758 @ 2.40GHz" [Netmon] Version = "kata-netmon version 1.8.0-alpha2" Path = "/usr/libexec/kata-containers/kata-netmon" Debug = false Enable = false ``` --- # Runtime config files ## Runtime default config files ``` /etc/kata-containers/configuration.toml /usr/share/defaults/kata-containers/configuration.toml ``` ## Runtime config file contents Config file `/etc/kata-containers/configuration.toml` not found Output of "`cat "/usr/share/defaults/kata-containers/configuration.toml"`": ```toml # Copyright (c) 2017-2019 Intel Corporation # # SPDX-License-Identifier: Apache-2.0 # # XXX: WARNING: this file is auto-generated. # XXX: # XXX: Source file: "cli/config/configuration-qemu.toml.in" # XXX: Project: # XXX: Name: Kata Containers # XXX: Type: kata [hypervisor.qemu] path = "/usr/bin/qemu-lite-system-x86_64" kernel = "/usr/share/kata-containers/vmlinuz.container" image = "/usr/share/kata-containers/kata-containers.img" machine_type = "pc" # Optional space-separated list of options to pass to the guest kernel. # For example, use `kernel_params = "vsyscall=emulate"` if you are having # trouble running pre-2.15 glibc. # # WARNING: - any parameter specified here will take priority over the default # parameter value of the same name used to start the virtual machine. # Do not set values here unless you understand the impact of doing so as you # may stop the virtual machine from booting. # To see the list of default parameters, enable hypervisor debug, create a # container and look for 'default-kernel-parameters' log entries. kernel_params = "" # Path to the firmware. 
# If you want that qemu uses the default firmware leave this option empty firmware = "" # Machine accelerators # comma-separated list of machine accelerators to pass to the hypervisor. # For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"` machine_accelerators="" # Default number of vCPUs per SB/VM: # unspecified or 0 --> will be set to 1 # < 0 --> will be set to the actual number of physical cores # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores default_vcpus = 1 # Default maximum number of vCPUs per SB/VM: # unspecified or == 0 --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # > 0 <= number of physical cores --> will be set to the specified number # > number of physical cores --> will be set to the actual number of physical cores or to the maximum number # of vCPUs supported by KVM if that number is exceeded # WARNING: Depending of the architecture, the maximum number of vCPUs supported by KVM is used when # the actual number of physical cores is greater than it. # WARNING: Be aware that this value impacts the virtual machine's memory footprint and CPU # the hotplug functionality. For example, `default_maxvcpus = 240` specifies that until 240 vCPUs # can be added to a SB/VM, but the memory footprint will be big. Another example, with # `default_maxvcpus = 8` the memory footprint will be small, but 8 will be the maximum number of # vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable, # unless you know what are you doing. default_maxvcpus = 0 # Bridges can be used to hot plug devices. # Limitations: # * Currently only pci bridges are supported # * Until 30 devices per bridge can be hot plugged. # * Until 5 PCI bridges can be cold plugged per VM. # This limitation could be a bug in qemu or in the kernel # Default number of bridges per SB/VM: # unspecified or 0 --> will be set to 1 # > 1 <= 5 --> will be set to the specified number # > 5 --> will be set to 5 default_bridges = 1 # Default memory size in MiB for SB/VM. # If unspecified then it will be set 2048 MiB. default_memory = 2048 # # Default memory slots per SB/VM. # If unspecified then it will be set 10. # This is will determine the times that memory will be hotadded to sandbox/VM. #memory_slots = 10 # The size in MiB will be plused to max memory of hypervisor. # It is the memory address space for the NVDIMM devie. # If set block storage driver (block_device_driver) to "nvdimm", # should set memory_offset to the size of block device. # Default 0 #memory_offset = 0 # Disable block device from being used for a container's rootfs. # In case of a storage driver like devicemapper where a container's # root file system is backed by a block device, the block device is passed # directly to the hypervisor for performance reasons. # This flag prevents the block device from being passed to the hypervisor, # 9pfs is used instead to pass the rootfs. disable_block_device_use = false # Shared file system type: # - virtio-9p (default) # - virtio-fs shared_fs = "virtio-9p" # Path to vhost-user-fs daemon. virtio_fs_daemon = "/usr/bin/virtiofsd-x86_64" # Default size of DAX cache in MiB virtio_fs_cache_size = 1024 # Cache mode: # # - none # Metadata, data, and pathname lookup are not cached in guest. They are # always fetched from host and any changes are immediately pushed to host. 
# # - auto # Metadata and pathname lookup cache expires after a configured amount of # time (default is 1 second). Data is cached while the file is open (close # to open consistency). # # - always # Metadata, data, and pathname lookup are cached in guest and never expire. virtio_fs_cache = "always" # Block storage driver to be used for the hypervisor in case the container # rootfs is backed by a block device. This is virtio-scsi, virtio-blk # or nvdimm. block_device_driver = "virtio-scsi" # Specifies cache-related options will be set to block devices or not. # Default false #block_device_cache_set = true # Specifies cache-related options for block devices. # Denotes whether use of O_DIRECT (bypass the host page cache) is enabled. # Default false #block_device_cache_direct = true # Specifies cache-related options for block devices. # Denotes whether flush requests for the device are ignored. # Default false #block_device_cache_noflush = true # Enable iothreads (data-plane) to be used. This causes IO to be # handled in a separate IO thread. This is currently only implemented # for SCSI. # enable_iothreads = false # Enable pre allocation of VM RAM, default false # Enabling this will result in lower container density # as all of the memory will be allocated and locked # This is useful when you want to reserve all the memory # upfront or in the cases where you want memory latencies # to be very predictable # Default false #enable_mem_prealloc = true # Enable huge pages for VM RAM, default false # Enabling this will result in the VM memory # being allocated using huge pages. # This is useful when you want to use vhost-user network # stacks within the container. This will automatically # result in memory pre allocation #enable_hugepages = true # Enable file based guest memory support. The default is an empty string which # will disable this feature. In the case of virtio-fs, this is enabled # automatically and '/dev/shm' is used as the backing folder. # This option will be ignored if VM templating is enabled. #file_mem_backend = "" # Enable swap of vm memory. Default false. # The behaviour is undefined if mem_prealloc is also set to true #enable_swap = true # This option changes the default hypervisor and kernel parameters # to enable debug output where available. This extra output is added # to the proxy logs, but only when proxy debug is also enabled. # # Default false #enable_debug = true # Disable the customizations done in the runtime when it detects # that it is running on top a VMM. This will result in the runtime # behaving as it would when running on bare metal. # #disable_nesting_checks = true # This is the msize used for 9p shares. It is the number of bytes # used for 9p packet payload. #msize_9p = 8192 # If true and vsocks are supported, use vsocks to communicate directly # with the agent and no proxy is started, otherwise use unix # sockets and start a proxy to communicate with the agent. # Default false #use_vsock = true # VFIO devices are hotplugged on a bridge by default. # Enable hotplugging on root bus. This may be required for devices with # a large PCI bar, as this is a current limitation with hotplugging on # a bridge. This value is valid for "pc" machine type. # Default false #hotplug_vfio_on_root_bus = true # If host doesn't support vhost_net, set to true. Thus we won't create vhost fds for nics. # Default false #disable_vhost_net = true # # Default entropy source. 
# The path to a host source of entropy (including a real hardware RNG) # /dev/urandom and /dev/random are two main options. # Be aware that /dev/random is a blocking source of entropy. If the host # runs out of entropy, the VMs boot time will increase leading to get startup # timeouts. # The source of entropy /dev/urandom is non-blocking and provides a # generally acceptable source of entropy. It should work well for pretty much # all practical purposes. #entropy_source= "/dev/urandom" # Path to OCI hook binaries in the *guest rootfs*. # This does not affect host-side hooks which must instead be added to # the OCI spec passed to the runtime. # # You can create a rootfs with hooks by customizing the osbuilder scripts: # https://github.com/kata-containers/osbuilder # # Hooks must be stored in a subdirectory of guest_hook_path according to their # hook type, i.e. "guest_hook_path/{prestart,postart,poststop}". # The agent will scan these directories for executable files and add them, in # lexicographical order, to the lifecycle of the guest container. # Hooks are executed in the runtime namespace of the guest. See the official documentation: # https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks # Warnings will be logged if any error is encountered will scanning for hooks, # but it will not abort container execution. #guest_hook_path = "/usr/share/oci/hooks" [factory] # VM templating support. Once enabled, new VMs are created from template # using vm cloning. They will share the same initial kernel, initramfs and # agent memory by mapping it readonly. It helps speeding up new container # creation and saves a lot of memory if there are many kata containers running # on the same host. # # When disabled, new VMs are created from scratch. # # Note: Requires "initrd=" to be set ("image=" is not supported). # # Default false #enable_template = true # Specifies the path of template. # # Default "/run/vc/vm/template" #template_path = "/run/vc/vm/template" # The number of caches of VMCache: # unspecified or == 0 --> VMCache is disabled # > 0 --> will be set to the specified number # # VMCache is a function that creates VMs as caches before using it. # It helps speed up new container creation. # The function consists of a server and some clients communicating # through Unix socket. The protocol is gRPC in protocols/cache/cache.proto. # The VMCache server will create some VMs and cache them by factory cache. # It will convert the VM to gRPC format and transport it when gets # requestion from clients. # Factory grpccache is the VMCache client. It will request gRPC format # VM and convert it back to a VM. If VMCache function is enabled, # kata-runtime will request VM from factory grpccache when it creates # a new sandbox. # # Default 0 #vm_cache_number = 0 # Specify the address of the Unix socket that is used by VMCache. # # Default /var/run/kata-containers/cache.sock #vm_cache_endpoint = "/var/run/kata-containers/cache.sock" [proxy.kata] path = "/usr/libexec/kata-containers/kata-proxy" # If enabled, proxy messages will be sent to the system log # (default: disabled) #enable_debug = true [shim.kata] path = "/usr/libexec/kata-containers/kata-shim" # If enabled, shim messages will be sent to the system log # (default: disabled) #enable_debug = true # If enabled, the shim will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # # Note: By default, the shim runs in a separate network namespace. 
Therefore, # to allow it to send trace details to the Jaeger agent running on the host, # it is necessary to set 'disable_new_netns=true' so that it runs in the host # network namespace. # # (default: disabled) #enable_tracing = true [agent.kata] # If enabled, make the agent display debug-level messages. # (default: disabled) #enable_debug = true # Enable agent tracing. # # If enabled, the default trace mode is "dynamic" and the # default trace type is "isolated". The trace mode and type are set # explicity with the `trace_type=` and `trace_mode=` options. # # Notes: # # - Tracing is ONLY enabled when `enable_tracing` is set: explicitly # setting `trace_mode=` and/or `trace_type=` without setting `enable_tracing` # will NOT activate agent tracing. # # - See https://github.com/kata-containers/agent/blob/master/TRACING.md for # full details. # # (default: disabled) #enable_tracing = true # #trace_mode = "dynamic" #trace_type = "isolated" [netmon] # If enabled, the network monitoring process gets started when the # sandbox is created. This allows for the detection of some additional # network being added to the existing network namespace, after the # sandbox has been created. # (default: disabled) #enable_netmon = true # Specify the path to the netmon binary. path = "/usr/libexec/kata-containers/kata-netmon" # If enabled, netmon messages will be sent to the system log # (default: disabled) #enable_debug = true [runtime] # If enabled, the runtime will log additional debug messages to the # system log # (default: disabled) #enable_debug = true # # Internetworking model # Determines how the VM should be connected to the # the container network interface # Options: # # - bridged # Uses a linux bridge to interconnect the container interface to # the VM. Works for most cases except macvlan and ipvlan. # # - macvtap # Used when the Container network interface can be bridged using # macvtap. # # - none # Used when customize network. Only creates a tap device. No veth pair. # # - tcfilter # Uses tc filter rules to redirect traffic from the network interface # provided by plugin to a tap interface connected to the VM. # internetworking_model="tcfilter" # internetworking_model="none" # disable guest seccomp # Determines whether container seccomp profiles are passed to the virtual # machine and applied by the kata agent. If set to true, seccomp is not applied # within the guest # (default: true) disable_guest_seccomp=true # If enabled, the runtime will create opentracing.io traces and spans. # (See https://www.jaegertracing.io/docs/getting-started). # (default: disabled) #enable_tracing = true # If enabled, the runtime will not create a network namespace for shim and hypervisor processes. # This option may have some potential impacts to your host. It should only be used when you know what you're doing. # `disable_new_netns` conflicts with `enable_netmon` # `disable_new_netns` conflicts with `internetworking_model=bridged` and `internetworking_model=macvtap`. It works only # with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge # (like OVS) directly. # If you are using docker, `disable_new_netns` only works with `docker run --net=none` # (default: false) # disable_new_netns = true # Enabled experimental feature list, format: ["a", "b"]. # Experimental features are features not stable enough for production, # They may break compatibility, and are prepared for a big version bump. # Supported experimental features: # 1. 
"newstore": new persist storage driver which breaks backward compatibility, # expected to move out of experimental in 2.0.0. # (default: []) experimental=[] ``` --- # KSM throttler ## version Output of "`/usr/libexec/kata-ksm-throttler/kata-ksm-throttler --version`": ``` kata-ksm-throttler version 1.8.0-alpha2-766df92 ``` Output of "`/usr/lib/systemd/system/kata-ksm-throttler.service --version`": ``` /usr/bin/kata-collect-data.sh: line 178: /usr/lib/systemd/system/kata-ksm-throttler.service: Permission denied ``` ## systemd service # Image details ```yaml --- osbuilder: url: "https://github.com/kata-containers/osbuilder" version: "unknown" rootfs-creation-time: "2019-06-19T15:23:27.603074742+0000Z" description: "osbuilder rootfs" file-format-version: "0.0.2" architecture: "x86_64" base-distro: name: "Clear" version: "29980" packages: default: - "chrony" - "iptables-bin" - "libudev0-shim" - "systemd" - "util-linux-bin" extra: agent: url: "https://github.com/kata-containers/agent" name: "kata-agent" version: "1.8.0-alpha2-e40d7749dc4d82ddd948e6b2dfbe2520e130e91f" agent-is-init-daemon: "no" ``` --- # Initrd details No initrd --- # Logfiles ## Runtime logs Recent runtime problems found in system journal: ``` time="2019-07-09T14:58:43.947458544+03:00" level=warning msg="load sandbox devices failed" arch=amd64 command=create container=d74f79e9c8087c1af6d0238fbb17934352b85112a4aeb38c8050d74c20d694b9 error="open /run/vc/sbs/d74f79e9c8087c1af6d0238fbb17934352b85112a4aeb38c8050d74c20d694b9/devices.json: no such file or directory" name=kata-runtime pid=10208 sandbox=d74f79e9c8087c1af6d0238fbb17934352b85112a4aeb38c8050d74c20d694b9 sandboxid=d74f79e9c8087c1af6d0238fbb17934352b85112a4aeb38c8050d74c20d694b9 source=virtcontainers subsystem=sandbox time="2019-07-09T15:08:43.911916986+03:00" level=warning msg="load sandbox devices failed" arch=amd64 command=create container=6f6cd2387fa3ce3343c7324144e43ba13388340b33c4c0e50476f3876c6427c1 error="open /run/vc/sbs/6f6cd2387fa3ce3343c7324144e43ba13388340b33c4c0e50476f3876c6427c1/devices.json: no such file or directory" name=kata-runtime pid=15401 sandbox=6f6cd2387fa3ce3343c7324144e43ba13388340b33c4c0e50476f3876c6427c1 sandboxid=6f6cd2387fa3ce3343c7324144e43ba13388340b33c4c0e50476f3876c6427c1 source=virtcontainers subsystem=sandbox time="2019-07-09T15:19:30.53167613+03:00" level=warning msg="no such file or directory: /run/kata-containers/shared/sandboxes/6f6cd2387fa3ce3343c7324144e43ba13388340b33c4c0e50476f3876c6427c1/6f6cd2387fa3ce3343c7324144e43ba13388340b33c4c0e50476f3876c6427c1/rootfs" time="2019-07-09T15:19:57.313827339+03:00" level=warning msg="no such file or directory: /run/kata-containers/shared/sandboxes/d74f79e9c8087c1af6d0238fbb17934352b85112a4aeb38c8050d74c20d694b9/d74f79e9c8087c1af6d0238fbb17934352b85112a4aeb38c8050d74c20d694b9/rootfs" time="2019-07-09T15:23:11.980532275+03:00" level=warning msg="load sandbox devices failed" arch=amd64 command=create container=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 error="open /run/vc/sbs/396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8/devices.json: no such file or directory" name=kata-runtime pid=22870 sandbox=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 sandboxid=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 source=virtcontainers subsystem=sandbox time="2019-07-09T16:21:38.235462788+03:00" level=warning msg="load sandbox devices failed" arch=amd64 command=list error="open 
/run/vc/sbs/a357627c8660ce18d3b9dc6ac758812c1f00e8ab871a41d0618d3e6cc8930bd6/devices.json: no such file or directory" name=kata-runtime pid=15347 sandbox=a357627c8660ce18d3b9dc6ac758812c1f00e8ab871a41d0618d3e6cc8930bd6 sandboxid=a357627c8660ce18d3b9dc6ac758812c1f00e8ab871a41d0618d3e6cc8930bd6 source=virtcontainers subsystem=sandbox time="2019-07-09T16:23:12.459170118+03:00" level=error msg="add interface failed" arch=amd64 command=kata-network container=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 error="Unsupported network interface" name=kata-runtime pid=16098 resulting-interface="" sandbox=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 source=runtime time="2019-07-09T16:23:12.459299877+03:00" level=error msg="Unsupported network interface" arch=amd64 command=kata-network container=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 name=kata-runtime pid=16098 sandbox=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 source=runtime time="2019-07-09T16:25:29.648412606+03:00" level=error msg="add interface failed" arch=amd64 command=kata-network container=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 error="PhysicalEndpoint does not support Hot attach" name=kata-runtime pid=17210 resulting-interface="" sandbox=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 source=runtime time="2019-07-09T16:25:29.648578264+03:00" level=error msg="PhysicalEndpoint does not support Hot attach" arch=amd64 command=kata-network container=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 name=kata-runtime pid=17210 sandbox=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 source=runtime time="2019-07-09T16:33:31.420805143+03:00" level=warning msg="no such file or directory: /run/kata-containers/shared/sandboxes/396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8/396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8/rootfs" time="2019-07-09T16:33:58.631840757+03:00" level=warning msg="load sandbox devices failed" arch=amd64 command=list error="open /run/vc/sbs/a357627c8660ce18d3b9dc6ac758812c1f00e8ab871a41d0618d3e6cc8930bd6/devices.json: no such file or directory" name=kata-runtime pid=21433 sandbox=a357627c8660ce18d3b9dc6ac758812c1f00e8ab871a41d0618d3e6cc8930bd6 sandboxid=a357627c8660ce18d3b9dc6ac758812c1f00e8ab871a41d0618d3e6cc8930bd6 source=virtcontainers subsystem=sandbox time="2019-07-09T16:34:37.04558605+03:00" level=warning msg="load sandbox devices failed" arch=amd64 command=create container=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 error="open /run/vc/sbs/b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169/devices.json: no such file or directory" name=kata-runtime pid=21837 sandbox=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 sandboxid=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 source=virtcontainers subsystem=sandbox time="2019-07-09T16:34:51.654470695+03:00" level=warning msg="load sandbox devices failed" arch=amd64 command=list error="open /run/vc/sbs/a357627c8660ce18d3b9dc6ac758812c1f00e8ab871a41d0618d3e6cc8930bd6/devices.json: no such file or directory" name=kata-runtime pid=22081 sandbox=a357627c8660ce18d3b9dc6ac758812c1f00e8ab871a41d0618d3e6cc8930bd6 sandboxid=a357627c8660ce18d3b9dc6ac758812c1f00e8ab871a41d0618d3e6cc8930bd6 source=virtcontainers subsystem=sandbox time="2019-07-09T16:35:24.139540699+03:00" level=error msg="add interface failed" arch=amd64 
command=kata-network container=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 error="PhysicalEndpoint does not support Hot attach" name=kata-runtime pid=22350 resulting-interface="" sandbox=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 source=runtime time="2019-07-09T16:35:24.139707168+03:00" level=error msg="PhysicalEndpoint does not support Hot attach" arch=amd64 command=kata-network container=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 name=kata-runtime pid=22350 sandbox=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 source=runtime time="2019-07-09T16:55:40.997490296+03:00" level=warning msg="no such file or directory: /run/kata-containers/shared/sandboxes/b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169/b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169/rootfs" time="2019-07-09T16:55:43.101618619+03:00" level=warning msg="load sandbox devices failed" arch=amd64 command=create container=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 error="open /run/vc/sbs/b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169/devices.json: no such file or directory" name=kata-runtime pid=32279 sandbox=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 sandboxid=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 source=virtcontainers subsystem=sandbox ``` ## Proxy logs Recent proxy problems found in system journal: ``` time="2019-07-09T15:19:30.684652612+03:00" level=fatal msg="failed to handle exit signal" error="close unix @->/run/vc/vm/6f6cd2387fa3ce3343c7324144e43ba13388340b33c4c0e50476f3876c6427c1/kata.sock: use of closed network connection" name=kata-proxy pid=15447 sandbox=6f6cd2387fa3ce3343c7324144e43ba13388340b33c4c0e50476f3876c6427c1 source=proxy time="2019-07-09T15:19:57.333099656+03:00" level=fatal msg="failed to handle exit signal" error="close unix @->/run/vc/vm/d74f79e9c8087c1af6d0238fbb17934352b85112a4aeb38c8050d74c20d694b9/kata.sock: use of closed network connection" name=kata-proxy pid=10257 sandbox=d74f79e9c8087c1af6d0238fbb17934352b85112a4aeb38c8050d74c20d694b9 source=proxy time="2019-07-09T16:33:31.437953349+03:00" level=fatal msg="failed to handle exit signal" error="close unix @->/run/vc/vm/396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8/kata.sock: use of closed network connection" name=kata-proxy pid=22921 sandbox=396e34ec8aeac66beea7b91ea876a3e2f3f969348fc924f422fa108e4c144df8 source=proxy time="2019-07-09T16:55:41.137242521+03:00" level=fatal msg="channel error" error="accept unix /run/vc/sbs/b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169/proxy.sock: use of closed network connection" name=kata-proxy pid=21901 sandbox=b255eef61804cb1c85ac40bfb17e96f9860d8550b7c9bf3db13bb5ad743d3169 source=proxy ``` ## Shim logs No recent shim problems found in system journal. ## Throttler logs No recent throttler problems found in system journal. 
--- # Container manager details Have `docker` ## Docker Output of "`docker version`": ``` Client: Version: 18.06.1-ce API version: 1.38 Go version: go1.10.3 Git commit: e68fc7a Built: Tue Aug 21 17:23:03 2018 OS/Arch: linux/amd64 Experimental: false Server: Engine: Version: 18.06.1-ce API version: 1.38 (minimum version 1.12) Go version: go1.10.3 Git commit: e68fc7a Built: Tue Aug 21 17:25:29 2018 OS/Arch: linux/amd64 Experimental: false ``` Output of "`docker info`": ``` Containers: 25 Running: 10 Paused: 0 Stopped: 15 Images: 14 Server Version: 18.06.1-ce Storage Driver: devicemapper Pool Name: docker-253:0-67749018-pool Pool Blocksize: 65.54kB Base Device Size: 10.74GB Backing Filesystem: xfs Udev Sync Supported: true Data file: /dev/loop0 Metadata file: /dev/loop1 Data loop file: /var/lib/docker/devicemapper/devicemapper/data Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata Data Space Used: 3.131GB Data Space Total: 107.4GB Data Space Available: 46.54GB Metadata Space Used: 4.985MB Metadata Space Total: 2.147GB Metadata Space Available: 2.142GB Thin Pool Minimum Free Space: 10.74GB Deferred Removal Enabled: true Deferred Deletion Enabled: true Deferred Deleted Device Count: 0 Library Version: 1.02.149-RHEL7 (2018-07-20) Logging Driver: json-file Cgroup Driver: cgroupfs Plugins: Volume: local Network: bridge host macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog Swarm: inactive Runtimes: kata-runtime runc Default Runtime: runc Init Binary: docker-init containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e runc version: 69663f0bd4b60df09991c08812a60108003fa340 init version: fec3683 Security Options: seccomp Profile: default Kernel Version: 3.10.0-327.el7.x86_64 Operating System: CentOS Linux 7 (Core) OSType: linux Architecture: x86_64 CPUs: 8 Total Memory: 15.5GiB Name: (removed) ID: (removed) Docker Root Dir: /var/lib/docker Debug Mode (client): false Debug Mode (server): true File Descriptors: 82 Goroutines: 89 System Time: 2019-07-10T18:29:08.958328036+03:00 EventsListeners: 0 Username: (removed) Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: 127.0.0.0/8 Live Restore Enabled: false WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device. 
``` Output of "`systemctl show docker`": ``` Type=notify Restart=on-failure NotifyAccess=main RestartUSec=100ms TimeoutStartUSec=0 TimeoutStopUSec=1min 30s WatchdogUSec=0 WatchdogTimestamp=Tue 2019-07-09 14:27:18 IDT WatchdogTimestampMonotonic=34626963 StartLimitInterval=60000000 StartLimitBurst=3 StartLimitAction=none FailureAction=none PermissionsStartOnly=no RootDirectoryStartOnly=no RemainAfterExit=no GuessMainPID=yes MainPID=1212 ControlPID=0 FileDescriptorStoreMax=0 StatusErrno=0 Result=success ExecMainStartTimestamp=Tue 2019-07-09 14:27:16 IDT ExecMainStartTimestampMonotonic=32152128 ExecMainExitTimestampMonotonic=0 ExecMainPID=1212 ExecMainCode=0 ExecMainStatus=0 ExecStart={ path=/usr/bin/dockerd ; argv[]=/usr/bin/dockerd -D --add-runtime kata-runtime=/usr/bin/kata-runtime --default-runtime=runc ; ignore_errors=no ; start_time=[Tue 2019-07-09 14:27:16 IDT] ; stop_time=[n/a] ; pid=1212 ; code=(null) ; status=0/0 } ExecReload={ path=/bin/kill ; argv[]=/bin/kill -s HUP $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 } Slice=system.slice ControlGroup=/system.slice/docker.service MemoryCurrent=479932416 Delegate=yes CPUAccounting=no CPUShares=18446744073709551615 StartupCPUShares=18446744073709551615 CPUQuotaPerSecUSec=infinity BlockIOAccounting=no BlockIOWeight=18446744073709551615 StartupBlockIOWeight=18446744073709551615 MemoryAccounting=no MemoryLimit=18446744073709551615 DevicePolicy=auto UMask=0022 LimitCPU=18446744073709551615 LimitFSIZE=18446744073709551615 LimitDATA=18446744073709551615 LimitSTACK=18446744073709551615 LimitCORE=18446744073709551615 LimitRSS=18446744073709551615 LimitNOFILE=18446744073709551615 LimitAS=18446744073709551615 LimitNPROC=18446744073709551615 LimitMEMLOCK=65536 LimitLOCKS=18446744073709551615 LimitSIGPENDING=62494 LimitMSGQUEUE=819200 LimitNICE=0 LimitRTPRIO=0 LimitRTTIME=18446744073709551615 OOMScoreAdjust=0 Nice=0 IOScheduling=0 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SecureBits=0 CapabilityBoundingSet=18446744073709551615 MountFlags=0 PrivateTmp=no PrivateNetwork=no PrivateDevices=no ProtectHome=no ProtectSystem=no SameProcessGroup=no IgnoreSIGPIPE=yes NoNewPrivileges=no SystemCallErrorNumber=0 RuntimeDirectoryMode=0755 KillMode=process KillSignal=15 SendSIGKILL=yes SendSIGHUP=no Id=docker.service Names=docker.service Requires=basic.target Wants=network-online.target system.slice WantedBy=multi-user.target Conflicts=shutdown.target Before=multi-user.target shutdown.target After=basic.target network-online.target system.slice systemd-journald.socket firewalld.service Documentation=https://docs.docker.com Description=Docker Application Container Engine LoadState=loaded ActiveState=active SubState=running FragmentPath=/usr/lib/systemd/system/docker.service DropInPaths=/etc/systemd/system/docker.service.d/kata-containers.conf UnitFileState=enabled UnitFilePreset=disabled InactiveExitTimestamp=Tue 2019-07-09 14:27:16 IDT InactiveExitTimestampMonotonic=32152210 ActiveEnterTimestamp=Tue 2019-07-09 14:27:18 IDT ActiveEnterTimestampMonotonic=34627033 ActiveExitTimestampMonotonic=0 InactiveEnterTimestampMonotonic=0 CanStart=yes CanStop=yes CanReload=yes CanIsolate=no StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnFailureJobMode=replace 
IgnoreOnIsolate=no IgnoreOnSnapshot=no NeedDaemonReload=no JobTimeoutUSec=0 JobTimeoutAction=none ConditionResult=yes AssertResult=yes ConditionTimestamp=Tue 2019-07-09 14:27:16 IDT ConditionTimestampMonotonic=32150785 AssertTimestamp=Tue 2019-07-09 14:27:16 IDT AssertTimestampMonotonic=32150786 Transient=no ``` Have `kubectl` ## Kubernetes Output of "`kubectl version`": ``` Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"} The connection to the server localhost:8080 was refused - did you specify the right host or port? ``` Output of "`kubectl config view`": ``` apiVersion: v1 clusters: [] contexts: [] current-context: "" kind: Config preferences: {} users: [] ``` Output of "`systemctl show kubelet`": ``` Type=simple Restart=always NotifyAccess=none RestartUSec=10s TimeoutStartUSec=1min 30s TimeoutStopUSec=1min 30s WatchdogUSec=0 WatchdogTimestamp=Tue 2019-07-09 14:27:18 IDT WatchdogTimestampMonotonic=34801738 StartLimitInterval=0 StartLimitBurst=5 StartLimitAction=none FailureAction=none PermissionsStartOnly=no RootDirectoryStartOnly=no RemainAfterExit=no GuessMainPID=yes MainPID=2758 ControlPID=0 FileDescriptorStoreMax=0 StatusErrno=0 Result=success ExecMainStartTimestamp=Tue 2019-07-09 14:27:18 IDT ExecMainStartTimestampMonotonic=34801667 ExecMainExitTimestampMonotonic=0 ExecMainPID=2758 ExecMainCode=0 ExecMainStatus=0 ExecStart={ path=/usr/bin/kubelet ; argv[]=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS ; ignore_errors=no ; start_time=[Tue 2019-07-09 14:27:18 IDT] ; stop_time=[n/a] ; pid=2758 ; code=(null) ; status=0/0 } Slice=system.slice ControlGroup=/system.slice/kubelet.service MemoryCurrent=62152704 Delegate=no CPUAccounting=no CPUShares=18446744073709551615 StartupCPUShares=18446744073709551615 CPUQuotaPerSecUSec=infinity BlockIOAccounting=no BlockIOWeight=18446744073709551615 StartupBlockIOWeight=18446744073709551615 MemoryAccounting=no MemoryLimit=18446744073709551615 DevicePolicy=auto Environment=KUBELET_EXTRA_ARGS=--container-runtime=remote\x20--runtime-request-timeout=15m\x20--container-runtime-endpoint=unix:///run/containerd/containerd.sock KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf\x20--kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml EnvironmentFile=/var/lib/kubelet/kubeadm-flags.env (ignore_errors=yes) EnvironmentFile=/etc/sysconfig/kubelet (ignore_errors=yes) UMask=0022 LimitCPU=18446744073709551615 LimitFSIZE=18446744073709551615 LimitDATA=18446744073709551615 LimitSTACK=18446744073709551615 LimitCORE=18446744073709551615 LimitRSS=18446744073709551615 LimitNOFILE=4096 LimitAS=18446744073709551615 LimitNPROC=62494 LimitMEMLOCK=65536 LimitLOCKS=18446744073709551615 LimitSIGPENDING=62494 LimitMSGQUEUE=819200 LimitNICE=0 LimitRTPRIO=0 LimitRTTIME=18446744073709551615 OOMScoreAdjust=0 Nice=0 IOScheduling=0 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SecureBits=0 CapabilityBoundingSet=18446744073709551615 MountFlags=0 PrivateTmp=no PrivateNetwork=no PrivateDevices=no ProtectHome=no ProtectSystem=no SameProcessGroup=no 
IgnoreSIGPIPE=yes NoNewPrivileges=no SystemCallErrorNumber=0 RuntimeDirectoryMode=0755 KillMode=control-group KillSignal=15 SendSIGKILL=yes SendSIGHUP=no Id=kubelet.service Names=kubelet.service Requires=basic.target Wants=system.slice WantedBy=multi-user.target Conflicts=shutdown.target Before=multi-user.target shutdown.target After=systemd-journald.socket system.slice basic.target Documentation=https://kubernetes.io/docs/ Description=kubelet: The Kubernetes Node Agent LoadState=loaded ActiveState=active SubState=running FragmentPath=/etc/systemd/system/kubelet.service DropInPaths=/etc/systemd/system/kubelet.service.d/0-containerd.conf /etc/systemd/system/kubelet.service.d/10-kubeadm.conf UnitFileState=enabled UnitFilePreset=disabled InactiveExitTimestamp=Tue 2019-07-09 14:27:18 IDT InactiveExitTimestampMonotonic=34801777 ActiveEnterTimestamp=Tue 2019-07-09 14:27:18 IDT ActiveEnterTimestampMonotonic=34801777 ActiveExitTimestamp=Tue 2019-07-09 14:27:08 IDT ActiveExitTimestampMonotonic=24768880 InactiveEnterTimestamp=Tue 2019-07-09 14:27:18 IDT InactiveEnterTimestampMonotonic=34789014 CanStart=yes CanStop=yes CanReload=no CanIsolate=no StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnFailureJobMode=replace IgnoreOnIsolate=no IgnoreOnSnapshot=no NeedDaemonReload=no JobTimeoutUSec=0 JobTimeoutAction=none ConditionResult=yes AssertResult=yes ConditionTimestamp=Tue 2019-07-09 14:27:18 IDT ConditionTimestampMonotonic=34800304 AssertTimestamp=Tue 2019-07-09 14:27:18 IDT AssertTimestampMonotonic=34800305 Transient=no ``` No `crio` No `containerd` --- # Packages No `dpkg` Have `rpm` Output of "`rpm -qa|egrep "(cc-oci-runtimecc-runtimerunv|kata-proxy|kata-runtime|kata-shim|kata-ksm-throttler|kata-containers-image|linux-container|qemu-)"`": ``` qemu-lite-2.11.0+git.87517afd72-35.1.x86_64 kata-linux-container-4.19.28.42-48.1.x86_64 kata-shim-1.8.0~alpha2-30.1.x86_64 qemu-lite-bin-2.11.0+git.87517afd72-35.1.x86_64 qemu-vanilla-bin-4.0.0+git.131b9a0570-35.1.x86_64 kata-containers-image-1.8.0~alpha2-31.1.x86_64 kata-proxy-bin-1.8.0~alpha2-32.1.x86_64 kata-shim-bin-1.8.0~alpha2-30.1.x86_64 kata-ksm-throttler-1.8.0~alpha2-36.1.x86_64 qemu-img-1.5.3-160.el7_6.2.x86_64 libvirt-daemon-driver-qemu-4.5.0-10.el7_6.12.x86_64 qemu-vanilla-data-4.0.0+git.131b9a0570-35.1.x86_64 qemu-lite-data-2.11.0+git.87517afd72-35.1.x86_64 qemu-vanilla-4.0.0+git.131b9a0570-35.1.x86_64 kata-proxy-1.8.0~alpha2-32.1.x86_64 kata-runtime-1.8.0~alpha2-52.1.x86_64 ``` ---

grahamwhaley commented 5 years ago

/cc @amshinde @mcastelino

devimc commented 5 years ago

@yuvalk8s https://github.com/kata-containers/documentation/blob/master/use-cases/using-SRIOV-and-kata.md

grahamwhaley commented 5 years ago

@devimc - @yuvalk8s explicitly said they are not using SR-IOV but assigning a whole NIC. I don't think the Kata SR-IOV doc covers that...

devimc commented 5 years ago

@grahamwhaley yes, but I think this guide can be useful for using VFIO; the following steps work for me:

[example]
# NIC as example
 $ sudo lspci

[steps in host]
# load the module
 $ sudo modprobe -i vfio-pci

# unbind the device
 $ echo 0000:00:1f.6 | sudo tee /sys/bus/pci/devices/0000:00:1f.6/driver/unbind

# find vendor & device ID
 $ lspci -n -s 00:1f.6
 00:1f.6 0200: 8086:15b7 (rev 31)

# bind to vfio-pci
 $ echo 8086 15b7 | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id

# check
 $ ls /dev/vfio
 8  vfio

# run
 $ docker run --runtime=kata-runtime --device /dev/vfio/8 -v /dev:/dev debian bash
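
(A hedged quick check; the package commands mirror those used later in this thread. The passed-through device should then appear in the guest's PCI listing:)

# inside the container
 $ apt update && apt install -y pciutils
 $ lspci -nn     # look for the host device ID, 8086:15b7 in this example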

devimc commented 5 years ago

@grahamwhaley @yuvalk8s https://github.com/kata-containers/documentation/blob/master/use-cases/GPU-passthrough-and-Kata.md - this is also a good document

devimc commented 5 years ago

@yuvalk8s we don't have a guide or how-to document for NICs, but I think these two documents can be useful

grahamwhaley commented 5 years ago

thanks @devimc. @yuvalk8s - if we can work with you to work out the set of instructions, maybe you can send a pull request adding a document to this repo? :-)

We can of course help you with that. And let us know if the above docs and @devimc's instructions give you enough information to get up and running. thx!

Yuval-Ai commented 5 years ago

Thanks @grahamwhaley & @devimc. Yes, I think @devimc's instructions might be a good starting point for me. I'm going to try them out and would be glad to work together to get it done. Once I have everything running and tested, I'll be more than happy to add a document on the process.

Thanks!

mcastelino commented 5 years ago

VFIO will work even for a regular NIC. As long as we discover the interface to be a physical interface, we use VFIO. Let us know if this fails.

/cc @amshinde

Yuval-Ai commented 5 years ago

Hi all, I followed @devimc's instructions with some changes along the way, and I currently can't see the NIC in the loaded container.

It seems like the problem is that the container loads without the VFIO module.

After reading the GPU-passthrough-and-Kata document, I assume:

Am I correct?

I have added below the steps that I took and their results (bad results highlighted).

Thanks!! @grahamwhaley @devimc @mcastelino

1. Enable the hotplug_vfio_on_root_bus configuration in the Kata configuration.toml file:

[host]# vi /usr/share/defaults/kata-containers/configuration.toml

a.  Set hotplug_vfio_on_root_bus = true
b.  Make sure you are using the pc machine type by verifying machine_type = "pc" (a verification sketch follows below)
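
(A hedged verification, assuming the default config path shown above; grep prints the lines in file order:)

 [host]# grep -E '^(machine_type|hotplug_vfio_on_root_bus)' /usr/share/defaults/kata-containers/configuration.toml
 machine_type = "pc"
 hotplug_vfio_on_root_bus = true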

2. Install lshw and find the PCI address of the interface

[host]# yum install lshw

[host]# lshw -class network -businfo | grep enp0s20f1

Result: pci@0000:00:14.1 enp0s20f1 network Ethernet Connection I354 1.0 GbE Backplane

3. Store the PCI address in a variable

[host]# BDF="0000:00:14.1"

4. Load module

[host]# modprobe -i vfio-pci

5. Unbind the device

[host]# echo $BDF | sudo tee /sys/bus/pci/devices/$BDF/driver/unbind

Result: 0000:00:14.1

6. Find vendor & device ID

[host]# lspci -n -s $BDF

Result: 00:14.1 0200: 8086:1f40 (rev 03)

7. Bind to vfio-pci

[host]# echo 8086 1f40 | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id

Result: 8086 1f40

8. Check dev name

[host]# ls /dev/vfio

Result: vfio

9. Run container

[host]# docker run -it --runtime=kata-runtime --name vfio_con --device /dev/vfio/vfio -v /dev:/dev ubuntu bash

10. In the container, update sources and install needed packages

[container]# apt update ; apt install iproute2 pciutils kmod -y

11. Find forwarded device

[container]# lspci -nn -D

Result: no [8086:1f40] device in the output.

12. Check for VFIO module

[container]# lsmod

Result: only the Module Size Used by header line (------empty-------)

13. Tested the same with a --privileged container, with the same results.

devimc commented 5 years ago

Hi @yuvalk8s , I forgot to mention that you need to enable intel_iommu on the host: append intel_iommu=on to the kernel command line.
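
(A hedged sketch for a CentOS 7 host like this one; GRUB paths can differ on other distros or EFI systems:)

 [host]# vi /etc/default/grub                    # append intel_iommu=on to GRUB_CMDLINE_LINUX
 [host]# grub2-mkconfig -o /boot/grub2/grub.cfg
 [host]# reboot
 [host]# grep intel_iommu /proc/cmdline          # verify after reboot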

Yuval-Ai commented 5 years ago

Hi @devimc, I enabled the IOMMU, but still get the same results as in my last comment. Any ideas? Thanks

[host]# lsmod | grep vfio

vfio_iommu_type1       17632  0
vfio_pci               36735  0
vfio                   25291  2 vfio_iommu_type1,vfio_pci

devimc commented 5 years ago

@yuvalk8s I see a problem in steps 8 and 9: in the /dev/vfio/ directory you should see a device with a number (e.g. /dev/vfio/1), and that device is what should be used in the docker command line: docker .. --device /dev/vfio/1 ...
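
(A hedged way to find that number from the PCI address stored in $BDF earlier; the group number 8 below is illustrative:)

 [host]# basename $(readlink -f /sys/bus/pci/devices/$BDF/iommu_group)
 8
 [host]# docker run -it --runtime=kata-runtime --device /dev/vfio/8 -v /dev:/dev ubuntu bash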

devimc commented 5 years ago

@yuvalk8s do you know if your hw supports iommu? https://en.wikipedia.org/wiki/List_of_IOMMU-supporting_hardware
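
(Two hedged host-side checks that usually answer this:)

 [host]# dmesg | grep -i -e DMAR -e IOMMU    # look for lines such as "DMAR: IOMMU enabled"
 [host]# ls /sys/kernel/iommu_groups/        # non-empty only when the IOMMU is active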

Yuval-Ai commented 5 years ago

Hi @devimc, I checked and no, my setup doesn't support VT-d. Is there a way to do this without IOMMU? In the meantime I'm also working on switching to HW that does support IOMMU. Thanks!

devimc commented 5 years ago

@yuvalk8s afaik IOMMU support is required for vfio passthrough

cc @grahamwhaley

grahamwhaley commented 5 years ago

Ooh, I don't know. I believe the IOMMU gives you efficiency and memory protection, but there does seem to be a no-iommu UIO VFIO mode - https://lwn.net/Articles/660745/ - but I'm not aware of how or if anybody has used that with Kata - maybe @mcastelino @amshinde @stefanha might know.

Whilst here, I also found a NEMU guide that might repeat what we've already detailed above: https://github.com/intel/nemu/wiki/Testing-VFIO-with-GPU

Yuval-Ai commented 5 years ago

Hi @devimc @grahamwhaley , sorry for the delayed response. I'm happy to report that I was able to switch to IOMMU-supporting HW, and succeeded in passing the physical port to the container. Now, while inside the container, I can see the device:

[container]# lshw -class network -businfo -numeric
Bus info          Device  Class          Description
====================================================
pci@0000:00:07.0          network        Virtio network device [1AF4:1000]
virtio@4          eth0    network        Ethernet interface
pci@0000:00:08.0          network        I211 Gigabit Network Connection [8086:1539]

# (I211 Gigabit is the passed device)

But since the container (I used Ubuntu) loads without the necessary drivers, it can't bind the device. I would like to avoid manually installing drivers inside new containers, and instead pass the driver to the container while creating it (like the --device mapping).

Do you maybe know how this could be done? (Or any other decent alternative, for that matter.)

Thanks!!

*Even though I’m not using DPDK, just to give a little bit more information:

# dpdk-devbind --status
Network devices using DPDK-compatible driver
============================================
<none>

Network devices using kernel driver
===================================
0000:00:07.0 'Virtio network device 1000' if=eth0 drv=virtio-pci unused=vfio-pci,uio_pci_generic *Active*

Other Network devices
=====================
0000:00:08.0 'I211 Gigabit Network Connection 1539' unused=vfio-pci,uio_pci_generic
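
(On the driver question above, a hedged host-side check: lspci -k lists the kernel modules a device can use, so running it against the NIC's host PCI address, a placeholder $BDF below, shows what the guest kernel will need; for an I211 that is typically igb:)

 [host]# lspci -k -s $BDF
 xx:xx.x Ethernet controller: Intel Corporation I211 Gigabit Network Connection
         Kernel driver in use: vfio-pci
         Kernel modules: igb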

grahamwhaley commented 5 years ago

Hi @yuvalk8s - (heh, wrong @ g r a h m btw ;-)). @devimc posted a PR to the agent a couple of days ago to try and make kernel module loading easier: https://github.com/kata-containers/agent/issues/615 But the current methods are, I believe, either to add the static (non-module) driver to the kernel config and build a custom kernel, or to use an OCI hook to load the module (like NVIDIA did with Kata, though it seems they never got around to documenting it well :-( ). See https://github.com/kata-containers/documentation/issues/517#issuecomment-513154346 - it looks like you are not the only person trying to do this with Kata right now?
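
(For the OCI-hook route, the runtime's own configuration, quoted in the kata-collect-data.sh output above, describes guest-side hooks under guest_hook_path that the agent scans from the guest rootfs. A hedged sketch, assuming a rootfs rebuilt with osbuilder that already contains the igb module; the hook name is hypothetical:)

 # in the guest rootfs built with osbuilder:
 $ mkdir -p rootfs/usr/share/oci/hooks/prestart
 $ printf '#!/bin/sh\nmodprobe igb\n' > rootfs/usr/share/oci/hooks/prestart/load-nic-driver.sh
 $ chmod +x rootfs/usr/share/oci/hooks/prestart/load-nic-driver.sh

 # then point the runtime at it in configuration.toml:
 # guest_hook_path = "/usr/share/oci/hooks"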

Yuval-Ai commented 5 years ago

@grahamwhaley , fixed the @ g r a h m thing, sorry about that..

I'll try the methods you mentioned; I was hoping to avoid custom kernels, but I'll see how it goes. Thanks!

slime-horse commented 5 years ago

Best of luck to y’all!

Yuval-Ai commented 5 years ago

Hi @grahamwhaley, @devimc.

I started working on the “how to” guide for using NIC pass-through on my fork. Hope to submit a PR for it soon.

I’m still lacking that final kernel module loading part, but AFAIU kata-containers/agent#615 was completed and can now be used (right?).

@devimc, can you please direct me to a description of using the new method / walk me through it?

Thanks a lot!

grahamwhaley commented 5 years ago

Hi @Yuval-Ai - yes, looks like https://github.com/kata-containers/agent/pull/616 did get merged. Hopefully @devimc can help you with the new method - I think he is back online tomorrow.

devimc commented 5 years ago

@Yuval-Ai currently I'm working on the documentation and fixing an issue with the annotations, so you have to:

  1. Build and install the latest Kata (we haven't released 1.9.0 yet); you can clone the tests repo and use this script: https://github.com/kata-containers/tests/blob/master/.ci/install_kata.sh
  2. Build a custom image that includes your modules (I'm working on a method to facilitate this). Make sure to use the same kernel version for the guest kernel and modules.
  3. Set the list of modules in the configuration file (/usr/share/defaults/kata-containers/configuration.toml or /etc/kata-containers/configuration.toml) https://github.com/kata-containers/runtime/blob/master/cli/config/configuration-qemu.toml.in#L334 (a hedged sketch follows below)
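
(A sketch of step 3, assuming the option is the kernel_modules list under [agent.kata] that the linked line introduces; igb is the module this thread's I211 NIC would need:)

 [agent.kata]
 kernel_modules=["igb"]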

Yuval-Ai commented 4 years ago

Hi @devimc. The guide I wrote is almost complete, but I still don't know how to solve the last part: binding the passed device (inside the container) to some sort of matching passed driver. Could you please help me with adding that section?

I'm aware that in order to achieve that I might have to change some of the previous steps as well. If that's the case, please let me know.

Thanks!

devimc commented 4 years ago

@Yuval-Ai excellent guide/documentation. The last part is the tricky part: the kernel that we ship does not include support for any device (GPU, NIC, etc.), and the kernel driver to use depends on the device type. The user should build and install a kernel that contains the drivers for the devices to hotplug before running the container (something like this), or somehow (maybe we should document this too) include the modules in an image and load them later using annotations.

devimc commented 4 years ago

@Yuval-Ai can we close this?

Yuval-Ai commented 4 years ago

Hi @devimc, Of course, you can close this. Unfortunately we have put this project on hold for now, and I can't complete that last part any time soon. I'm more than happy to send over what I've done so far, if there is someone who can complete the doc.

Thank you for all your help, Yuval

devimc commented 4 years ago

thanks @Yuval-Ai

amshinde commented 4 years ago

@Yuval-Ai We recently added docs for passing an NVIDIA GPU, but I think your doc on passing a network device would be quite helpful as well. Can you open a PR to add that? Someone can then pick up that PR. Thanks!