kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

vfio device passthrough fails with “Operation not permitted” #2605

Closed amarshall closed 4 years ago

amarshall commented 4 years ago

Description of problem

Fedora 32 beta, kernel 5.6.3-300. I was using kata-runtime from the Fedora repo (1.11.0-alpha1) and encountered #2542, so I forked the upstream package spec and applied the patch for that fix. The problem occurs with that build, and also after changing the package spec to pull directly from commit af24829c2ae78c3b811c4cff6736ddaba500d37c. Both builds show the problem described.

Expected result

sudo podman run -it --rm --cap-add=ALL --runtime=kata-runtime --device /dev/vfio/72 fedora

Container starts and has device attached.

Actual result

sudo podman run -it --rm --cap-add=ALL --runtime=kata-runtime --device /dev/vfio/72 fedora

Error: QMP command failed: vfio 0000:03:10.4: failed to open /dev/vfio/72: Operation not permitted: OCI runtime permission denied error


The device being passed through is an Intel 82599 Virtual Function from an X520; however, I also encounter the same error with other devices bound to vfio-pci. I can successfully pass this device through to a VM using qemu-kvm via virt-manager, so IOMMU, etc., are configured correctly.
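For readers following along: the number in /dev/vfio/72 is the IOMMU group of the PCI device, which the kernel exposes as the iommu_group symlink under the device's sysfs directory. A minimal sketch of that lookup is below. The function name vfio_group_node and the sysfs_pci parameter are illustrative only and are not part of kata-runtime.

```python
import os
from pathlib import Path


def vfio_group_node(bdf: str, sysfs_pci: str = "/sys/bus/pci/devices") -> str:
    """Return the /dev/vfio/<group> node for a PCI address such as 0000:03:10.4.

    Hypothetical helper: it only reads sysfs and does not check that the
    device is actually bound to vfio-pci.
    """
    link = Path(sysfs_pci) / bdf / "iommu_group"
    # iommu_group is a symlink; its last path component is the group number.
    group = os.path.basename(os.readlink(link))
    return f"/dev/vfio/{group}"
```

On the reporter's host, calling it with "0000:03:10.4" would be expected to name the same node that appears in the error message, assuming the usual sysfs layout.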

I am not using a custom kernel for the VM; it doesn't seem necessary, since this is not a large BAR device. I'd also expect that if missing drivers were the only problem, I'd still be able to start the container.

Starting a container using kata-runtime without attempting vfio passthrough is successful.

Various troubleshooting changes that had no effect:

devimc commented 4 years ago

@amarshall thanks for raising this. I think this is my fault, since device cgroups are honoured when sandbox_cgroup_only=true. I have a patch; could you please help me test it?

amarshall commented 4 years ago

@devimc Running 1.11.0-alpha1 with patches #2542 and #2606 gets the container started, and I can see the device passed through with lspci (though it gives me “unknown header type 7f”, but I’m fairly certain that’s a driver issue within the VM).

Thanks!

devimc commented 4 years ago

@amarshall thanks for confirming

amorenoz commented 4 years ago

@amarshall Are you running cgroups v2 or v1?

amarshall commented 4 years ago

@amorenoz As far as I can tell, cgroups v2, which is the default in Fedora since fc31.
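One common way to tell the two apart is to look for cgroup.controllers at the root of the cgroup mount, which exists only on a unified (v2) hierarchy. The sketch below assumes the standard /sys/fs/cgroup mount point. The function name cgroup_mode is made up for illustration, and a hybrid setup (v2 mounted under a subdirectory) is deliberately reported together with v1.

```python
from pathlib import Path


def cgroup_mode(root: str = "/sys/fs/cgroup") -> str:
    """Classify the cgroup hierarchy mounted at root (illustrative helper)."""
    # cgroup.controllers is present at the root only on the unified v2 hierarchy.
    if (Path(root) / "cgroup.controllers").is_file():
        return "v2"
    return "v1-or-hybrid"
```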

yadzhang commented 4 years ago

Using the latest kata-runtime (1.11.1), I tested an SR-IOV NIC with the config "sandbox_cgroup_only=true". It also reports an “Operation not permitted” error, and the QEMU VM fails to start. If I set sandbox_cgroup_only=false, there is no error and the VM starts successfully.

So I think this may have the same cause as this issue.

Error msg: level=error msg="failed to launch qemu: qemu-system-x86_64: -device vfio-pci,host=0000:3b:00.2,x-pci-vendor-id=0x15b3,x-pci-device-id=0x1018,romfile=: vfio error: 0000:3b:00.2: failed to open /dev/vfio/82: Operation not permitted\n" ID=ad42d0a64a2f853b6ae57754f680ebc0bd7c8364be0a3b3045c12a9da5f350e7 error="exit status 1" source=virtcontainers subsystem=qemu

Host kernel version: 5.4.19.bsk.1-amd64 #5.4.19.bsk.1 SMP Debian 5.4.19.bsk.1 Fri Feb 21 13:20:08 UTC 20 x86_64 GNU/Linux

devimc commented 4 years ago

@yadzhang what command did you run? docker, podman, k8s? you are facing an issue with the device cgroup in the host

yadzhang commented 4 years ago

Thanks for the response. I use k8s + containerd + kata-runtime + qemu, and the network mode is switchdev + SR-IOV. Kata-runtime treats the SR-IOV interface in the sandbox netns as a physical interface, so it uses vfio to attach it to the VM.

bpradipt commented 4 years ago

@yadzhang can you enable unsafe_interrupts and try

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf

yadzhang commented 4 years ago

@yadzhang can you enable unsafe_interrupts and try

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf

The same error.

Warning FailedCreatePodSandBox 2s (x4 over 46s) kubelet, Failed create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to launch qemu: exit status 1, error messages from qemu log: qemu-system-x86_64: -device vfio-pci,host=0000:3b:00.4,x-pci-vendor-id=0x15b3,x-pci-device-id=0x1018,romfile=: vfio error: 0000:3b:00.4: failed to open /dev/vfio/84: Operation not permitted

Maybe need to add device "/dev/vfio/84" into the sandbox device cgroup file "devices.allow" ?

devimc commented 4 years ago

Maybe need to add device "/dev/vfio/84" into the sandbox device cgroup file "devices.allow" ?

@yadzhang kata-runtime should do it automatically. Could you set enable_debug in the configuration file, run the test again, and paste the logs here?

yadzhang commented 4 years ago

I checked the code of the cgroup manager. It adds /dev/vfio/vfio to the cgroup but not /dev/vfio/{id}. I used containerd-shim-kata-v2 instead of kata-runtime and set every enable_debug=true in the configuration. The full logs are below:
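To make the missing entry concrete: on cgroup v1, a devices.allow rule has the form "type major:minor access". The sketch below builds that string for a device node from its stat data. It is only an illustration of the rule format, not the kata-runtime code path, and the name devices_allow_rule is hypothetical. On cgroup v2 there is no devices.allow file (device access is filtered with eBPF), so this applies to v1 hosts only.

```python
import os
import stat


def devices_allow_rule(node: str, access: str = "rwm") -> str:
    """Build a cgroup v1 devices.allow rule for a device node (illustrative)."""
    st = os.stat(node)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"  # character device, e.g. /dev/vfio/<group>
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"  # block device
    else:
        raise ValueError(f"{node} is not a device node")
    # st_rdev carries the major/minor pair that identifies the device.
    return f"{kind} {os.major(st.st_rdev)}:{os.minor(st.st_rdev)} {access}"
```

Running it against both /dev/vfio/vfio and the group node (for example /dev/vfio/83) shows the two rules that would both need to be present for QEMU to open the device.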

Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.767948982+08:00" level=info msg="loaded configuration" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 file=/data/kata/share/defaults/kata-containers/configuration-qemu.toml format=TOML source=katautils
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.768039539+08:00" level=debug ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 default-kernel-parameters="systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket" source=katautils
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.768465049+08:00" level=debug msg="container rootfs: /root/tce/containerd/run/daemon/io.containerd.runtime.v2.task/k8s.io/3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074/rootfs" source=virtcontainers subsystem=oci
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.770638839+08:00" level=debug msg="restore sandbox failed" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 error="open /run/vc/sbs/3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074/persist.json: no such file or directory" sandbox=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=sandbox
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.770685291+08:00" level=debug msg="Creating bridges" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=qemu
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.77070504+08:00" level=debug msg="Creating UUID" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=qemu
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.771238099+08:00" level=debug msg="Disable nesting environment checks" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 inside-vm=false source=virtcontainers subsystem=qemu
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.771565033+08:00" level=info msg="adding volume" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=qemu volume-type=virtio-9p
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.772286581+08:00" level=info msg="Physical network interface found" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 interface=eth0 source=virtcontainers subsystem=network
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.772815577+08:00" level=info msg="Endpoints found after scan" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 endpoints="[0xc0000f2a00]" source=virtcontainers subsystem=network
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.7728695+08:00" level=info msg="Attaching endpoint" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 endpoint-type=physical hotplug=false source=virtcontainers subsystem=network
Jun 12 11:14:40 kata[1850343]: time="2020-06-12T11:14:40.772894941+08:00" level=info msg="Unbinding device from driver" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 device-bdf="0000:3b:00.3" driver-path="/sys/bus/pci/devices/0000:3b:00.3/driver/unbind" source=virtcontainers subsystem=device
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.3118116+08:00" level=info msg="Writing vendor-device-id to vfio new-id path" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=device vendor-device-id="0x15b3 0x1018" vfio-new-id-path=/sys/bus/pci/drivers/vfio-pci/new_id
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.312110237+08:00" level=info msg="Binding device to vfio driver" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 device-bdf="0000:3b:00.3" driver-path=/sys/bus/pci/drivers/vfio-pci/bind source=virtcontainers subsystem=device
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.312243935+08:00" level=debug msg="Network added" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=network
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.334822937+08:00" level=info msg="Starting VM" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 sandbox=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=sandbox
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.334915029+08:00" level=debug ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 default-kernel-parameters="tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 debug systemd.show_status=true systemd.log_level=debug" source=virtcontainers subsystem=qemu
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.335072446+08:00" level=info msg="launching /data/kata/bin/qemu-system-x86_64 with: [-name sandbox-3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 -uuid 767679a8-99d6-4e64-877c-3fa806398490 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host -qmp unix:/run/vc/vm/3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074/qmp.sock,server,nowait -m 2048M,slots=10,maxmem=386179M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/data/kata/share/kata-containers/kata-containers.img,size=402653184 -device virtio-scsi-pci,id=scsi0,disable-modern=false,romfile= -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0,romfile= -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/vm/3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074/kata.sock,server,nowait -device virtio-9p-pci,disable-modern=false,fsdev=extra-9p-kataShared,mount_tag=kataShared,romfile= -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074/shared,security_model=none -device vfio-pci,host=0000:3b:00.3,x-pci-vendor-id=0x15b3,x-pci-device-id=0x1018,romfile= -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -object memory-backend-ram,id=dimm1,size=2048M -numa node,memdev=dimm1 -kernel /data/kata/share/kata-containers/vmlinuz-4.19.86-60 -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 debug systemd.show_status=true systemd.log_level=debug panic=1 nr_cpus=96 agent.use_vsock=false systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket scsi_mod.scan=none rw -pidfile /run/vc/vm/3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074/pid -D /run/vc/vm/3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074/qemu.log -smp 1,cores=1,threads=1,sockets=96,maxcpus=96]" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=qmp
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.370674027+08:00" level=error msg="Unable to launch /data/kata/bin/qemu-system-x86_64: exit status 1" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=qmp
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.370727817+08:00" level=error ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=qmp
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.370769312+08:00" level=error msg="failed to launch qemu: qemu-system-x86_64: -device vfio-pci,host=0000:3b:00.3,x-pci-vendor-id=0x15b3,x-pci-device-id=0x1018,romfile=: vfio error: 0000:3b:00.3: failed to open /dev/vfio/83: Operation not permitted\n" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 error="exit status 1" source=virtcontainers subsystem=qemu
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.370873008+08:00" level=info msg="Detaching endpoint" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 endpoint-type=physical source=virtcontainers subsystem=network
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.370898209+08:00" level=info msg="Unbinding device from driver" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 device-bdf="0000:3b:00.3" driver-path="/sys/bus/pci/devices/0000:3b:00.3/driver/unbind" source=virtcontainers subsystem=device
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.371167118+08:00" level=info msg="Binding back device to host driver" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 device-bdf="0000:3b:00.3" driver-path=/sys/bus/pci/drivers/mlx5_core/bind source=virtcontainers subsystem=device
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.765714726+08:00" level=debug msg="Network removed" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=network
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.765764816+08:00" level=debug msg="Deleting sandbox cgroup" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 sandbox=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 source=virtcontainers subsystem=sandbox
Jun 12 11:14:41 kata[1850343]: time="2020-06-12T11:14:41.7858325+08:00" level=info msg="cleanup agent" ID=3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074 path=/run/kata-containers/shared/sandboxes/3656ccc016ed2c4efca0795472f14e055c89be54fd8ef260e9c29d52576fc074/shared source=virtcontainers subsystem=kata_agent
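Logs in this key=value style are easier to scan once the fields are split out, for example to keep only the level=error entries. The following is a rough sketch, not an official Kata tool. The regular expression and the names parse_entry and errors_only are assumptions made for illustration, and the parser only handles the quoting seen in the log above.

```python
import re

# key=value pairs; a value is either a double-quoted string or a bare token.
_PAIR = re.compile(r'([\w-]+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_entry(entry: str) -> dict:
    """Split one log entry into a dict of its key=value fields."""
    fields = {}
    for key, value in _PAIR.findall(entry):
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]  # drop the surrounding quotes
        fields[key] = value
    return fields


def errors_only(entries):
    """Return the parsed entries whose level field is 'error'."""
    parsed = (parse_entry(e) for e in entries)
    return [p for p in parsed if p.get("level") == "error"]
```

Applied to the log above, one entry per line, this would be expected to leave the three level=error entries, including the "failed to open /dev/vfio/83" message.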

devimc commented 4 years ago

@yadzhang yeah - I think this is a valid issue. Would you mind filing a new issue?

devimc commented 4 years ago

@yadzhang I have a patch. Once you raise the issue, I will open a PR and would need your help to test it, wdyt?

yadzhang commented 4 years ago

@yadzhang I have a patch. Once you raise the issue, I will open a PR and would need your help to test it, wdyt?

Thank you for the reply. I will raise a new issue.