Closed sankalpatimilsina12 closed 2 years ago
Please post the output of the dpdk-devbind.py --status command.
Also, does 0000:44:00.3 refer to a virtual function or is it a secondary port?
There is a note here that says:
The instructions below will allow running DPDK with igb_uio or uio_pci_generic drivers as non-root with older Linux kernel versions. However, since version 4.0, the kernel does not allow unprivileged processes to read the physical address information from the pagemaps file, making it impossible for those processes to be used by non-privileged users. In such cases, using the VFIO driver is recommended.
Could it be related? On the other hand, 4.0 is quite old (did ndn-dpdk ever support a kernel older than that?), so maybe non-root ndn-dpdk never worked with cards using igb_uio?
Minimal kernel version is currently 5.4, but this will increase to 5.10 later this year.
NDN-DPDK is always running as root, but in Docker it doesn't have all the capabilities. I don't see how network namespace affects things.
dpdk-devbind.py --status
Network devices using DPDK-compatible driver
============================================
0000:44:00.3 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' drv=igb_uio unused=i40e,vfio-pci
Network devices using kernel driver
===================================
0000:44:00.0 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f0 drv=i40e unused=igb_uio,vfio-pci *Active*
0000:44:00.1 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f1 drv=i40e unused=igb_uio,vfio-pci
0000:44:00.2 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' if=enp68s0f2 drv=i40e unused=igb_uio,vfio-pci
0000:c1:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f0 drv=mlx5_core unused=igb_uio,vfio-pci
0000:c1:00.1 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f1 drv=mlx5_core unused=igb_uio,vfio-pci
0000:44:00.3 refers to a secondary port.
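For anyone comparing several of these status dumps, the listing above can be turned into structured records. The following is a minimal sketch, assuming the line layout shown in this thread (PCI address, quoted device name, then `key=value` attributes and an optional `*Active*` marker); `parse_devbind_status` is a hypothetical helper, not part of dpdk-devbind.py.

```python
import re

# One device line, e.g.
# 0000:44:00.3 'Ethernet Controller X710 ...' drv=igb_uio unused=i40e,vfio-pci
DEVICE_LINE = re.compile(
    r"^(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"'(?P<name>[^']*)'\s*(?P<attrs>.*)$"
)


def parse_devbind_status(text):
    """Parse `dpdk-devbind.py --status` text into a list of dicts.

    Hypothetical helper: section headers and separator lines are skipped
    because they do not start with a PCI address.
    """
    devices = []
    for line in text.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match is None:
            continue
        attrs = match.group("attrs")
        device = {
            "addr": match.group("addr"),
            "name": match.group("name"),
            "active": attrs.endswith("*Active*"),
        }
        # Attributes such as if=..., drv=..., unused=... become dict entries.
        for token in attrs.split():
            if "=" in token:
                key, value = token.split("=", 1)
                device[key] = value
        unused = device.get("unused", "")
        device["unused"] = unused.split(",") if unused else []
        devices.append(device)
    return devices
```

Filtering the result on `drv` or `active` makes it easy to spot which ports are still owned by a kernel driver.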
I can reproduce this error in Docker, but it occurs regardless of whether --network host is used.
It's most likely due to missing --device or --mount flags.
I've changed the instructions to use vfio-pci driver instead of igb_uio.
In my test with an X710 secondary port, the vfio-pci driver works well for both physical function and virtual function; it also does not need the .iovaMode="PA" setting.
Please try again with vfio-pci driver.
I believe @sankalpatimilsina12 was also using --privileged with --network host, maybe that's the key difference.
Yes, --privileged allowed igb_uio to work in Docker in my test, because it mounts all devices into the container.
Yeah, I was using --privileged along with --network host.
With the vfio-pci driver, it reports that not all devices in the same IOMMU group are bound to VFIO:
{"level":"error","ts":1654654667.4602127,"logger":"eal","msg":"rte_dev_probe error","addr":"0000:44:00.3","args":"","error":"1 Operation not permitted"}
{"level":"error","ts":1654654667.4602826,"logger":"DPDK","msg":"EAL: 0000:44:00.3 VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound"}
{"level":"error","ts":1654654667.4603493,"logger":"DPDK","msg":"EAL: Driver cannot attach the device (0000:44:00.3)"}
{"level":"error","ts":1654654667.460385,"logger":"DPDK","msg":"EAL: Failed to attach device on primary process"}
The IOMMU is enabled.
dpdk-devbind.py --status
Network devices using DPDK-compatible driver
============================================
0000:44:00.3 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' drv=vfio-pci unused=i40e
Network devices using kernel driver
===================================
0000:44:00.0 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f0 drv=i40e unused=vfio-pci *Active*
0000:44:00.1 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f1 drv=i40e unused=vfio-pci
0000:44:00.2 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' if=enp68s0f2 drv=i40e unused=vfio-pci
0000:c1:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f0 drv=mlx5_core unused=vfio-pci
0000:c1:00.1 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f1 drv=mlx5_core unused=vfio-pci
FYI, we tried binding the other port (0000:44:00.2) to vfio-pci as well, but it's not enough: same error.
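When the logs get long, the affected devices can be pulled out mechanically. This is a rough sketch, assuming the JSON-lines log format and the exact "VFIO group is not viable" wording seen in the log excerpt above; `non_viable_devices` is a hypothetical helper name.

```python
import json
import re

# Matches the EAL message shown in the logs above.
NOT_VIABLE = re.compile(r"EAL: (\S+) VFIO group is not viable")


def non_viable_devices(log_lines):
    """Return PCI addresses that DPDK reported as having a non-viable VFIO group."""
    found = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # not a JSON log line
        if not isinstance(entry, dict):
            continue
        match = NOT_VIABLE.search(str(entry.get("msg", "")))
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found
```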
dpdk-devbind documentation says:
Due to the way VFIO works, there are certain limitations to which devices can be used with VFIO. Mainly it comes down to how IOMMU groups work. Any Virtual Function device can be used with VFIO on its own, but physical devices will require either all ports bound to VFIO, or some of them bound to VFIO while others not being bound to anything at all.
If your device is behind a PCI-to-PCI bridge, the bridge will then be part of the IOMMU group in which your device is in. Therefore, the bridge driver should also be unbound from the bridge PCI device for VFIO to work with devices behind the bridge.
The ArchLinux PCI passthru guide has a script in its "Ensuring that the groups are valid" section that helps you determine which devices are part of the IOMMU group. You need to bind all of them to vfio-pci.
Listing devices on the same iommu group:
ls /sys/bus/pci/devices/0000:44:00.3/iommu_group/devices
0000:40:01.0 0000:40:01.1 0000:40:01.3 0000:40:01.4 0000:41:00.0 0000:42:00.0 0000:43:00.0 0000:44:00.0 0000:44:00.1 0000:44:00.2 0000:44:00.3
This one:
0000:44:00.0 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f0 drv=i40e unused=vfio-pci *Active*
is used to access the machine. I believe I can't bind that?
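The group check discussed above can be sketched in a few lines. This is only an illustration, assuming the usual sysfs layout (`/sys/bus/pci/devices/<addr>/iommu_group/devices` and a `driver` symlink per device); both function names are hypothetical. It applies the rule as worded in the EAL error message (every member bound to vfio-pci or unbound), while the kernel's real viability check may treat some drivers, such as PCIe port drivers on bridges, as acceptable, which this sketch does not model.

```python
import os


def iommu_group_drivers(addr, sysfs_root="/sys"):
    """Map each PCI device in addr's IOMMU group to its bound driver (or None)."""
    devices_dir = os.path.join(sysfs_root, "bus/pci/devices")
    group_dir = os.path.join(devices_dir, addr, "iommu_group", "devices")
    drivers = {}
    for member in sorted(os.listdir(group_dir)):
        driver_link = os.path.join(devices_dir, member, "driver")
        if os.path.islink(driver_link):
            # The symlink target ends with the driver name, e.g. .../drivers/i40e
            drivers[member] = os.path.basename(os.readlink(driver_link))
        else:
            drivers[member] = None  # no driver bound
    return drivers


def blocking_devices(drivers):
    """Devices bound to something other than vfio-pci (simplified rule)."""
    return sorted(a for a, d in drivers.items() if d not in (None, "vfio-pci"))
```

On the machine in this thread, `blocking_devices(iommu_group_drivers("0000:44:00.3"))` would be expected to include 0000:44:00.0, the port used for remote access.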
The other option is creating a virtual function on the device you want to use.
According to the dpdk-devbind.py documentation, a virtual function can be used with VFIO on its own.
ndn-dpdk/docs/hardware.md has instructions for Intel virtual functions.
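For reference, one common way to create virtual functions is the kernel's generic SR-IOV sysfs interface. The sketch below assumes the physical function stays bound to its kernel driver and exposes `sriov_totalvfs` and `sriov_numvfs`; `set_num_vfs` is a hypothetical helper, and the procedure in docs/hardware.md may differ, so treat this as an illustration rather than the project's method.

```python
import os


def set_num_vfs(pf_addr, count, sysfs_root="/sys"):
    """Request `count` virtual functions on the physical function `pf_addr`.

    Must run as root on a real system. `sysfs_root` is a parameter only so
    the function can be exercised against a fake directory tree.
    """
    dev_dir = os.path.join(sysfs_root, "bus/pci/devices", pf_addr)
    with open(os.path.join(dev_dir, "sriov_totalvfs")) as f:
        total = int(f.read().strip())
    if not 0 <= count <= total:
        raise ValueError(f"count must be between 0 and {total}")

    numvfs_path = os.path.join(dev_dir, "sriov_numvfs")
    # The kernel refuses to change a nonzero VF count directly,
    # so reset to 0 before writing the new value.
    with open(numvfs_path, "w") as f:
        f.write("0")
    if count > 0:
        with open(numvfs_path, "w") as f:
            f.write(str(count))
    return count
```

For example, `set_num_vfs("0000:44:00.3", 4)` would ask for four VFs on the secondary port; each VF can then be bound to vfio-pci separately.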
With Intel VF, it logs the same issue:
dpdk-devbind.py -s
Network devices using DPDK-compatible driver
============================================
0000:44:0e.0 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf
Network devices using kernel driver
===================================
0000:44:00.0 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f0 drv=i40e unused=vfio-pci *Active*
0000:44:00.1 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f1 drv=i40e unused=vfio-pci
0000:44:00.2 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' if=enp68s0f2 drv=i40e unused=vfio-pci *Active*
0000:44:00.3 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' if=enp68s0f3 drv=i40e unused=vfio-pci
0000:44:0e.1 'Ethernet Virtual Function 700 Series 154c' if=enp68s0f3v1 drv=iavf unused=vfio-pci
0000:44:0e.2 'Ethernet Virtual Function 700 Series 154c' if=enp68s0f3v2 drv=iavf unused=vfio-pci
0000:44:0e.3 'Ethernet Virtual Function 700 Series 154c' if=enp68s0f3v3 drv=iavf unused=vfio-pci
0000:c1:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f0 drv=mlx5_core unused=vfio-pci
0000:c1:00.1 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f1 drv=mlx5_core unused=vfio-pci *Active*
{"level":"error","ts":1655233730.367331,"logger":"eal","msg":"rte_dev_probe error","addr":"0000:44:0e.0","args":"","error":"1 Operation not permitted"}
{"level":"error","ts":1655233730.367408,"logger":"DPDK","msg":"EAL: 0000:44:0e.0 VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound"}
{"level":"error","ts":1655233730.3674757,"logger":"DPDK","msg":"EAL: Driver cannot attach the device (0000:44:0e.0)"}
{"level":"error","ts":1655233730.3675182,"logger":"DPDK","msg":"EAL: Failed to attach device on primary process"}
ls /sys/bus/pci/devices/0000:44:0e.0/iommu_group/devices
0000:40:01.0 0000:40:01.3 0000:41:00.0 0000:43:00.0 0000:44:00.1 0000:44:00.3 0000:44:0e.1 0000:44:0e.3
0000:40:01.1 0000:40:01.4 0000:42:00.0 0000:44:00.0 0000:44:00.2 0000:44:0e.0 0000:44:0e.2
I suspect something on this system doesn't support ACS (https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/hardware_considerations_for_implementing_sr-iov/index).
Also note that this is an AMD machine.
It seems to me that too many stars need to align for VFIO to function properly and be usable in all cases. I suggest we go back and troubleshoot/fix the original problem in this report (using uio in a non-privileged container).
> this is an AMD machine.
AMD needs a different kernel parameter. The ArchLinux PCI passthru guide has relevant instructions.
> this system doesn't support ACS
DPDK VFIO guide says: If your device is behind a PCI-to-PCI bridge, the bridge will then be part of the IOMMU group in which your device is in. Therefore, the bridge driver should also be unbound from the bridge PCI device for VFIO to work with devices behind the bridge.
Since the port you want to use with DPDK and the port you want to use with kernel are physically different PCI cards, you may be able to move it to a different slot and place it behind a different PCI bridge. The motherboard manual should have a diagram that shows which slots are sharing a PCI bridge.
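To see whether bridges are part of the group in question, the PCI class code of each member can be inspected. This is a small sketch, assuming the sysfs `class` attribute holds a hex value such as `0x060400` and that class 0x0604 denotes a PCI-to-PCI bridge; `bridges_in_group` is a hypothetical helper.

```python
import os

PCI_CLASS_BRIDGE_PCI = 0x0604  # base class 0x06 (bridge), subclass 0x04 (PCI-to-PCI)


def bridges_in_group(addr, sysfs_root="/sys"):
    """List PCI-to-PCI bridges that share an IOMMU group with `addr`."""
    devices_dir = os.path.join(sysfs_root, "bus/pci/devices")
    group_dir = os.path.join(devices_dir, addr, "iommu_group", "devices")
    bridges = []
    for member in sorted(os.listdir(group_dir)):
        with open(os.path.join(devices_dir, member, "class")) as f:
            class_code = int(f.read().strip(), 16)
        # Drop the low byte (programming interface) before comparing.
        if class_code >> 8 == PCI_CLASS_BRIDGE_PCI:
            bridges.append(member)
    return bridges
```

If several bridges show up together with both the kernel-managed port and the DPDK port, that is consistent with the ports sharing an upstream bridge, which is the situation a different slot might avoid.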
> using uio in a non-privileged container
DPDK doesn't officially support running in an unprivileged container.
The trick I found in the past is unreliable.
Thus, if you want UIO, you have to add back the --privileged flag.
I re-tested the Docker UIO instructions.
It seems that Docker changed during an update within the last 15 months, so that the container cannot write to /sys even if it's mounted with readonly=false.
Inside the container the mountpoint shows "rw" but it's not writable.
Adding more capabilities, mounting more specific folders, or changing namespaces does not help at all.
The only solution at the moment is --privileged.
Ok, then this is the "solution" to the original report. We will go back to using --privileged and the igb_uio driver. I believe we can close this issue as CANTFIX.
Btw you may want to update the issue title, "--iova-mode pa" and "docker network" are red herrings.
The sys is mounted already while creating the ethernet port:
Running NDN-DPDK service:
The forwarder is activated with .eal.extraFlags set to --iova-mode pa; otherwise the current mode is automatically set to va and the port creation fails.
However, with the host network, this issue is not there.