usnistgov / ndn-dpdk

NDN-DPDK: High-Speed Named Data Networking Forwarder
https://www.nist.gov/publications/ndn-dpdk-ndn-forwarding-100-gbps-commodity-hardware

Port creation failing with Intel X710 #63

Closed sankalpatimilsina12 closed 2 years ago

sankalpatimilsina12 commented 2 years ago
{"level":"error","ts":1654204101.0945683,"logger":"eal","msg":"rte_dev_probe error","addr":"0000:44:00.3","args":"","error":"1 Operation not permitted"}
{"level":"error","ts":1654204101.094628,"logger":"DPDK","msg":"EAL: Cannot open /sys/class/uio/uio0/device/config: Permission denied"}
{"level":"error","ts":1654204101.0947154,"logger":"DPDK","msg":"EAL: Driver cannot attach the device (0000:44:00.3)"}

/sys is already bind-mounted when creating the Ethernet port:

sudo docker run \
    --cap-add IPC_LOCK --cap-add NET_ADMIN --cap-add SYS_ADMIN --cap-add SYS_NICE \
    --mount type=bind,source=/dev/hugepages,target=/dev/hugepages \
    --mount type=volume,source=run-ndn,target=/run/ndn \
    $(sudo find /dev -name 'uio*' -type c -printf ' --device %p') \
    --mount type=bind,source=/sys,target=/sys \
    -i --rm yuanhao233/ndn-dpdk-vip:latest ndndpdk-ctrl --gqlserver $GQLSERVER create-eth-port --pci 44:00.3

Running NDN-DPDK service:

sudo docker run -d --name fw \
  --cap-add IPC_LOCK --cap-add NET_ADMIN --cap-add SYS_ADMIN --cap-add SYS_NICE \
  --mount type=bind,source=/dev/hugepages,target=/dev/hugepages \
  --mount type=volume,source=run-ndn,target=/run/ndn \
  $(sudo find /dev -name 'uio*' -type c -printf ' --device %p') \
  --mount type=bind,source=/sys,target=/sys \
  yuanhao233/ndn-dpdk-vip:latest

The forwarder is activated with .eal.extraFlags set to --iova-mode pa; otherwise, the IOVA mode is automatically set to va and port creation fails.

However, this issue does not occur with the host network.

yoursunny commented 2 years ago

Please post the output of the dpdk-devbind.py --status command. Also, does 0000:44:00.3 refer to a virtual function or is it a secondary port?

Pesa commented 2 years ago

There is a note here that says:

The instructions below will allow running DPDK with igb_uio or uio_pci_generic drivers as non-root with older Linux kernel versions. However, since version 4.0, the kernel does not allow unprivileged processes to read the physical address information from the pagemaps file, making it impossible for those processes to be used by non-privileged users. In such cases, using the VFIO driver is recommended.

Could it be related? OTOH, 4.0 is quite old (did ndn-dpdk ever support a kernel older than that?) so maybe non-root ndn-dpdk never worked with cards using igb_uio...?

yoursunny commented 2 years ago

The minimum kernel version is currently 5.4, but this will increase to 5.10 later this year.

NDN-DPDK is always running as root, but in Docker it doesn't have all the capabilities. I don't see how network namespace affects things.

sankalpatimilsina12 commented 2 years ago

Please post the output of the dpdk-devbind.py --status command. Also, does 0000:44:00.3 refer to a virtual function or is it a secondary port?

dpdk-devbind.py --status

Network devices using DPDK-compatible driver
============================================
0000:44:00.3 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' drv=igb_uio unused=i40e,vfio-pci

Network devices using kernel driver
===================================
0000:44:00.0 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f0 drv=i40e unused=igb_uio,vfio-pci *Active*
0000:44:00.1 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f1 drv=i40e unused=igb_uio,vfio-pci
0000:44:00.2 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' if=enp68s0f2 drv=i40e unused=igb_uio,vfio-pci
0000:c1:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f0 drv=mlx5_core unused=igb_uio,vfio-pci
0000:c1:00.1 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f1 drv=mlx5_core unused=igb_uio,vfio-pci

0000:44:00.3 refers to a secondary port.

yoursunny commented 2 years ago

I can reproduce this error in Docker, but it occurs whether or not --network host is used. It's most likely due to missing --device or --mount flags.

I've changed the instructions to use vfio-pci driver instead of igb_uio. In my test with X710 secondary port, vfio-pci driver works well for both physical function and virtual function; it also does not need .iovaMode="PA" setting. Please try again with vfio-pci driver.

Pesa commented 2 years ago

I believe @sankalpatimilsina12 was also using --privileged with --network host, maybe that's the key difference.

yoursunny commented 2 years ago

Yes, --privileged allowed igb_uio to work in Docker in my test, because it mounts all devices into the container.

sankalpatimilsina12 commented 2 years ago

I believe @sankalpatimilsina12 was also using --privileged with --network host, maybe that's the key difference.

Yeah, I was using --privileged along with --network host.

sankalpatimilsina12 commented 2 years ago

With the vfio-pci driver, DPDK reports that not all devices in the same IOMMU group are bound to VFIO.

{"level":"error","ts":1654654667.4602127,"logger":"eal","msg":"rte_dev_probe error","addr":"0000:44:00.3","args":"","error":"1 Operation not permitted"}
{"level":"error","ts":1654654667.4602826,"logger":"DPDK","msg":"EAL: 0000:44:00.3 VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound"}
{"level":"error","ts":1654654667.4603493,"logger":"DPDK","msg":"EAL: Driver cannot attach the device (0000:44:00.3)"}
{"level":"error","ts":1654654667.460385,"logger":"DPDK","msg":"EAL: Failed to attach device on primary process"}

The IOMMU is enabled.
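One way to double-check that the IOMMU is active is to count the groups the kernel exposes under /sys/kernel/iommu_groups; a count of zero usually means the IOMMU is off or unsupported. The sketch below assumes that sysfs layout. The function name iommu_group_count and the SYSFS_ROOT variable are hypothetical helpers introduced here for illustration, not part of DPDK or NDN-DPDK.

```shell
# Count the IOMMU groups exposed by the kernel.
# SYSFS_ROOT is a hypothetical override (handy for testing); it defaults to /sys.
iommu_group_count() {
  root="${SYSFS_ROOT:-/sys}"
  count=0
  for group in "$root/kernel/iommu_groups"/*; do
    # An unmatched glob stays literal and is skipped by the directory test.
    if [ -d "$group" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

Running iommu_group_count on the host (not inside the container) and getting a non-zero number is a reasonable first sanity check before looking at group membership.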

dpdk-devbind.py --status

Network devices using DPDK-compatible driver
============================================
0000:44:00.3 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' drv=vfio-pci unused=i40e

Network devices using kernel driver
===================================
0000:44:00.0 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f0 drv=i40e unused=vfio-pci *Active*
0000:44:00.1 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f1 drv=i40e unused=vfio-pci
0000:44:00.2 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' if=enp68s0f2 drv=i40e unused=vfio-pci
0000:c1:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f0 drv=mlx5_core unused=vfio-pci
0000:c1:00.1 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f1 drv=mlx5_core unused=vfio-pci

Pesa commented 2 years ago

FYI, we tried binding the other port (0000:44:00.2) to vfio-pci as well, but it's not enough, same error.

yoursunny commented 2 years ago

dpdk-devbind documentation says:

Due to the way VFIO works, there are certain limitations to which devices can be used with VFIO. Mainly it comes down to how IOMMU groups work. Any Virtual Function device can be used with VFIO on its own, but physical devices will require either all ports bound to VFIO, or some of them bound to VFIO while others not being bound to anything at all.

If your device is behind a PCI-to-PCI bridge, the bridge will then be part of the IOMMU group in which your device is in. Therefore, the bridge driver should also be unbound from the bridge PCI device for VFIO to work with devices behind the bridge.

The ArchLinux PCI passthru guide has a script in "Ensuring that the groups are valid" section that helps you determine which devices are part of the IOMMU group. You need to bind all of them to vfio-pci.
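As a rough sketch of that check for a single device, the function below walks the IOMMU group of one PCI address and prints the driver each member is bound to, flagging anything that is neither vfio-pci nor unbound. It follows the rule as quoted from the DPDK documentation above; the kernel's own viability check may be more lenient for some device types, so treat the result as a hint. The name check_iommu_group and the SYSFS_ROOT variable are hypothetical, introduced only for this example.

```shell
# Print each device in the IOMMU group of PCI address $1 with its driver.
# Returns non-zero if any member is bound to a driver other than vfio-pci.
# SYSFS_ROOT is a hypothetical override (handy for testing); it defaults to /sys.
check_iommu_group() {
  addr="$1"
  root="${SYSFS_ROOT:-/sys}"
  viable=0
  for dev in "$root/bus/pci/devices/$addr/iommu_group/devices"/*; do
    [ -e "$dev" ] || continue
    if [ -e "$dev/driver" ]; then
      # The driver entry is a symlink; its target's basename is the driver name.
      drv=$(basename "$(readlink -f "$dev/driver")")
    else
      drv="(unbound)"
    fi
    case "$drv" in
      vfio-pci|"(unbound)") status="ok" ;;
      *) status="blocks VFIO"; viable=1 ;;
    esac
    printf '%s %s %s\n' "$(basename "$dev")" "$drv" "$status"
  done
  return "$viable"
}
```

For example, check_iommu_group 0000:44:00.3 would list every device sharing a group with the port in this report, one per line.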

sankalpatimilsina12 commented 2 years ago

Listing devices on the same iommu group:

ls /sys/bus/pci/devices/0000:44:00.3/iommu_group/devices
0000:40:01.0  0000:40:01.1  0000:40:01.3  0000:40:01.4  0000:41:00.0  0000:42:00.0  0000:43:00.0  0000:44:00.0  0000:44:00.1  0000:44:00.2  0000:44:00.3

This one: 0000:44:00.0 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f0 drv=i40e unused=vfio-pci *Active*

is used to access the machine. I believe I can't bind that?

yoursunny commented 2 years ago

The other option is to create a virtual function on the device you want to use. According to the dpdk-devbind.py documentation, a virtual function can be used with VFIO on its own. ndn-dpdk/docs/hardware.md has instructions for Intel virtual functions.
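For orientation only, here is a minimal sketch of creating virtual functions through the generic Linux sriov_numvfs sysfs knob. This is not copied from ndn-dpdk/docs/hardware.md, which remains the reference for the exact steps. The function name create_vfs and the SYSFS_ROOT variable are hypothetical; on a real system the writes need root.

```shell
# Create $2 virtual functions on the physical function behind interface $1.
# SYSFS_ROOT is a hypothetical override (handy for testing); it defaults to /sys.
create_vfs() {
  ifname="$1"
  count="$2"
  knob="${SYSFS_ROOT:-/sys}/class/net/$ifname/device/sriov_numvfs"
  if [ ! -e "$knob" ]; then
    echo "no sriov_numvfs entry found for $ifname" >&2
    return 1
  fi
  # The kernel refuses to change a non-zero VF count directly, so reset it first.
  echo 0 > "$knob"
  echo "$count" > "$knob"
}
```

Assuming the physical function is still bound to its kernel driver, something like create_vfs enp68s0f3 4 followed by dpdk-devbind.py -b vfio-pci with the PCI address of the new virtual function would be the usual next step.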

sankalpatimilsina12 commented 2 years ago

With Intel VF, it logs the same issue:

dpdk-devbind.py -s

Network devices using DPDK-compatible driver
============================================
0000:44:0e.0 'Ethernet Virtual Function 700 Series 154c' drv=vfio-pci unused=iavf

Network devices using kernel driver
===================================
0000:44:00.0 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f0 drv=i40e unused=vfio-pci *Active*
0000:44:00.1 'Ethernet Controller X710 for 10GBASE-T 15ff' if=enp68s0f1 drv=i40e unused=vfio-pci
0000:44:00.2 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' if=enp68s0f2 drv=i40e unused=vfio-pci *Active*
0000:44:00.3 'Ethernet Controller X710 for 10 Gigabit SFP+ 104e' if=enp68s0f3 drv=i40e unused=vfio-pci
0000:44:0e.1 'Ethernet Virtual Function 700 Series 154c' if=enp68s0f3v1 drv=iavf unused=vfio-pci
0000:44:0e.2 'Ethernet Virtual Function 700 Series 154c' if=enp68s0f3v2 drv=iavf unused=vfio-pci
0000:44:0e.3 'Ethernet Virtual Function 700 Series 154c' if=enp68s0f3v3 drv=iavf unused=vfio-pci
0000:c1:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f0 drv=mlx5_core unused=vfio-pci
0000:c1:00.1 'MT2892 Family [ConnectX-6 Dx] 101d' if=enp193s0f1 drv=mlx5_core unused=vfio-pci *Active*

{"level":"error","ts":1655233730.367331,"logger":"eal","msg":"rte_dev_probe error","addr":"0000:44:0e.0","args":"","error":"1 Operation not permitted"}
{"level":"error","ts":1655233730.367408,"logger":"DPDK","msg":"EAL: 0000:44:0e.0 VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound"}
{"level":"error","ts":1655233730.3674757,"logger":"DPDK","msg":"EAL: Driver cannot attach the device (0000:44:0e.0)"}
{"level":"error","ts":1655233730.3675182,"logger":"DPDK","msg":"EAL: Failed to attach device on primary process"}

ls /sys/bus/pci/devices/0000:44:0e.0/iommu_group/devices
0000:40:01.0  0000:40:01.3  0000:41:00.0  0000:43:00.0  0000:44:00.1  0000:44:00.3  0000:44:0e.1  0000:44:0e.3
0000:40:01.1  0000:40:01.4  0000:42:00.0  0000:44:00.0  0000:44:00.2  0000:44:0e.0  0000:44:0e.2

Pesa commented 2 years ago

I suspect something on this system doesn't support ACS (https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/hardware_considerations_for_implementing_sr-iov/index).

Also note that this is an AMD machine.

Pesa commented 2 years ago

It seems to me that too many stars need to align for VFIO to function properly and be usable in all cases. I suggest we go back and troubleshoot/fix the original problem in this report (using uio in a non-privileged container).

yoursunny commented 2 years ago

this is an AMD machine.

AMD needs a different kernel parameter. The ArchLinux PCI passthru guide has relevant instructions.

this system doesn't support ACS

DPDK VFIO guide says: If your device is behind a PCI-to-PCI bridge, the bridge will then be part of the IOMMU group in which your device is in. Therefore, the bridge driver should also be unbound from the bridge PCI device for VFIO to work with devices behind the bridge.

Since the port you want to use with DPDK and the port you want to use with the kernel are physically different PCI cards, you may be able to move one of them to a different slot and place it behind a different PCI bridge. The motherboard manual should have a diagram that shows which slots share a PCI bridge.

using uio in a non-privileged container

DPDK doesn't officially support running in an unprivileged container. The trick I found in the past is unreliable. Thus, if you want UIO, you have to add back the --privileged flag.
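As an illustration, the helper below prints (without running) what the service command from the original report might look like with --privileged. It assumes the same image and mounts as that report, and that --privileged makes the individual --cap-add and --device flags unnecessary. The function name fw_privileged_cmd is hypothetical.

```shell
# Print, but do not execute, a privileged variant of the service command.
# Assumption: image name and mounts are copied from the original report;
# --privileged stands in for the --cap-add and --device flags.
fw_privileged_cmd() {
  image="${1:-yuanhao233/ndn-dpdk-vip:latest}"
  printf '%s ' docker run -d --name fw --privileged \
    --mount type=bind,source=/dev/hugepages,target=/dev/hugepages \
    --mount type=volume,source=run-ndn,target=/run/ndn \
    --mount type=bind,source=/sys,target=/sys
  printf '%s\n' "$image"
}
```

Printing the command first makes it easy to review the flags before running it with sudo.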

yoursunny commented 2 years ago

I re-tested Docker UIO instructions.

It seems that Docker changed in an update within the last 15 months, so that the container cannot write to /sys even if it's mounted with readonly=false. Inside the container, the mountpoint shows "rw" but it's not writable.
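To see what the mount table claims from inside the container, a small helper like the following can print the options recorded for a mount point. It is a sketch that assumes the standard /proc/mounts format (device, mount point, type, options); the function name mount_opts and the MOUNTS_FILE variable are hypothetical.

```shell
# Print the mount options recorded for mount point $1.
# If the mount point is listed more than once, the last entry wins.
# MOUNTS_FILE is a hypothetical override (handy for testing); it defaults to /proc/mounts.
mount_opts() {
  awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
    "${MOUNTS_FILE:-/proc/mounts}"
}
```

As noted above, an "rw" entry from mount_opts /sys does not guarantee that writes succeed, so the output is only useful when compared against an actual write attempt.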

Adding more capabilities, mounting more specific folders, or changing namespaces does not help at all. The only solution at the moment is --privileged.

See also: https://github.com/moby/moby/issues/22825

Pesa commented 2 years ago

Adding more capabilities, mounting more specific folders, or changing namespaces does not help at all. The only solution at the moment is --privileged.

Ok, then this is the "solution" to the original report. We will go back to using --privileged and the igb_uio driver. I believe we can close this issue as CANTFIX.

Btw you may want to update the issue title, "--iova-mode pa" and "docker network" are red herrings.