xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

Mellanox SR-IOV broken on 8.2.1 #544

Open Oleszkiewicz opened 2 years ago

Oleszkiewicz commented 2 years ago

On 8.2.1, SR-IOV is broken with Mellanox cards.

Creating an SR-IOV network results in these errors in the xensource log:

https://gist.github.com/Oleszkiewicz/ef77405840f928e81ddbcdd7ccf302fe

The network is created but is unusable.

When trying to start a VM with this network I get this error:

Mar 26 07:27:01 yggdrasil xapi: [error||1031 ||backtrace] Async.VM.start R:d0d65e426987 failed with exception Server_error(NETWORK_SRIOV_INSUFFICIENT_CAPACITY, [ OpaqueRef:cb14d368-7a22-438e-bffc-433a5bc7b3cf ])
Mar 26 07:27:01 yggdrasil xapi: [error||1031 ||backtrace] Raised Server_error(NETWORK_SRIOV_INSUFFICIENT_CAPACITY, [ OpaqueRef:cb14d368-7a22-438e-bffc-433a5bc7b3cf ])

This happens even though a capacity of 15 VFs is available.

Oleszkiewicz commented 2 years ago

This happens with both the inbox drivers and the Mellanox OFED drivers, so I guess the problem is on the xapi side...

olivierlambert commented 2 years ago

Can you reproduce on a fresh 8.2.0 install without any updates?

Oleszkiewicz commented 2 years ago

Problem "kind of" solved, at least with 8.2.0. I will check this with 8.2.1 too.

I straced the daemon and found that it first reads /sys/class/net/eth0/device/sriov_totalvfs. This file presents the maximum number of VFs configured in the EEPROM of the card (configurable with the manufacturer's tool).

Next, it writes the same value to /sys/class/net/eth0/device/sriov_numvfs.

This is where the magic starts. This virtual file presents the value configured when the kernel module was loaded, i.e. the number of activated virtual functions. It is "rw", but writing to it results in a "No such file or directory" error UNLESS we write the value that is already there.

So a workaround is to enable exactly the MAX number of VFs in the kernel module configuration. xcp-networkd then goes through and allows creating an SR-IOV network without an error. In any other case it fails.
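The comparison described above can be checked by hand before creating the network. This is a minimal sketch, not anything xcp-networkd ships: the function name `sriov_vf_status` is hypothetical, and the only assumption is that the device directory contains the two sysfs files named in this comment.

```shell
#!/bin/sh
# Hypothetical helper: compare the card's maximum VF count with the
# number of VFs the driver has currently activated.
# $1: device directory, e.g. /sys/class/net/eth0/device
sriov_vf_status() {
    dev_dir=$1
    total=$(cat "$dev_dir/sriov_totalvfs") || return 2
    active=$(cat "$dev_dir/sriov_numvfs") || return 2
    if [ "$total" -eq "$active" ]; then
        echo "match total=$total active=$active"
    else
        # This is the case in which the write to sriov_numvfs was seen to fail.
        echo "mismatch total=$total active=$active"
        return 1
    fi
}
```

Calling `sriov_vf_status /sys/class/net/eth0/device` prints one line and returns non-zero when the two values differ, which is the situation the workaround is meant to avoid.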

I believe slightly better error handling should be in place here, at least mentioning what I have found.

(If total_vfs != num_vfs, the error message could be something like "activate {total_vfs} virtual functions in the NIC driver".) The current "No such file or directory" is kind of misleading, even though it is actually forwarded from the driver/sysfs...
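To make the proposal concrete, here is a small sketch of what such a message could look like. It is purely illustrative: xcp-networkd is not written in shell, and `sriov_hint` and its argument order are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: turn the two sysfs values into an actionable hint.
# $1: interface name, $2: sriov_totalvfs value, $3: sriov_numvfs value
sriov_hint() {
    if [ "$2" -ne "$3" ]; then
        echo "$1: activate $2 virtual functions in the NIC driver (currently $3)"
    else
        echo "$1: driver already exposes all $2 virtual functions"
    fi
}
```

For example, `sriov_hint eth0 16 0` reports that 16 virtual functions need to be activated in the driver, instead of surfacing the raw sysfs error.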

Best, Piotr


olivierlambert commented 2 years ago

That's… interesting indeed. We should at least get that feedback to XAPI devs. Thoughts @stormi ?

stormi commented 2 years ago

So, as I understand it, this is not a regression from the 8.2 to 8.2.1 update. I think it would indeed be good to write a detailed bug report at https://github.com/xapi-project/xen-api/issues

Alphaprot commented 2 years ago

I am experiencing similar behaviour and hope you do not consider this issue-hijacking: I am also trying to get a ConnectX-3 40 Gbit Dual Port NIC working as an SR-IOV enabled NIC, and it fails with

Apr 15 23:01:52 hypervisor01 xcp-networkd: [ warn||153 |Async.network_sriov.create R:8b88ed3ead97|network_server] Failed to enable SR-IOV on eth2 with error: Error: set SR-IOV numvfs error with exception (Sys_error "No such file or directory") on eth2
Apr 15 23:01:52 hypervisor01 xapi: [error||1583 ||backtrace] Async.network_sriov.create R:8b88ed3ead97 failed with exception Server_error(NETWORK_SRIOV_ENABLE_FAILED, [ OpaqueRef:05bf6088-0646-4f1d-bac9-9a29f24d4263; Error: set SR-IOV numvfs error with exception (Sys_error "No such file or directory") on eth2 ])

The full log entry regarding the creation of an SR-IOV network on eth2 can be found here: https://gist.github.com/Alphaprot/c327aaca1f10342adb32ad8872ebfc35

@Oleszkiewicz What params did you set inside /etc/modprobe.d/mlx4_core.conf? Or are you using a different driver? I found a Mellanox KB article about Configuring SR-IOV for ConnectX-3 on KVM, from which I "borrowed" the driver VF parameters. Sadly, it does not work due to the aforementioned errors.

EDIT: This error is quite funny because there "is" a file like the one xcp-networkd is looking for:

# cat /sys/class/net/eth2/device/sriov_totalvfs returns 8

# cat /sys/class/net/eth2/device/sriov_numvfs returns 0 (which makes me assume that SR-IOV is not configured properly on the driver/kernel config)

Oleszkiewicz commented 2 years ago

The driver settings that will work depend heavily on how you have configured your card with the Mellanox tools, and the correct configuration depends heavily on your use case. Basically, you need to set the maximum number of virtual functions to the same value in the card config and in the driver. The behaviour is the same with the inbox drivers and the OFED drivers. Then you should decide whether you pass single-port or dual-port virtual functions to the VM.

Hint: single port does not confuse xcp-ng, while dual port adds both ports to the VM, whereas in xcp-ng you pass just one of the ports.

Hint 2: InfiniBand is not properly supported. It would require a few days of work to make xcp-ng understand how to properly configure a virtual function when starting the VM (currently it tries to set an Ethernet MAC on an IB interface). If you configure both ports as ETH, however, this is not an issue.

I have succeeded in configuring my test cluster with SR-IOV. If you need further help, contact me directly and I'll assist you with this.


Alphaprot commented 2 years ago

Now if I just knew how to contact you outside this issue, since both our GitHub profiles appear to have pretty restrictive privacy settings.

Regarding the problems I am experiencing (I have really no experience with SR-IOV and am not a dev myself) I see no VFs, only the physical card after reboot.

# lspci | grep Mellanox
08:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]

In the BIOS (DELL R320 calls this "SR-IOV Global Enable") as well as in the ConnectX-3 firmware, SR-IOV is enabled, and Intel VT-d is activated, too. I've added the intel_iommu=on kernel boot parameter in /boot/efi/EFI/xenserver/grub.cfg and checked that it persists across reboots.

The output of the card firmware configuration is

# mlxconfig query

Device #1:
----------

Device type:    ConnectX3
Device:         /dev/mst/mt4099_pciconf0

Configurations:                              Next Boot
         SRIOV_EN                            True(1)
         NUM_OF_VFS                          8
         LOG_BAR_SIZE                        3
         BOOT_OPTION_ROM_EN_P1               True(1)
         BOOT_VLAN_EN_P1                     False(0)
         BOOT_RETRY_CNT_P1                   0
         LEGACY_BOOT_PROTOCOL_P1             None(0)
         BOOT_VLAN_P1                        1
         BOOT_OPTION_ROM_EN_P2               True(1)
         BOOT_VLAN_EN_P2                     False(0)
         BOOT_RETRY_CNT_P2                   0
         LEGACY_BOOT_PROTOCOL_P2             None(0)
         BOOT_VLAN_P2                        1

My /etc/modprobe.d/mlx4_core.conf file looks like this:

options mlx4_core num_vfs=4,4,0 port_type_array=2,2 probe_vf=4,4,0
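Since an earlier comment recommends keeping the VF count identical in the card config and in the driver, it can help to total the `num_vfs` values and compare the result with the firmware's NUM_OF_VFS. The sketch below assumes the triplet form is "single-port VFs on port 1, single-port VFs on port 2, dual-port VFs", as the Mellanox mlx4 documentation describes it; `mlx4_requested_vfs` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read an mlx4_core "options" line on stdin and print
# the sum of its comma-separated num_vfs values (0 if none is found).
mlx4_requested_vfs() {
    sed -n 's/.*[[:space:]]num_vfs=\([0-9,]*\).*/\1/p' |
        head -n 1 |
        tr ',' '\n' |
        awk '{ sum += $1 } END { print sum + 0 }'
}
```

With the line above, `mlx4_requested_vfs < /etc/modprobe.d/mlx4_core.conf` adds up 4, 4 and 0, which can then be compared against the NUM_OF_VFS value reported by mlxconfig.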

I then reloaded the core module and its companion-modules mlx4_ib and mlx4_en:

modprobe -r mlx4_ib mlx4_en
modprobe -r mlx4_core
modprobe -a mlx4_core mlx4_ib mlx4_en

Now all eight probed VFs show up as devices.

# lspci | grep Mellanox
08:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
08:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:00.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
08:01.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

They can also be found as network interfaces

# ip link show
31: side-5519-eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ec:0d:9a:0d:b9:a0 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
32: side-47-eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ec:0d:9a:0d:b9:a1 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 7 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto

However, these kernel module options do not survive a reboot (wild guess: do the kernel drivers for e.g. networking get loaded at boot time without accessing the configuration options in /etc/modprobe.d/?). While the file remains there intact, I have to reapply it each time by removing the mlx4 driver components and re-adding them. I have far too little knowledge about this, but I am trying my best. After inspecting the initrd, my assumption might hold true, as I do not see any reference to a config in /etc/modprobe.d for mlx4-related drivers, while other modules do have config files referenced there:

# lsinitrd /boot/initrd-4.19-xen.img |grep mlx
drwxr-xr-x   2 root     root            0 Mar 24 12:38 usr/lib/modules/4.19.0+1/kernel/drivers/net/ethernet/mellanox/mlx4
-rwxr--r--   1 root     root       667528 Mar 24 12:38 usr/lib/modules/4.19.0+1/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko
-rwxr--r--   1 root     root       260536 Mar 24 12:38 usr/lib/modules/4.19.0+1/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko

And here I am out of luck (I used to create the SR-IOV networks via XCP-ng Center, which always fails, but am currently forced to use the CLI). There is a plethora (58?!) of VF/side interfaces listed by xe pif-list, and in XCP-ng Center only 2 of them show as SR-IOV capable. Note that I could not reboot (I rescanned the PIFs using xe pif-scan host-uuid=<my-host-id>) because I would lose the driver settings again (as mentioned earlier).

I am kindly asking for your assistance here. EDIT: And I always forget about pre-viewing my github entry, sorry for messing the initial version up with forward ticks instead of back-ticks in one code block. :(

Oleszkiewicz commented 2 years ago

Send me some contact details and I'll contact you, or find me on fb :) I have a sailing yacht in my background pic.

As for no VFs on reboot: you need to recreate the initrd/initramfs (use dracut) so the driver is initialized properly on boot. Then do not probe VFs unless you need them in dom0. You don't need to probe them if you just want pass-through.
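One way to confirm whether a rebuild is needed is to look for the modprobe.d file in the initrd listing, as was done with `lsinitrd` earlier in the thread. This is a rough sketch: `initrd_has_modprobe_conf` is a hypothetical name, and it only checks that the file name appears in the listing, not that its contents are current.

```shell
#!/bin/sh
# Hypothetical helper: read an `lsinitrd` listing on stdin and succeed
# only if it mentions the given file under etc/modprobe.d/.
# $1: config file name, e.g. mlx4_core.conf
initrd_has_modprobe_conf() {
    grep -qF "etc/modprobe.d/$1"
}
```

For instance, `lsinitrd /boot/initrd-4.19-xen.img | initrd_has_modprobe_conf mlx4_core.conf || echo "config missing from initrd"` would flag the situation described above. Rebuilding is then typically done with `dracut -f <image> <kernel version>`; the exact image path and kernel version depend on the host and are not given in this thread.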


Alphaprot commented 2 years ago

Long time no reply from my side - I had little time for my small lab and was struggling to get it working by myself. I was able to remove the 68 VIFs that should not be there by simply disabling the PCIe-Slot of my NIC and purging/forgetting the networks and associated PIFs and then re-scanning them.

I was able to pass them through, but FreeBSD's mlx4 drivers fail to establish a communication channel to the PIF, thus triggering the VIF's and PIF's reset/recovery loop over and over.

Just hit me up with a quick/empty reply to advance.07woofers(at)icloud.com so that I can describe what I did a bit more detailed.

enidice commented 2 years ago

This post probably isn't of use to users of ConnectX-3 Pro, since the OFED version I mention apparently doesn't support it. It did occur to me, though, that I didn't see any mention of the Mellanox firmware version earlier in this thread. Maybe the firmware that ships in the LTS 4.9-5.1.0.0 package could be installed on the host from another OS and have some positive effect, but possibly such options have already been considered. ¯\_(ツ)_/¯

I recently had a positive Mellanox (MT27800 Family [ConnectX-5]) SR-IOV experience involving XCP-ng 8.2.1 & FreeBSD (13.1) based guests (OPNsense 22.7) so I thought I'd share that here.

My reason for looking at SR-IOV at all was my experience of Xen / XCP-ng private networking exhibiting poor throughput with FreeBSD-based guest VMs. In case the back story is useful, see: https://xcp-ng.org/forum/topic/5668/what-are-realistic-experienced-throughput-outcomes-for-internal-networks/42?_=1660177147391 (I will also update that post at some point)

As presented to me, these network cards were configured with 16 VFs. Using SR-IOV network interfaces with the OPNsense guest VM resulted in the expected network throughput across 2 interfaces on that guest VM; in the case of these cards, approximately 22 Gbit/s. The issue faced here was a lack of remaining VFs after the OPNsense VM was configured with the desired network interfaces / topology.

On my first pass using the Nvidia/Mellanox download site I was unsuccessful getting the installer package to do anything useful. After a diversion via Ubuntu eventually I've ended up with:

https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/ (There doesn't seem to be an LTS version available for xenserver)

MLNX_OFED_LINUX-5.7-1.0.2.0-xenserver8.2-x86_64.tgz

installed using:

./mlnxofedinstall -vvv --without-32bit --distro xenserver --force --skip-distro-check --without-depcheck --without-fw-update

Actually, I've also been successful without the --without-fw-update flag, meaning that I get the latest firmware installed and the above-mentioned throughput across SR-IOV networks connected to the OPNsense VM, but I've left the flag in above in case of rapid copy-pasters.

I don't have entries under /etc/modprobe.d/

One other point I noted on my servers under test: there is an option to configure the number of VFs in the UEFI/BIOS. If I change that from the value set by default in the firmware, or via e.g. mlxconfig -d /dev/mst/mt4119_pciconf0 s NUM_OF_VFS=64, then I also end up in the situation where I'm able to configure networks but they do not pass traffic.
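To see what the firmware is set to before and after such a change, the NUM_OF_VFS row can be pulled out of the query output. A minimal sketch, assuming the two-column "name value" layout of the `mlxconfig query` output pasted earlier in this thread; `firmware_num_vfs` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read `mlxconfig query` (or `mstconfig q`) output on
# stdin and print the value in the NUM_OF_VFS row.
firmware_num_vfs() {
    awk '$1 == "NUM_OF_VFS" { print $2; exit }'
}
```

Usage would be along the lines of `mlxconfig -d /dev/mst/mt4119_pciconf0 query | firmware_num_vfs`, with the result compared against the value configured in the UEFI/BIOS.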

Oleszkiewicz commented 1 year ago

This is my next thing to check


viniciusferrao commented 2 months ago

I'm also having issues with ConnectX-3 in XCP-ng 8.2.1 when trying to enable SR-IOV. I'm already using the "new old" 4.9 drivers from testing, but SR-IOV is a miss.

[    1.842791] pci 0000:41:00.0: VF(n) BAR2 space: [mem 0x30010000000-0x3001fffffff 64bit pref] (contains BAR2 for 32 VFs)
[   13.583341] Compat-mlnx-ofed backport release: 8e3d458
[   13.583345] Backport based on mlnx_ofed/mlnx-ofa_kernel-4.0.git 8e3d458
[   13.583346] compat.git: mlnx_ofed/mlnx-ofa_kernel-4.0.git
[   13.601477] mlx4_core: Mellanox ConnectX core driver v4.9-7.1.0
[   13.601516] mlx4_core: Initializing 0000:41:00.0
[   18.269525] mlx4_core: device is working in RoCE mode: Roce V1
[   18.269530] mlx4_core: UD QP Gid type is: V1
[   19.982731] mlx4_core 0000:41:00.0: DMFS high rate steer mode is: default performance
[   19.983556] mlx4_core 0000:41:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
[   20.138677] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.9-7.1.0
[   20.139112] mlx4_en 0000:41:00.0: Activating port:1
[   20.146281] mlx4_en: 0000:41:00.0: Port 1: Using 16 TX rings
[   20.146285] mlx4_en: 0000:41:00.0: Port 1: Using 16 RX rings
[   20.146850] mlx4_en: 0000:41:00.0: Port 1: Initializing port
[   20.147270] mlx4_en 0000:41:00.0: registered PHC clock
[   20.147874] mlx4_en 0000:41:00.0: Activating port:2
[   20.148947] mlx4_en: 0000:41:00.0: Port 2: Using 16 TX rings
[   20.148950] mlx4_en: 0000:41:00.0: Port 2: Using 16 RX rings
[   20.149298] mlx4_en: 0000:41:00.0: Port 2: Initializing port
[   20.204508] mlx4_core 0000:41:00.0 side-9892-eth0: renamed from eth0
[   20.259418] mlx4_core 0000:41:00.0 side-1155-eth1: renamed from eth1
[   22.075613] mlx4_en: side-1155-eth1: Link Up
[   22.125989] mlx4_en: side-9892-eth0: Link Up
[   23.132098] mlx4_core 0000:41:00.0 eth5: renamed from side-1155-eth1
[   23.180124] mlx4_core 0000:41:00.0 eth4: renamed from side-9892-eth0
[   24.486927] mlx4_en: eth5: Steering Mode 2
[   24.490794] mlx4_en: eth5: Setting RSS context tunnel type to RSS on inner headers
[   24.512845] mlx4_core 0000:41:00.0: going promisc on 2
[   24.657084] mlx4_en: eth4: Steering Mode 2
[   24.661621] mlx4_en: eth4: Setting RSS context tunnel type to RSS on inner headers
[   24.682269] mlx4_core 0000:41:00.0: going promisc on 1
[   25.917201] mlx4_en: eth4: Steering Mode 2
[   25.921681] mlx4_en: eth4: Setting RSS context tunnel type to RSS on inner headers
[   25.965859] mlx4_en: eth4: Link Down
[   25.966035] mlx4_core 0000:41:00.0: going promisc on 1
[   26.307221] mlx4_en: eth5: Steering Mode 2
[   26.310533] mlx4_en: eth5: Setting RSS context tunnel type to RSS on inner headers
[   26.349171] mlx4_en: eth5: Link Down
[   26.349345] mlx4_core 0000:41:00.0: going promisc on 2
[   27.924260] mlx4_en: eth4: Link Up
[   29.398925] mlx4_en: eth5: Link Up
[   69.753629] mlx4_core 0000:41:00.0: Driver doesn't support SRIOV configuration via sysfs
[   69.753640] mlx4_core 0000:41:00.0: Driver doesn't support SRIOV configuration via sysfs
[   70.377649] mlx4_core 0000:41:00.0: Driver doesn't support SRIOV configuration via sysfs
[   70.377660] mlx4_core 0000:41:00.0: Driver doesn't support SRIOV configuration via sysfs
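The last four lines of this log are the relevant ones. As far as the mainline Linux PCI code goes, this message is logged when the bound driver offers no sysfs SR-IOV configuration callback, and the write to sriov_numvfs then fails with ENOENT, which would match the "No such file or directory" reported earlier in this thread; treat that link as an assumption for this particular driver build. A small sketch for spotting the condition, with the hypothetical name `driver_rejects_sysfs_sriov`:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and succeed if the
# driver refused SR-IOV configuration through sysfs.
driver_rejects_sysfs_sriov() {
    grep -qF "Driver doesn't support SRIOV configuration via sysfs"
}
```

For example, `dmesg | driver_rejects_sysfs_sriov && echo "set the VF count through module options instead"` points back to the module-parameter workaround discussed above.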

I took the liberty of adding the MLNX OFED 4.9 repo just to install mstflint, since it's not shipped with 8.2.1 either. The repo is available here: https://linux.mellanox.com/public/repo/mlnx_en/4.9-7.1.0.0/rhel7.9/

With mstconfig I was able to probe the card to check for SR-IOV:

[01:29 xen1 yum.repos.d]# mstconfig q

Device #1:
----------

Device type:    ConnectX3Pro    
Device:         /sys/bus/pci/devices/0000:41:00.0/config

Configurations:                              Next Boot
         SRIOV_EN                            True(1)         
         NUM_OF_VFS                          32              
         LOG_BAR_SIZE                        3               
         BOOT_OPTION_ROM_EN_P1               True(1)         
         BOOT_VLAN_EN_P1                     False(0)        
         BOOT_RETRY_CNT_P1                   0               
         LEGACY_BOOT_PROTOCOL_P1             PXE(1)          
         BOOT_VLAN_P1                        1               
         BOOT_OPTION_ROM_EN_P2               True(1)         
         BOOT_VLAN_EN_P2                     False(0)        
         BOOT_RETRY_CNT_P2                   0               
         LEGACY_BOOT_PROTOCOL_P2             PXE(1)          
         BOOT_VLAN_P2                        1               
         IP_VER_P1                           IPv4(0)         
         IP_VER_P2                           IPv4(0)   

So if anyone found a solution for SR-IOV please share it here.

stormi commented 1 month ago

Sad to read that. We hoped the LTS driver would sort things out. Back to "someone will need to diagnose where the issue lies", I suppose. Thanks for your feedback!

Oleszkiewicz commented 1 month ago

Actually SR-IOV works for me on 8.2.1

MLNX_OFED_LINUX4.9-7.1.0.0-rhel7.5-x86_64-ext

11:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro] 11:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:00.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:00.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:00.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:00.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:00.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:01.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:01.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:01.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:01.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:01.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:01.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:01.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:01.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:02.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:02.1 
Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:02.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:02.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:02.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:02.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:02.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:02.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:03.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:03.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:03.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:03.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:03.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:03.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:03.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:03.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:04.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:04.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro 
Virtual Function] 11:04.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:04.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:04.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:04.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:04.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:04.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:05.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:05.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:05.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:05.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:05.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:05.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:05.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:05.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:06.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:06.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:06.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family 
[ConnectX-3/ConnectX-3 Pro Virtual Function] 11:06.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:06.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:06.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:06.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:06.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:07.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:07.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:07.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:07.3 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:07.4 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:07.5 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:07.6 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function] 11:07.7 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

SR-IOV works perfectly in VMs.



stormi commented 1 month ago

@Oleszkiewicz What's the version of the driver that you are using? Is it the one we packaged as mlx4-modules-alt (in the xcp-ng-testing repository for XCP-ng 8.2.1, xcp-ng-base for XCP-ng 8.3)?

Oleszkiewicz commented 1 month ago

No, I compiled the driver from the Mellanox site myself.



stormi commented 1 month ago

Could you try it? It's compiled from the same sources.

Oleszkiewicz commented 1 month ago

I run this in production and don't have a matching setup to experiment with. Check that SR-IOV is properly enabled in the module and that you can see the VFs in lspci. You may not see them if you did not refresh the initrd. To verify, you can remove and reload the module once the system is running; you should then be able to see the VFs. If you don't, check that the kernel module configuration is correct.
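
A minimal sketch of the lspci check described above, assuming the command output has already been captured as text. The helper name `count_virtual_functions` is hypothetical; it only counts lines that mention "Virtual Function", as they appear in the lspci listing earlier in this thread.

```python
def count_virtual_functions(lspci_output: str) -> int:
    """Count PCI devices whose description marks them as an SR-IOV VF."""
    return sum(
        1 for line in lspci_output.splitlines()
        if "Virtual Function" in line
    )


# Sample lines copied from the lspci listing earlier in this thread.
sample = """\
11:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
11:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
11:00.2 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
"""

if __name__ == "__main__":
    # The physical function on the first line is not counted.
    print(count_virtual_functions(sample))
```

A count of zero after reloading the module would point back at the module configuration or the initrd.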



viniciusferrao commented 1 month ago

I'm also having issues with ConnectX-3 on XCP-ng 8.2.1 when trying to enable SR-IOV. I'm already using the newly packaged (old) 4.9 drivers from testing, but SR-IOV still doesn't work.

[    1.842791] pci 0000:41:00.0: VF(n) BAR2 space: [mem 0x30010000000-0x3001fffffff 64bit pref] (contains BAR2 for 32 VFs)
[   13.583341] Compat-mlnx-ofed backport release: 8e3d458
[   13.583345] Backport based on mlnx_ofed/mlnx-ofa_kernel-4.0.git 8e3d458
[   13.583346] compat.git: mlnx_ofed/mlnx-ofa_kernel-4.0.git
[   13.601477] mlx4_core: Mellanox ConnectX core driver v4.9-7.1.0
[   13.601516] mlx4_core: Initializing 0000:41:00.0
[   18.269525] mlx4_core: device is working in RoCE mode: Roce V1
[   18.269530] mlx4_core: UD QP Gid type is: V1
[   19.982731] mlx4_core 0000:41:00.0: DMFS high rate steer mode is: default performance
[   19.983556] mlx4_core 0000:41:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
[   20.138677] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.9-7.1.0
[   20.139112] mlx4_en 0000:41:00.0: Activating port:1
[   20.146281] mlx4_en: 0000:41:00.0: Port 1: Using 16 TX rings
[   20.146285] mlx4_en: 0000:41:00.0: Port 1: Using 16 RX rings
[   20.146850] mlx4_en: 0000:41:00.0: Port 1: Initializing port
[   20.147270] mlx4_en 0000:41:00.0: registered PHC clock
[   20.147874] mlx4_en 0000:41:00.0: Activating port:2
[   20.148947] mlx4_en: 0000:41:00.0: Port 2: Using 16 TX rings
[   20.148950] mlx4_en: 0000:41:00.0: Port 2: Using 16 RX rings
[   20.149298] mlx4_en: 0000:41:00.0: Port 2: Initializing port
[   20.204508] mlx4_core 0000:41:00.0 side-9892-eth0: renamed from eth0
[   20.259418] mlx4_core 0000:41:00.0 side-1155-eth1: renamed from eth1
[   22.075613] mlx4_en: side-1155-eth1: Link Up
[   22.125989] mlx4_en: side-9892-eth0: Link Up
[   23.132098] mlx4_core 0000:41:00.0 eth5: renamed from side-1155-eth1
[   23.180124] mlx4_core 0000:41:00.0 eth4: renamed from side-9892-eth0
[   24.486927] mlx4_en: eth5: Steering Mode 2
[   24.490794] mlx4_en: eth5: Setting RSS context tunnel type to RSS on inner headers
[   24.512845] mlx4_core 0000:41:00.0: going promisc on 2
[   24.657084] mlx4_en: eth4: Steering Mode 2
[   24.661621] mlx4_en: eth4: Setting RSS context tunnel type to RSS on inner headers
[   24.682269] mlx4_core 0000:41:00.0: going promisc on 1
[   25.917201] mlx4_en: eth4: Steering Mode 2
[   25.921681] mlx4_en: eth4: Setting RSS context tunnel type to RSS on inner headers
[   25.965859] mlx4_en: eth4: Link Down
[   25.966035] mlx4_core 0000:41:00.0: going promisc on 1
[   26.307221] mlx4_en: eth5: Steering Mode 2
[   26.310533] mlx4_en: eth5: Setting RSS context tunnel type to RSS on inner headers
[   26.349171] mlx4_en: eth5: Link Down
[   26.349345] mlx4_core 0000:41:00.0: going promisc on 2
[   27.924260] mlx4_en: eth4: Link Up
[   29.398925] mlx4_en: eth5: Link Up
[   69.753629] mlx4_core 0000:41:00.0: Driver doesn't support SRIOV configuration via sysfs
[   69.753640] mlx4_core 0000:41:00.0: Driver doesn't support SRIOV configuration via sysfs
[   70.377649] mlx4_core 0000:41:00.0: Driver doesn't support SRIOV configuration via sysfs
[   70.377660] mlx4_core 0000:41:00.0: Driver doesn't support SRIOV configuration via sysfs
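
The kernel prints the last four messages when something writes to `sriov_numvfs` and the driver refuses to change the VF count at runtime. A minimal sketch for comparing the two sysfs values mentioned earlier in this issue (`sriov_numvfs` and `sriov_totalvfs`) is shown here; the helper name `read_sriov_counts` and the interface name are assumptions for illustration only.

```python
from pathlib import Path
from typing import Tuple


def read_sriov_counts(device_dir: str) -> Tuple[int, int]:
    """Return (sriov_numvfs, sriov_totalvfs) read from a device's sysfs directory."""
    base = Path(device_dir)
    numvfs = int((base / "sriov_numvfs").read_text().strip())
    totalvfs = int((base / "sriov_totalvfs").read_text().strip())
    return numvfs, totalvfs


if __name__ == "__main__":
    # Hypothetical interface name; substitute the real one (eth4/eth5 in the log above).
    device = Path("/sys/class/net/eth4/device")
    if (device / "sriov_totalvfs").exists():
        numvfs, totalvfs = read_sriov_counts(str(device))
        print(f"active VFs: {numvfs}, firmware maximum: {totalvfs}")
```

If the two numbers differ, the workaround described at the top of this issue (enabling exactly the maximum number of VFs in the module configuration) would apply.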

I took the liberty of adding the MLNX OFED 4.9 repo just to install mstflint, since it isn't shipped with 8.2.1 either. The repo is available here: https://linux.mellanox.com/public/repo/mlnx_en/4.9-7.1.0.0/rhel7.9/

With mstconfig I was able to probe the card to check for SR-IOV:

[01:29 xen1 yum.repos.d]# mstconfig q

Device #1:
----------

Device type:    ConnectX3Pro    
Device:         /sys/bus/pci/devices/0000:41:00.0/config

Configurations:                              Next Boot
         SRIOV_EN                            True(1)         
         NUM_OF_VFS                          32              
         LOG_BAR_SIZE                        3               
         BOOT_OPTION_ROM_EN_P1               True(1)         
         BOOT_VLAN_EN_P1                     False(0)        
         BOOT_RETRY_CNT_P1                   0               
         LEGACY_BOOT_PROTOCOL_P1             PXE(1)          
         BOOT_VLAN_P1                        1               
         BOOT_OPTION_ROM_EN_P2               True(1)         
         BOOT_VLAN_EN_P2                     False(0)        
         BOOT_RETRY_CNT_P2                   0               
         LEGACY_BOOT_PROTOCOL_P2             PXE(1)          
         BOOT_VLAN_P2                        1               
         IP_VER_P1                           IPv4(0)         
         IP_VER_P2                           IPv4(0)   

So if anyone has found a solution for SR-IOV, please share it here.

Ok, I was missing the custom options line for mlx4 in /etc/modprobe.d/mlx4_core.conf that @Alphaprot mentioned:

# cat /etc/modprobe.d/mlx4_core.conf 
options mlx4_core num_vfs=16,16,0 port_type_array=2,2 probe_vf=16,16,0

After that and a reboot I was able to also use SR-IOV with the 4.9 drivers on testing repo. Just had to also use mstconfig with options set SRIOV_EN=1 NUM_OF_VFS=32 to make it work as expected. And also this setup is surviving reboots. Which is good.
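
A small sketch of a consistency check between the options line and the firmware setting quoted above. It assumes that the three `num_vfs` values are the VF counts for port 1, port 2, and dual-port VFs, and that their sum should not exceed the firmware `NUM_OF_VFS`; both are assumptions about the mlx4 driver, not something confirmed in this thread. The function names are hypothetical.

```python
import re
from typing import List


def parse_num_vfs(options_line: str) -> List[int]:
    """Extract the comma-separated num_vfs values from a modprobe options line."""
    match = re.search(r"\bnum_vfs=([0-9,]+)", options_line)
    if match is None:
        return []
    return [int(value) for value in match.group(1).split(",") if value]


def fits_firmware_limit(options_line: str, firmware_num_of_vfs: int) -> bool:
    """Assumption: the sum of the num_vfs values must not exceed NUM_OF_VFS."""
    return sum(parse_num_vfs(options_line)) <= firmware_num_of_vfs


# Options line and NUM_OF_VFS value taken from the comment above.
line = "options mlx4_core num_vfs=16,16,0 port_type_array=2,2 probe_vf=16,16,0"

if __name__ == "__main__":
    print(parse_num_vfs(line))
    print(fits_firmware_limit(line, 32))
```

With the values from this comment, 16 + 16 + 0 matches the 32 VFs configured through mstconfig.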

The only issue that bothers me now is that I have a dual-port ConnectX-3 card and I created two SR-IOV networks, one for each port; however, when adding a VF from each network to a VM, the driver seems to allocate both on the same port:

[  842.736871] mlx4_core 0000:41:00.0: default mac on vf 0 port 1 to 8A28A1519BD7 will take effect only after vf restart
[  842.816186] mlx4_core 0000:41:00.0: default mac on vf 1 port 1 to 1E061C258BC7 will take effect only after vf restart

Not sure if my modprobe options are wrong or not.

PS: @stormi I can do some tests since my system isn't in production yet.

Oleszkiewicz commented 1 month ago

On this card you will always get/allocate both ports on SR-IOV, you cannot push just one port to the VM.



viniciusferrao commented 1 month ago

On this card you will always get/allocate both ports on SR-IOV, you cannot push just one port to the VM.

I'm sorry @Oleszkiewicz but I don't get it.

What I want is exactly to allocate both ports, since the idea is to have the physical links bonded with LACP. So if I understood correctly, I must have one SR-IOV network for each interface of the card, and then add both SR-IOV networks to a VM. Is my understanding correct?

However, what I observed is that when I add the distinct SR-IOV networks, the VFs are counted only on the first interface. I ignored the issue and tried to create the bond inside the VM, but when I do that I immediately get this error:

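[image: screenshot of the error, https://github.com/user-attachments/assets/214a3e55-2fc9-430b-b5a6-f6ea67fa0eec]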

Regarding the VM information this is what I have:

[17:19 xen1 ~]# xe vif-list vm-uuid=942a0af4-bd7a-6b44-97fb-0dd95ca078cb 
uuid ( RO)            : 92950636-f60a-9e51-86a4-4a5091ebed96
         vm-uuid ( RO): 942a0af4-bd7a-6b44-97fb-0dd95ca078cb
          device ( RO): 1
    network-uuid ( RO): e241ae9c-343d-ec2e-3f82-01e91387107e

uuid ( RO)            : 2cb2bb17-5b44-d2e8-48aa-4b128fb9d438
         vm-uuid ( RO): 942a0af4-bd7a-6b44-97fb-0dd95ca078cb
          device ( RO): 0
    network-uuid ( RO): b216a260-56c4-168e-acd2-f027c66c7148

And the networks:

[17:20 xen1 ~]# xe network-list uuid=e241ae9c-343d-ec2e-3f82-01e91387107e 
uuid ( RO)                : e241ae9c-343d-ec2e-3f82-01e91387107e
          name-label ( RW): SR-IOV eth5 MLNX
    name-description ( RW): 
              bridge ( RO): xapi12

[17:21 xen1 ~]# xe network-list uuid=b216a260-56c4-168e-acd2-f027c66c7148 
uuid ( RO)                : b216a260-56c4-168e-acd2-f027c66c7148
          name-label ( RW): SR-IOV eth4 MLNX
    name-description ( RW): 
              bridge ( RO): xapi11

I cannot create the SR-IOV network on the bond directly, right? It should be done at the interface level.

Thanks.

Oleszkiewicz commented 1 month ago

1. You cannot bond with LACP unless you have a VERY expensive and smart switch that actually understands LACP over SR-IOV. Normally only one LACP link per switch port can be functional. To bond you would need another bonding mode (this is doable, even in active-active mode).
2. No. What you could do to get what you want is:
   - Make one SR-IOV network for the VM (let's say on port 1); the easiest way is not to limit it to a specific VLAN.
   - Install the Mellanox drivers in the VM.
   - You will see two Mellanox interfaces in the VM (both port 1 and port 2).
   - Create a bond (not LACP).
3. You should be able to do the same while limiting the VLAN on the host side (create a VLAN network over the SR-IOV network).

On ConnectX-3 you cannot do SR-IOV at the bond level; the card will not do the bonding for you.

This card requires you to run a software LAG either on the host or in the VM. With SR-IOV you can only do it at the VM level, so the VM will see two ports and run a software LAG over them.

To do it at the NIC level (and actually be able to run LACP seamlessly on SR-IOV), you need the ASAP2 technology from NVIDIA, which is available on ConnectX-5 and ConnectX-6. In that case the NIC does all the bonding and exports a "ready" bond as a VF.



stormi commented 1 month ago

Ok, I was missing the custom options line for mlx4 in /etc/modprobe.d/mlx4_core.conf that @Alphaprot mentioned:

# cat /etc/modprobe.d/mlx4_core.conf 
options mlx4_core num_vfs=16,16,0 port_type_array=2,2 probe_vf=16,16,0

After that and a reboot I was able to also use SR-IOV with the 4.9 drivers on testing repo. Just had to also use mstconfig with options set SRIOV_EN=1 NUM_OF_VFS=32 to make it work as expected. And also this setup is surviving reboots. Which is good.

Nice. So the driver we packaged does work after all :relaxed:

benjamreis commented 1 month ago

Should this issue be closed then? :thinking:

stormi commented 1 month ago

Not yet. The driver is not available yet outside the testing repository.

stormi commented 1 month ago

And we'll also need to document the known issue and solution.