hetznercloud / bnxt_en_xdp_redirect_reproducer

Simple reproducer for an eBPF XDP redirect issue we observe with Broadcom NetXtreme NICs (bnxt_en driver).

Issue observed with Cilium L4 load balancer on BCM57416 #1

Open vills opened 4 months ago

vills commented 4 months ago

TL;DR

Hello!

I appreciate your work on reproducing the bug. I think I ran into the same issue with the bnxt_en driver while using Cilium.

Did you report it somewhere, or do you know a way to fix it?

Thanks.

Expected behavior

network card should work

Observed behavior

network card doesn't work as expected

Minimal working example

No response

Log output

No response

Additional information

No response

aibor commented 4 months ago

Hi,

Good to know we are not the only ones facing the issue. Thanks for the info. :)

So, you ran into the issue by using the Cilium layer 4 load balancer, right? Which NIC did you use exactly, if I may ask?

We have been in contact with Broadcom since September. At first, they stated that XDP was not fully supported by the NIC we use and that they planned to add full XDP support with firmware release 229 (released in March 2024). However, the issue was still present with this release.

They are still investigating the issue, and we don't have any information about when a fix can be expected. We also do not have a workaround, so we just use other vendors' NICs for now.

vills commented 4 months ago

Yep. It's Cilium's L4 LB in my lab where I observe the issue. NIC info:

product: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
capabilities: pm vpd msix pciexpress bus_master cap_list rom ethernet physical tp 1000bt-fd 10000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bnxt_en driverversion=5.15.0 duplex=full firmware=227.0.134.0/pkg 22.71.11.13 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s

I have not yet tried to reproduce the issue with your code, but I will in the following days. I also have other servers with Intel NICs, and they work without problems.

I think I'll try to contact Broadcom as well. Maybe it will add some value to the issue :-).

aibor commented 4 months ago

Thanks for the info.

> product: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller

Interesting. So, now we know about two different ASICs that are affected: 57414 and 57416.

> I have not yet tried to reproduce the issue with your code, but I will in the following days. I also have other servers with Intel NICs, and they work without problems.

Great, please let me know if the reproducer fails on your NIC in the same way I observed it.

> I think I'll try to contact Broadcom as well. Maybe it will add some value to the issue :-).

Nice, thanks. :)

vills commented 3 months ago

We can confirm that the issue can be reproduced by your script on BCM57416 cards.

We tested with several firmware versions. The latest one available from our vendor is:

product: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
capabilities: pm vpd msix pciexpress bus_master cap_list rom ethernet physical tp 1000bt-fd 10000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=bnxt_en driverversion=6.1.0-21-amd64 duplex=full firmware=229.2.52.0/pkg 22.92.06.10 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s