Open mihadarktrace opened 3 years ago
Hi @mihadarktrace, thank you for the detailed exploration of this issue! I agree with your analysis, but calling rtnl_lock()
in the data path is out of the question (this is a global lock used in the slow control path).
We have the infrastructure in place to protect netmap from these ring reset events, but we need to patch in some netmap calls in the right places. I'll have a look.
Could you please try the following patch?
diff --git a/LINUX/default-config.mak.in_ b/LINUX/default-config.mak.in_
index 514ec80b4..9a4ce3094 100644
--- a/LINUX/default-config.mak.in_
+++ b/LINUX/default-config.mak.in_
@@ -82,7 +82,7 @@ igb@prepare := $(if $(filter $(igb@v),5.3.5.61 5.3.6 5.4.6 5.5.2),@SRCDIR@/intel
e1000e@prepare := $(if $(filter $(e1000e@v),3.8.4),@SRCDIR@/intel-fix.sh e1000e,)
ixgbevf@prepare := $(if $(filter $(ixgbevf@v),4.7.1 4.8.1 4.9.3 4.10.2 4.11.1),@SRCDIR@/intel-fix.sh ixgbevf,)
ixgbe@prepare := $(if $(filter $(ixgbe@v),5.8.1 5.9.4 5.10.2 5.11.3),@SRCDIR@/intel-fix.sh ixgbe,)
-i40e@prepare := $(if $(filter $(i40e@v),2.12.6 2.14.13 2.15.9),@SRCDIR@/intel-fix.sh i40e,)
+i40e@prepare := @SRCDIR@/i40e-redirect.sh$(if $(filter $(i40e@v),2.12.6 2.14.13 2.15.9),; @SRCDIR@/intel-fix.sh i40e,)
# some additional, driver-specific configuration
stmmac@conf := CONFIG_STMMAC_ETH
diff --git a/LINUX/i40e-redirect.sh b/LINUX/i40e-redirect.sh
new file mode 100755
index 000000000..28299a7e5
--- /dev/null
+++ b/LINUX/i40e-redirect.sh
@@ -0,0 +1,5 @@
+#!/bin/sh
+
+grep -q DEV_NETMAP i40e/i40e_main.c || exit 0
+sed -i -e 's/^void i40e_down(/static void __i40e_down(/
+ s/^static int i40e_up_complete(/static int __i40e_up_complete(/' i40e/i40e_main.c
diff --git a/LINUX/i40e_netmap_linux.h b/LINUX/i40e_netmap_linux.h
index 0f6018888..a6e2b46ca 100644
--- a/LINUX/i40e_netmap_linux.h
+++ b/LINUX/i40e_netmap_linux.h
@@ -292,6 +292,24 @@ i40e_netmap_attach(struct i40e_vsi *vsi)
netmap_attach(&na);
}
+static void __i40e_down(struct i40e_vsi *vsi);
+void i40e_down(struct i40e_vsi *vsi)
+{
+ if (vsi->netdev)
+ netmap_disable_all_rings(vsi->netdev);
+ __i40e_down(vsi);
+}
+
+static int __i40e_up_complete(struct i40e_vsi *vsi);
+static int i40e_up_complete(struct i40e_vsi *vsi)
+{
+ int rv = __i40e_up_complete(vsi);
+ if (vsi->netdev)
+ netmap_enable_all_rings(vsi->netdev);
+ return rv;
+}
+
+
#else /* NETMAP_I40E_MAIN */
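The wrapping idea in the patch above can be modeled in plain userspace C. This is only a sketch under stated assumptions: every `toy_` name is hypothetical, the single `rings_stopped` flag is a stand-in for whatever state `netmap_disable_all_rings()` and `netmap_enable_all_rings()` actually maintain, and nothing here models real concurrency or kernel locking. It only shows the control flow: the original function is renamed, and a wrapper with the original name stops the rings before the teardown and re-enables them after the bring-up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the real kernel or netmap types. */
struct toy_netdev { bool rings_stopped; };
struct toy_vsi { struct toy_netdev *netdev; bool up; };

/* Hypothetical model of disabling/enabling all rings of a device. */
static void toy_disable_all_rings(struct toy_netdev *nd) { nd->rings_stopped = true; }
static void toy_enable_all_rings(struct toy_netdev *nd) { nd->rings_stopped = false; }

/* Model of the original driver functions after the rename step. */
static void toy_inner_down(struct toy_vsi *vsi) { vsi->up = false; }
static int toy_inner_up_complete(struct toy_vsi *vsi) { vsi->up = true; return 0; }

/* Wrapper that takes over the original name: stop the rings, then tear down. */
void toy_down(struct toy_vsi *vsi)
{
    assert(vsi != NULL);
    if (vsi->netdev)
        toy_disable_all_rings(vsi->netdev);
    toy_inner_down(vsi);
}

/* Wrapper for the bring-up path: finish bring-up, then re-enable the rings. */
int toy_up_complete(struct toy_vsi *vsi)
{
    int rv;

    assert(vsi != NULL);
    rv = toy_inner_up_complete(vsi);
    if (vsi->netdev)
        toy_enable_all_rings(vsi->netdev);
    return rv;
}

/* Model of a sync call: back off with -1 while the rings are stopped. */
int toy_rxsync(const struct toy_vsi *vsi)
{
    assert(vsi != NULL);
    if (vsi->netdev && vsi->netdev->rings_stopped)
        return -1;
    return 0;
}
```

In this model, a `toy_rxsync()` call made between `toy_down()` and `toy_up_complete()` returns -1 instead of touching ring state, which is the behavior the patch is aiming for without taking a global lock in the data path.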
Hi,
We believe we found a possible underlying cause for https://github.com/luigirizzo/netmap/issues/414. We've been having a host of issues in which the rings are inaccessible to netmap, with a lot of
`i40e_netmap_rxsync [r] ring eth5 RX0 is missing (rxr=00000000ab224491)`
messages in the kernel logs. These only seem to happen in certain unclear situations, making them hard to reproduce.

In one specific case, however, we managed to see through dynamic debugging that the ring-missing messages are preceded by the following:
A look at the driver code suggested that this happened in the `i40e_handle_lldp` function (in `i40e_main.c`), which gets called when the driver is handling a message passed by the firmware through the admin queue. Looking further into that function shows that it calls `i40e_pf_quiesce_all_vsi(pf)` (supposing all previous checks pass, which the above logging suggests they do). This `quiesce` function deletes all rings and ring descriptors and sets them to NULL, which would explain why the netmap `...sync` functions find them in such a state. Disabling the i40e firmware LLDP agent in this specific case has solved the problem, and the messages about the missing ring have disappeared. For reference, see https://advantech-ncg.zendesk.com/hc/en-us/articles/360020364512-How-to-Disable-LLDP-agent-on-XL710-in-Linux for how to disable the i40e firmware LLDP agent.

Before calling `i40e_handle_lldp`, the driver does take a lock through `rtnl_lock()` in `i40e_clean_adminq_subtask`, but from what I've been able to find, that is meant to protect access to the `netdev` kernel structure as opposed to the rings themselves. A solution to the above issues might be to simply take that lock at the beginning of `i40e_netmap_rxsync/txsync`, but I do not have enough knowledge about kernel programming to say whether this would be optimal or even correct, or whether it would impact performance.

There are other places in which `i40e_pf_quiesce_all_vsi(pf)` is called (most of them DCB related), so another possible solution would be to compile with `CONFIG_DCB` undefined (where that is possible), removing them entirely.

Another place where this seems to happen is when the card resets, via the `set_bit(...)` and `i40e_service_event_schedule` functions, but I have not looked in detail at how, or whether, netmap handles this specific situation.

Please correct my reasoning on any of the above points if it's incorrect; my familiarity with driver code is very limited and I only recently started diving into it in more detail.
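To make the suspected failure mode above concrete, here is a minimal userspace sketch in C. All `toy_` names are hypothetical and the structures are deliberately simplified; this is not the i40e or netmap code. It models only the sequence described: a quiesce step frees the rings and sets the pointers to NULL, after which a sync call finds the ring missing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_NUM_RINGS 2

/* Toy stand-ins; NOT the real driver structures. */
struct toy_ring { int head; };
struct toy_vsi { struct toy_ring *rx_rings[TOY_NUM_RINGS]; };

/* Model of the quiesce step: free every ring and NULL the pointer. */
void toy_quiesce(struct toy_vsi *vsi)
{
    int i;

    for (i = 0; i < TOY_NUM_RINGS; i++) {
        free(vsi->rx_rings[i]);   /* free(NULL) is a no-op */
        vsi->rx_rings[i] = NULL;
    }
}

/* Allocate all rings; returns 0 on success, -1 on allocation failure. */
int toy_alloc_rings(struct toy_vsi *vsi)
{
    int i;

    for (i = 0; i < TOY_NUM_RINGS; i++)
        vsi->rx_rings[i] = NULL;
    for (i = 0; i < TOY_NUM_RINGS; i++) {
        vsi->rx_rings[i] = calloc(1, sizeof(struct toy_ring));
        if (vsi->rx_rings[i] == NULL) {
            toy_quiesce(vsi);     /* release whatever was allocated */
            return -1;
        }
    }
    return 0;
}

/* Model of a sync call: report a missing ring instead of dereferencing NULL. */
int toy_rxsync(struct toy_vsi *vsi, int ring_nr)
{
    struct toy_ring *rxr;

    assert(ring_nr >= 0 && ring_nr < TOY_NUM_RINGS);
    rxr = vsi->rx_rings[ring_nr];
    if (rxr == NULL) {
        fprintf(stderr, "toy ring RX%d is missing\n", ring_nr);
        return -1;
    }
    rxr->head++;                  /* pretend to process one slot */
    return 0;
}
```

In this model, calling `toy_rxsync()` after `toy_quiesce()` returns -1 for every ring until the rings are allocated again. The sketch is single-threaded, so it does not capture the actual race between the admin queue handling and the data path.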