ntop / PF_RING

High-speed packet processing framework
http://www.ntop.org
GNU Lesser General Public License v2.1

Activating ice ZC Driver disables all i40e interfaces on RHEL 8.8 #917

Closed by Arislen 4 months ago

Arislen commented 4 months ago

OS: Red Hat Enterprise Linux release 8.8 (Ootpa)
Kernel: Linux 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Thu Aug 31 10:29:22 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

The system has the latest RHEL patches installed.

ntop packages installed (from the nightly build repository at https://packages.ntop.org/):

pfring-8.7.0-8936.x86_64
pfring-dkms-8.7.0.8936-dkms.noarch
n2disk-3.7.240226-5500.x86_64
ice-zc-1.12.7.8936-dkms.noarch

The i40e ZC driver is NOT installed, since we aren't going to use any of those interfaces for capture.

pf_ringcfg --list-interfaces
Name: eno12399             Driver: i40e       RSS: Unknown  [Supported by ZC]
Name: eno12409             Driver: i40e       RSS: Unknown  [Supported by ZC]
Name: eno12419             Driver: i40e       RSS: Unknown  [Supported by ZC]
Name: eno12429             Driver: i40e       RSS: Unknown  [Supported by ZC]
Name: ens3f0               Driver: ice        RSS: Unknown  [Supported by ZC]
Name: ens3f1               Driver: ice        RSS: Unknown  [Supported by ZC]
Name: ens6f0               Driver: ice        RSS: Unknown  [Supported by ZC]
Name: ens6f1               Driver: ice        RSS: Unknown  [Supported by ZC]
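With two NIC families on one host, grouping this listing by driver shows at a glance which interfaces share a kernel module. A minimal sketch, assuming the `Name:`/`Driver:` column layout shown above stays stable; `interfaces_by_driver` is a hypothetical helper, not something shipped with PF_RING.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PF_RING): read `pf_ringcfg --list-interfaces`
# output on stdin and print one line per driver with the interfaces it backs.
# Assumes lines of the form: "Name: <iface>  Driver: <driver>  RSS: ...".
interfaces_by_driver() {
  awk '
    $1 == "Name:" && $3 == "Driver:" {
      # Append the interface name ($2) to the list kept for its driver ($4).
      ifaces[$4] = ifaces[$4] " " $2
    }
    END {
      for (drv in ifaces) print drv ":" ifaces[drv]
    }
  ' | LC_ALL=C sort
}
```

Usage: `pf_ringcfg --list-interfaces | interfaces_by_driver`. On the listing above it prints `i40e: eno12399 eno12409 eno12419 eno12429` and `ice: ens3f0 ens3f1 ens6f0 ens6f1`, making it clear that the management interface (eno12399) sits on a different driver than the capture interface.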

Our capture interface is ens3f1 and our management interface is eno12399.

Before we run pf_ringcfg:

ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
14: ens3f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 40:a6:b7:b0:48:60 brd ff:ff:ff:ff:ff:ff
    altname enp55s0f0
15: ens3f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 40:a6:b7:b0:48:61 brd ff:ff:ff:ff:ff:ff
    altname enp55s0f1
16: ens6f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 40:a6:b7:b0:48:50 brd ff:ff:ff:ff:ff:ff
    altname enp139s0f0
17: ens6f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 40:a6:b7:b0:48:51 brd ff:ff:ff:ff:ff:ff
    altname enp139s0f1
18: eno12399: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:83:51:11:58:b4 brd ff:ff:ff:ff:ff:ff
    altname enp34s0f0
    inet 172.20.3.28/24 brd 172.20.3.255 scope global noprefixroute eno12399
       valid_lft forever preferred_lft forever
    inet6 fe80::b683:51ff:fe11:58b4/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
19: eno12409: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether b4:83:51:11:58:b5 brd ff:ff:ff:ff:ff:ff
    altname enp34s0f1
20: eno12419: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether b4:83:51:11:58:b6 brd ff:ff:ff:ff:ff:ff
    altname enp34s0f2
21: eno12429: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether b4:83:51:11:58:b7 brd ff:ff:ff:ff:ff:ff
    altname enp34s0f3

/etc/pf_ring/interfaces.conf:

MANAGEMENT_INTERFACES="eno12399"
CAPTURE_INTERFACES="ens3f1"

To enable the ice ZC driver, we run:

pf_ringcfg --configure-driver ice --rss-queues 1


The management interface disappears and we can no longer ssh into the system.

When we run:

systemctl stop pf_ring

The interfaces reappear and we can then ssh back into the system.

Arislen commented 4 months ago

Attaching the journalctl output from the pf_ringcfg run that removed the management interface:

ntop-journal-logs.txt

pastly commented 4 months ago

The issue seems to be that the host has one card that uses i40e and another that uses ice. When /usr/bin/pf_ringctl loads the ZC driver for ice, it first unloads irdma, because irdma is attached to the stock (non-ZC) ice driver. I guess that's necessary?

Anyway, the real problem is that irdma is tied to i40e as well as to ice. So when irdma is unloaded, i40e goes down with it, and so does the management interface that relies on the i40e driver.

I don't fully understand why the following works, but it does. After starting pf_ring (and thus losing the management interface), we can manually run modprobe i40e and get the management interface back. What's confusing is that lsmod | grep i40e no longer lists irdma in i40e's "Used by" column like it used to, but whatever.
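To see which drivers irdma is attached to, read the "Used by" column of lsmod: every module that lists irdma there is one irdma holds a reference on, which on this host would include both ice and i40e. A minimal sketch, assuming the standard lsmod layout (header line, then name, size, use count, comma-separated users); `modules_used_by` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `lsmod` output on stdin and print every module whose
# "Used by" column (4th field, comma-separated) names the module given as $1.
# Those are the modules that $1 holds a reference on.
modules_used_by() {
  awk -v user="$1" '
    NR == 1 { next }                  # skip the "Module Size Used by" header
    {
      n = split($4, users, ",")       # empty 4th field -> n == 0
      for (i = 1; i <= n; i++)
        if (users[i] == user) { print $1; break }
    }
  '
}
```

Usage: `lsmod | modules_used_by irdma`. If i40e shows up next to ice in the result, anything that removes irdma together with its dependencies can also take i40e down.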

cardigliano commented 4 months ago

@pastly I think this is due to modprobe -r, which also unloads the module's dependencies once they are no longer in use. I pushed an update that uses rmmod instead; a new package will be available shortly. Please update and let me know if it fixes the issue. Thank you.
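The difference can be modelled from lsmod output: `rmmod irdma` unloads only irdma (and refuses if it is still in use), while `modprobe -r irdma` also tries to unload the modules irdma depends on once nothing else uses them, which here would include i40e. The sketch below is a simplified, hypothetical simulation, not kmod's actual code: it assumes a module is removable when every module in its "Used by" list is already being removed and its use count equals the number of named users (no anonymous references). On a real system, `modprobe -r --dry-run --verbose irdma` should show what modprobe itself would do.

```bash
#!/usr/bin/env bash
# Hypothetical, simplified model of `modprobe -r TARGET` (not kmod's real logic).
# Reads lsmod output on stdin and prints, sorted, the modules that would be
# unloaded: TARGET plus, transitively, any module whose remaining users are all
# being removed. Assumption: a use count larger than the number of named users
# means something else still holds the module, so it stays loaded.
simulate_modprobe_r() {
  local target="$1" name size cnt usedby m u
  local -A count=() users=() removed=()
  while read -r name size cnt usedby; do
    [[ -z "$name" || "$name" == "Module" ]] && continue  # skip header/blank lines
    count[$name]=$cnt
    users[$name]=${usedby//,/ }                         # "a,b" -> "a b"
  done
  removed[$target]=1
  local changed=1 named refers all_gone
  while (( changed )); do
    changed=0
    for m in "${!count[@]}"; do
      [[ -n "${removed[$m]:-}" ]] && continue
      named=0 refers=0 all_gone=1
      for u in ${users[$m]}; do
        (( named += 1 ))
        if [[ -n "${removed[$u]:-}" ]]; then refers=1; else all_gone=0; fi
      done
      # Unload m only if a removed module depended on it and nothing else holds it.
      if (( refers && all_gone && named == ${count[$m]} )); then
        removed[$m]=1
        changed=1
      fi
    done
  done
  printf '%s\n' "${!removed[@]}" | LC_ALL=C sort
}
```

Usage: `lsmod | simulate_modprobe_r irdma`. In this model i40e drops out alongside ice as soon as irdma's reference goes away, which matches the behaviour reported above; switching to rmmod confines the removal to irdma itself.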

Arislen commented 4 months ago

The patch fixed the issue: it removed the irdma kernel module and kept the i40e interfaces up.

In general, though, if a user needs irdma on the i40e card (we don't, since we run it at 1G with no need for RDMA), this change removes that capability, and they would likely need another workaround.

cardigliano commented 4 months ago

The problem is that irdma prevents the ice driver from being reloaded. We could reload the irdma module afterwards, or let the user handle that in the pfring "post" script.