Closed ecloudzbox closed 2 years ago
@ecloudzbox could you provide more information about:
@cardigliano
[root@localhost kernel]# modinfo pf_ring
filename: /lib/modules/5.4.86-1.el8.elrepo.x86_64/extra/pf_ring.ko.xz
alias: net-pf-27
version: 7.9.0
description: Packet capture acceleration and analysis
author: ntop.org
license: GPL
srcversion: B7582AF3D744F71CA021D47
depends:
retpoline: Y
name: pf_ring
vermagic: 5.4.86-1.el8.elrepo.x86_64 SMP mod_unload modversions
parm: min_num_slots:Min number of ring slots (uint)
parm: perfect_rules_hash_size:Perfect rules hash size (uint)
parm: enable_tx_capture:Set to 1 to capture outgoing packets (uint)
parm: enable_frag_coherence:Set to 1 to handle fragments (flow coherence) in clusters (uint)
parm: enable_ip_defrag:Set to 1 to enable IP defragmentation(only rx traffic is defragmentead) (uint)
parm: quick_mode:Set to 1 to run at full speed but with upto one socket per interface (uint)
parm: force_ring_lock:Set to 1 to force ring locking (automatically enable with rss) (uint)
parm: enable_debug:Set to 1 to enable PF_RING debug tracing into the syslog, 2 for more verbosity (uint)
parm: transparent_mode:(deprecated) (uint)
[root@localhost src]# pwd
/root/PF_RING/drivers/intel/i40e/i40e-2.13.10-zc/src
[root@localhost src]# ./load_driver.sh
Configuring ens2
IFACE CORE MASK -> FILE
ens2 0 1 -> /proc/irq/50/smp_affinity
Configuring ens7f2
IFACE CORE MASK -> FILE
ens7f2 0 1 -> /proc/irq/195/smp_affinity
Configuring ens7f0
IFACE CORE MASK -> FILE
ens7f0 0 1 -> /proc/irq/103/smp_affinity
Configuring ens7f3
IFACE CORE MASK -> FILE
ens7f3 0 1 -> /proc/irq/241/smp_affinity
Configuring ens7f1
IFACE CORE MASK -> FILE
ens7f1 0 1 -> /proc/irq/149/smp_affinity
i40e module info:
[root@localhost ~]# modinfo i40e
filename: /lib/modules/5.4.86-1.el8.elrepo.x86_64/kernel/drivers/net/ethernet/intel/i40e/i40e.ko.xz
version: 2.8.20-k
license: GPL v2
description: Intel(R) Ethernet Connection XL710 Network Driver
author: Intel Corporation, e1000-devel@lists.sourceforge.net
srcversion: B9A1D27F86384157A250744
alias: pci:v00008086d0000158Bsvsdbcsci
alias: pci:v00008086d0000158Asvsdbcsci
alias: pci:v00008086d00000D58svsdbcsci
alias: pci:v00008086d00000CF8svsdbcsci
alias: pci:v00008086d00001588svsdbcsci
alias: pci:v00008086d00001587svsdbcsci
alias: pci:v00008086d000037D3svsdbcsci
alias: pci:v00008086d000037D2svsdbcsci
alias: pci:v00008086d000037D1svsdbcsci
alias: pci:v00008086d000037D0svsdbcsci
alias: pci:v00008086d000037CFsvsdbcsci
alias: pci:v00008086d000037CEsvsdbcsci
alias: pci:v00008086d0000104Fsvsdbcsci
alias: pci:v00008086d0000104Esvsdbcsci
alias: pci:v00008086d000015FFsvsdbcsci
alias: pci:v00008086d00001589svsdbcsci
alias: pci:v00008086d00001586svsdbcsci
alias: pci:v00008086d00001585svsdbcsci
alias: pci:v00008086d00001584svsdbcsci
alias: pci:v00008086d00001583svsdbcsci
alias: pci:v00008086d00001581svsdbcsci
alias: pci:v00008086d00001580svsdbcsci
alias: pci:v00008086d00001574svsdbcsci
alias: pci:v00008086d00001572svsdbcsci
depends:
retpoline: Y
intree: Y
name: i40e
vermagic: 5.4.86-1.el8.elrepo.x86_64 SMP mod_unload modversions
parm: debug:Debug level (0=none,...,16=all), Debug mask (0x8XXXXXXX) (uint)
modprobe pktgen

# Write one command to the current pktgen control file and report failures.
function pgset() {
    local result
    echo $1 > $PGDEV
    result=$(cat $PGDEV | fgrep "Result: OK:")
    if [ "$result" = "" ]; then
        cat $PGDEV | fgrep Result:
    fi
}

# Bind the device to the first pktgen kernel thread.
PGDEV=/proc/net/pktgen/kpktgend_0
pgset "rem_device_all"
pgset "add_device p3p4"

# Per-device settings: count 0 = send until stopped.
PGDEV=/proc/net/pktgen/p3p4
pgset "count 0"
pgset "delay 0"
pgset "clone_skb 0"
pgset "pkt_size 1500"
pgset "dst 10.0.0.171"
# pgset "dst_mac 68:91:d0:66:2a:ea"
pgset "src_mac 68:91:d0:66:80:69"

# Start transmitting.
PGDEV=/proc/net/pktgen/pgctrl
pgset "start"
Thank you !!!
Is there any update on this? I am seeing the same thing on Ubuntu 20.04.3 LTS (GNU/Linux 5.4.0-90-generic x86_64). Building 8.0.0-stable from source and using zbalance_ipc is producing the same error as above.
@sippejw since we are not able to reproduce (and thus debug) this, could you provide detailed instructions to reproduce it or even better access to the system? Thank you.
To follow up here: we have experienced this issue on Ubuntu 18.04 and 20.04, specifically for network cards using the Intel XL710 QSFP+ network driver. It seems to be caused by the i40e driver and happens only for 7.8.0-stable and later (including the dev branch as of 5cc19525b97142e0147fdb930b59f84770fb4e51); drivers/intel/i40e/i40e-2.4.6-zc doesn't seem to have this issue.
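For anyone trying to tell whether a machine falls in the range reported above (7.8.0-stable and later), here is a rough sketch of a version check. The helper name version_ge is made up for illustration, it relies on GNU sort -V, and the 7.8.0 threshold is only what was observed in this thread, not a confirmed boundary.

```shell
#!/bin/sh
# version_ge A B: succeed if version A is greater than or equal to version B.
# Hypothetical helper; relies on the version sort (-V) of GNU coreutils sort.
version_ge() {
    # The smaller of the two versions sorts first; A >= B iff that one is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

It could be used along the lines of: loaded=$(modinfo -F version pf_ring); version_ge "$loaded" 7.8.0 && echo "pf_ring $loaded is in the reported range".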
$ lspci | egrep -i --color 'network|ethernet'
1b:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)
1b:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722 for 1GbE (rev 09)
b3:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
b3:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
Since we are installing from source, we install only the PF_RING kernel module and the i40e ZC driver module.
$ insmod pf_ring.ko min_num_slots=65536
$ modinfo pf_ring
filename: /lib/modules/5.4.0-91-generic/kernel/net/pf_ring/pf_ring.ko
alias: net-pf-27
version: 8.1.0
description: Packet capture acceleration and analysis
author: ntop.org
license: GPL
srcversion: 49928A30A20087E50DEE717
depends:
retpoline: Y
name: pf_ring
vermagic: 5.4.0-91-generic SMP mod_unload modversions
parm: min_num_slots:Min number of ring slots (uint)
parm: perfect_rules_hash_size:Perfect rules hash size (uint)
parm: enable_tx_capture:Set to 1 to capture outgoing packets (uint)
parm: enable_frag_coherence:Set to 1 to handle fragments (flow coherence) in clusters (uint)
parm: enable_ip_defrag:Set to 1 to enable IP defragmentation(only rx traffic is defragmentead) (uint)
parm: quick_mode:Set to 1 to run at full speed but with upto one socket per interface (uint)
parm: force_ring_lock:Set to 1 to force ring locking (automatically enable with rss) (uint)
parm: enable_debug:Set to 1 to enable PF_RING debug tracing into the syslog, 2 for more verbosity (uint)
parm: transparent_mode:(deprecated) (uint)
The error occurs when a device attempts to use the i40e driver after it is inserted.
# watch dmesg for BUGs
sudo dmesg -wH
## Separate terminal
# insert module
cd PF_RING/drivers/intel/i40e/i40e-2.4.6-zc/src/
sudo ./load_driver.sh
sudo zcount -i zc:ens2f0
dmesg should then show a cascade of page errors. Here it blames sshd because our management interfaces also use i40e, so they start using the new i40e driver as soon as it is inserted.
[ +0.000001] BUG: Bad page state in process sshd pfn:bf5668
[ +0.000314] page:ffffb8cfafd59a00 refcount:65533 mapcount:0 mapping:0000000000000000 index:0x0
[ +0.000000] flags: 0x17ffffc0000000()
[ +0.000001] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ +0.000001] raw: 0000000000000000 0000000000000000 0000fffdffffffff 0000000000000000
[ +0.000000] page dumped because: nonzero _refcount
[ +0.000000] Modules linked in: i40e(OE) pf_ring(OE) ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid vxlan ip6_udp_tunnel udp_tunnel bonding dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rapl intel_cstate mei_me mei joydev input_leds ioatdma dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel msr ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea sysfillrect crc32_pclmul sysimgblt ghash_clmulni_intel hid_generic fb_sys_fops aesni_intel crypto_simd usbhid cryptd drm hid glue_helper i2c_i801 lpc_ich ahci libahci wmi [last unloaded: pf_ring]
[ +0.000041] CPU: 6 PID: 31998 Comm: sshd Tainted: G B OE 5.4.0-90-generic #101-Ubuntu
[ +0.000001] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 2.2 11/01/2018
[ +0.000000] Call Trace:
[ +0.000009] dump_stack+0x6d/0x8b
[ +0.000001] bad_page.cold+0x80/0xb1
[ +0.000001] check_new_page_bad+0x67/0x80
[ +0.000002] rmqueue+0x72e/0xf00
[ +0.000007] get_page_from_freelist+0xb8/0x3f0
[ +0.000001] __alloc_pages_nodemask+0x173/0x320
[ +0.000001] alloc_pages_current+0x87/0xe0
[ +0.000002] skb_page_frag_refill+0x80/0x110
[ +0.000007] sk_page_frag_refill+0x21/0x80
[ +0.000001] tcp_sendmsg_locked+0x2c9/0xde0
[ +0.000002] tcp_sendmsg+0x2d/0x50
[ +0.000007] inet_sendmsg+0x43/0x70
[ +0.000001] sock_sendmsg+0x5e/0x70
[ +0.000002] sock_write_iter+0x93/0xf0
[ +0.000008] new_sync_write+0x125/0x1c0
[ +0.000001] __vfs_write+0x29/0x40
[ +0.000002] vfs_write+0xb9/0x1a0
[ +0.000000] ksys_write+0x67/0xe0
[ +0.000008] __x64_sys_write+0x1a/0x20
[ +0.000001] do_syscall_64+0x57/0x190
[ +0.000001] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000000] RIP: 0033:0x7f1f4332a1e7
[ +0.000001] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ +0.000001] RSP: 002b:00007ffc78e24a28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ +0.000007] RAX: ffffffffffffffda RBX: 00000000000002f4 RCX: 00007f1f4332a1e7
[ +0.000000] RDX: 00000000000002f4 RSI: 00005575f1da8ab0 RDI: 0000000000000004
[ +0.000001] RBP: 00005575f1daf9c0 R08: 00007ffc78fc70f0 R09: 00007ffc78e249b8
[ +0.000000] R10: 00007ffc78e249b0 R11: 0000000000000246 R12: 0000000000000000
[ +0.000001] R13: 00005575f064a868 R14: 0000000000000004 R15: 0000000000000004
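Since the trace above blames whatever process happened to allocate the bad page, it can help to see how the errors are spread across processes. A minimal sketch, assuming the log lines keep the "BUG: Bad page state in process NAME" wording shown above; the function name bad_page_summary is made up for illustration.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "<count> <process>" for every
# process named in a "BUG: Bad page state in process ..." line.
bad_page_summary() {
    awk '/BUG: Bad page state in process/ {
             # The process name is the word right after "process".
             for (i = 1; i < NF; i++)
                 if ($i == "process") { seen[$(i + 1)]++; break }
         }
         END { for (p in seen) print seen[p], p }' | sort -k2
}
```

For example: sudo dmesg | bad_page_summary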
This may be related to https://github.com/ntop/PF_RING/issues/774
The solution to #774 has fixed this issue for us and we are no longer seeing page errors.
Thank you for the update. Let's close this.
CentOS 8.0, kernel 5.4.86-1.el8.elrepo.x86_64, PF_RING 7.9.0 (make install)
Use pktgen to send packets from the other end, and the system will die in about an hour.
error log: