ntop / PF_RING

High-speed packet processing framework
http://www.ntop.org
GNU Lesser General Public License v2.1

Kernel panic in two Linux boxes at the same time (skb_ring_handler) #393

Closed · jkingcis closed this issue 6 years ago

jkingcis commented 6 years ago

Hello,

After using PF_RING for some years, I'm now facing a strange issue. The short story is:

Host A:

Host B:

Kernel dump of Host A:

crash /usr/lib/debug/lib/modules/3.10.0-514.26.2.el7.x86_64/vmlinux /var/crash/127.0.0.1-2018-08-15-06\:14\:10/vmcore

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-514.26.2.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2018-08-15-06:14:10/vmcore  [PARTIAL DUMP]
        CPUS: 56
        DATE: Wed Aug 15 06:13:59 2018
      UPTIME: 73 days, 01:08:06
LOAD AVERAGE: 0.04, 0.12, 0.18
       TASKS: 685
    NODENAME: hostA.example.com
     RELEASE: 3.10.0-514.26.2.el7.x86_64
     VERSION: #1 SMP Tue Jul 4 15:04:05 UTC 2017
     MACHINE: x86_64  (1997 Mhz)
      MEMORY: 63.9 GB
       PANIC: "general protection fault: 0000 [#1] SMP "
         PID: 0
     COMMAND: "swapper/28"
        TASK: ffff880174ae2f10  (1 of 56)  [THREAD_INFO: ffff880174b0c000]
         CPU: 28
       STATE: TASK_RUNNING (PANIC)
crash> log
(...)
[6310747.517058] general protection fault: 0000 [#1] SMP
[6310747.517090] Modules linked in: cdc_ether usbnet mii vfat fat uas usb_storage ipmi_si dell_rbu nfnetlink_queue nfnetlink_log vxlan ip6_udp_tunnel udp_tunnel uio pf_ring(OE) binfmt_misc ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support ipmi_ssif ipmi_devintf
[6310747.517493]  mxm_wmi dcdbas sg pcspkr sb_edac edac_core lpc_ich mei_me ipmi_msghandler mei acpi_power_meter shpchp tpm_crb wmi ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ahci libahci dca i40e i2c_algo_bit tg3 libata megaraid_sas ptp i2c_core pps_core fjes dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_si]
[6310747.517742] CPU: 28 PID: 0 Comm: swapper/28 Tainted: G        W  OE  ------------   3.10.0-514.26.2.el7.x86_64 #1
[6310747.517779] Hardware name: Dell Inc. PowerEdge R530/0CN7X8, BIOS 2.3.4 11/08/2016
[6310747.517806] task: ffff880174ae2f10 ti: ffff880174b0c000 task.ti: ffff880174b0c000
[6310747.517832] RIP: 0010:[<ffffffffa07c0de9>]  [<ffffffffa07c0de9>] skb_ring_handler+0x979/0x1b80 [pf_ring]
[6310747.517874] RSP: 0018:ffff88085e983a80  EFLAGS: 00010202
[6310747.517895] RAX: ffffea000d00b5c0 RBX: ffff88080bb8cc00 RCX: 0000000000000001
[6310747.517920] RDX: 00000000fffffff8 RSI: 002fffff0000082c RDI: 000000000000000b
[6310747.517945] RBP: ffff88085e983c98 R08: 000000000000000a R09: 000000000000000a
[6310747.517971] R10: ffff88080bb8cc00 R11: 00000000ffffffff R12: 0000000080000000
[6310747.517997] R13: 0000000000000001 R14: ffff88085572c800 R15: 0000000000000001
[6310747.518022] FS:  0000000000000000(0000) GS:ffff88085e980000(0000) knlGS:0000000000000000
[6310747.518051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6310747.518072] CR2: 00007fe7013acae8 CR3: 00000000019be000 CR4: 00000000003407e0
[6310747.518097] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[6310747.518122] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[6310747.518147] Stack:
[6310747.518158]  ffff880200000001 ffff88085e983b9c ffff88085e983b18 ffff88085e983ad8
[6310747.518225]  ffff88017fc07700 ffff88017fc00140 0000000000000000 ffff88085e983b40
[6310747.518260]  ffffffff811db497 ffff88085e999b18 ffff88087ffda068 ffff88085e983b48
[6310747.518296] Call Trace:
[6310747.518308]  <IRQ>
[6310747.518318]
[6310747.518337]  [<ffffffff811db497>] ? unfreeze_partials.isra.43+0xe7/0x130
[6310747.518359]  [<ffffffff8118852a>] ? __rmqueue+0x8a/0x460
[6310747.518382]  [<ffffffff811dcab1>] ? __slab_free+0x81/0x2f0
[6310747.518422]  [<ffffffffa07c2085>] packet_rcv+0x95/0xb0 [pf_ring]
[6310747.518467]  [<ffffffff8156feb8>] __netif_receive_skb_core+0x2d8/0x800
[6310747.518525]  [<ffffffff815703f8>] __netif_receive_skb+0x18/0x60
[6310747.518547]  [<ffffffff81570480>] netif_receive_skb_internal+0x40/0xc0
[6310747.518573]  [<ffffffff81571598>] napi_gro_receive+0xd8/0x130
[6310747.519914]  [<ffffffffa018e2aa>] i40e_napi_poll+0x53a/0xb70 [i40e]
[6310747.521304]  [<ffffffff81570c20>] net_rx_action+0x170/0x380
[6310747.522231]  [<ffffffff8108f63f>] __do_softirq+0xef/0x280
[6310747.523181]  [<ffffffff8169929c>] call_softirq+0x1c/0x30
[6310747.524048]  [<ffffffff8102d365>] do_softirq+0x65/0xa0
[6310747.524915]  [<ffffffff8108f9d5>] irq_exit+0x115/0x120
[6310747.525746]  [<ffffffff81699e38>] do_IRQ+0x58/0xf0
[6310747.526583]  [<ffffffff8168eeed>] common_interrupt+0x6d/0x6d
[6310747.527473]  <EOI>
[6310747.527481]
[6310747.528478]  [<ffffffff81514eb2>] ? cpuidle_enter_state+0x52/0xc0
[6310747.529325]  [<ffffffff81514ff9>] cpuidle_idle_call+0xd9/0x210
[6310747.530154]  [<ffffffff8103512e>] arch_cpu_idle+0xe/0x30
[6310747.530919]  [<ffffffff810e82f5>] cpu_startup_entry+0x245/0x290
[6310747.531759]  [<ffffffff8104f0da>] start_secondary+0x1ba/0x230
[6310747.532557] Code: 83 c7 02 0f 1f 80 00 00 00 00 48 63 c2 48 8b 44 c3 10 48 85 c0 0f 84 b7 01 00 00 48 8b b0 00 03 00 00 48 85 f6 0f 84 a7 01 00 00 <48> 83 be 80 04 00 00 00 0f 84 99 01 00 00 49 8b 46 20 8b 80 c0
[6310747.534363] RIP  [<ffffffffa07c0de9>] skb_ring_handler+0x979/0x1b80 [pf_ring]
[6310747.535142]  RSP <ffff88085e983a80>

I can see the assembly instruction that was executing:

crash> dis -l (skb_ring_handler+0x979)
0xffffffffa07c0de9 <skb_ring_handler+2425>:     cmpq   $0x0,0x480(%rsi)
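If a pf_ring.ko built with debug info were available, crash could map that offset back to a source line; a hedged example of how that would look (the module path is taken from the modinfo output below, and this assumes an unstripped build):

crash> mod -s pf_ring /lib/modules/3.10.0-514.26.2.el7.x86_64/extra/pf_ring.ko
crash> dis -l skb_ring_handler+0x979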

I also see the kernel is tainted (12288 = 4096 + 8192), which means: 4096 - an out-of-tree module has been loaded; 8192 - an unsigned module has been loaded in a kernel supporting module signatures.

[root@hostA ~]# cat /proc/sys/kernel/tainted
12288
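Those two bits can be checked quickly with shell arithmetic (a small sanity check, not part of the original report); a non-zero result means the bit is set, matching the (OE) flags shown next to pf_ring in the module list above:

[root@hostA ~]# t=$(cat /proc/sys/kernel/tainted)
[root@hostA ~]# echo $((t & 4096)) $((t & 8192))
4096 8192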

Here is the backtrace, where I again see the exception RIP and packet_rcv just before the crash:

crash> bt
PID: 0      TASK: ffff880174ae2f10  CPU: 28  COMMAND: "swapper/28"
 #0 [ffff88085e983800] machine_kexec at ffffffff81059beb
 #1 [ffff88085e983860] __crash_kexec at ffffffff81105822
 #2 [ffff88085e983930] crash_kexec at ffffffff81105910
 #3 [ffff88085e983948] oops_end at ffffffff81690008
 #4 [ffff88085e983970] die at ffffffff8102e93b
 #5 [ffff88085e9839a0] do_general_protection at ffffffff8168f8fe
 #6 [ffff88085e9839d0] general_protection at ffffffff8168f1a8
    [exception RIP: skb_ring_handler+2425]
    RIP: ffffffffa07c0de9  RSP: ffff88085e983a80  RFLAGS: 00010202
    RAX: ffffea000d00b5c0  RBX: ffff88080bb8cc00  RCX: 0000000000000001
    RDX: 00000000fffffff8  RSI: 002fffff0000082c  RDI: 000000000000000b
    RBP: ffff88085e983c98   R8: 000000000000000a   R9: 000000000000000a
    R10: ffff88080bb8cc00  R11: 00000000ffffffff  R12: 0000000080000000
    R13: 0000000000000001  R14: ffff88085572c800  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88085e983ca0] packet_rcv at ffffffffa07c2085 [pf_ring]
 #8 [ffff88085e983cd0] __netif_receive_skb_core at ffffffff8156feb8
 #9 [ffff88085e983d38] __netif_receive_skb at ffffffff815703f8
#10 [ffff88085e983d58] netif_receive_skb_internal at ffffffff81570480
#11 [ffff88085e983d88] napi_gro_receive at ffffffff81571598
#12 [ffff88085e983db0] i40e_napi_poll at ffffffffa018e2aa [i40e]
#13 [ffff88085e983e38] net_rx_action at ffffffff81570c20
#14 [ffff88085e983eb8] __do_softirq at ffffffff8108f63f
#15 [ffff88085e983f28] call_softirq at ffffffff8169929c
#16 [ffff88085e983f40] do_softirq at ffffffff8102d365
#17 [ffff88085e983f60] irq_exit at ffffffff8108f9d5
#18 [ffff88085e983f78] do_IRQ at ffffffff81699e38
--- <IRQ stack> ---
#19 [ffff880174b0fda8] ret_from_intr at ffffffff8168eeed
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81514eb2  RSP: ffff880174b0fe50  RFLAGS: 00000206
    RAX: 00166c14c85aec87  RBX: 000000000000f8a0  RCX: 0000000000000017
    RDX: 0000000225c17d03  RSI: ffff880174b0ffd8  RDI: 00166c14c85aec87
    RBP: ffff880174b0fe78   R8: 0000000000005f0f   R9: 0000000000000018
    R10: 00000000000005a0  R11: 0000000000000000  R12: ffff880174b0fe20
    R13: ffff88085e98f8e0  R14: 0000000000000082  R15: ffff88085e98f8e0
    ORIG_RAX: ffffffffffffffb5  CS: 0010  SS: 0018
#20 [ffff880174b0fe80] cpuidle_idle_call at ffffffff81514ff9
#21 [ffff880174b0fec0] arch_cpu_idle at ffffffff8103512e
#22 [ffff880174b0fed0] cpu_startup_entry at ffffffff810e82f5
#23 [ffff880174b0ff28] start_secondary at ffffffff8104f0da

This function is found in pf_ring.c here:

[root@hostA pfring-6.6.0]# fgrep -rin skb_ring_handler .

./pf_ring.c:342:static int skb_ring_handler(struct sk_buff *skb, u_char recv_packet,
./pf_ring.c:3799:static int skb_ring_handler(struct sk_buff *skb,
./pf_ring.c:4094:  return(skb_ring_handler(&skb, 1, 0 /* fake skb */,
./pf_ring.c:4114:    rc = skb_ring_handler(skb,
./pf_ring.c:7734:  .ring_handler = skb_ring_handler,

The PF_RING module version loaded is:

[root@hostA src]# modinfo pf_ring

filename:       /lib/modules/3.10.0-514.26.2.el7.x86_64/extra/pf_ring.ko
alias:          net-pf-27
version:        6.6.0
description:    Packet capture acceleration and analysis
author:         ntop.org
license:        GPL
rhelversion:    7.3
srcversion:     414F094C8FD5E8D55A89517
depends:
vermagic:       3.10.0-514.26.2.el7.x86_64 SMP mod_unload modversions
parm:           min_num_slots:Min number of ring slots (uint)
parm:           perfect_rules_hash_size:Perfect rules hash size (uint)
parm:           enable_tx_capture:Set to 1 to capture outgoing packets (uint)
parm:           enable_frag_coherence:Set to 1 to handle fragments (flow coherence) in clusters (uint)
parm:           enable_ip_defrag:Set to 1 to enable IP defragmentation(only rx traffic is defragmentead) (uint)
parm:           quick_mode:Set to 1 to run at full speed but with upto one socket per interface (uint)
parm:           force_ring_lock:Set to 1 to force ring locking (automatically enable with rss) (uint)
parm:           enable_debug:Set to 1 to enable PF_RING debug tracing into the syslog, 2 for more verbosity (uint)
parm:           transparent_mode:(deprecated) (uint)
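For reference, these parameters are set at module load time; a minimal /etc/modprobe.d sketch (the values below are only illustrative, not what these hosts actually use) would look like:

# /etc/modprobe.d/pf_ring.conf  (example values only)
options pf_ring min_num_slots=65536 enable_tx_capture=1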

Kernel dump of Host B:

crash /usr/lib/debug/lib/modules/3.10.0-693.5.2.el7.x86_64/vmlinux /var/crash/127.0.0.1-2018-08-15-06\:12\:01/vmcore

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-693.5.2.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2018-08-15-06:12:01/vmcore  [PARTIAL DUMP]
        CPUS: 56
        DATE: Wed Aug 15 06:13:41 2018
      UPTIME: 73 days, 01:07:44
LOAD AVERAGE: 0.20, 0.06, 0.06
       TASKS: 873
    NODENAME: hostB.example.com
     RELEASE: 3.10.0-693.5.2.el7.x86_64
     VERSION: #1 SMP Fri Oct 20 20:32:50 UTC 2017
     MACHINE: x86_64  (1997 Mhz)
      MEMORY: 63.9 GB
       PANIC: "BUG: unable to handle kernel paging request at 00000001e400e300"
         PID: 0
     COMMAND: "swapper/24"
        TASK: ffff880174542f70  (1 of 56)  [THREAD_INFO: ffff88017456c000]
         CPU: 24
       STATE: TASK_RUNNING (PANIC)

The PANIC indicates that the code was attempting to access address 00000001e400e300 and the system wouldn't allow it; it looks like an invalid pointer is being dereferenced.
crash> log 
(...)
[6311512.009555] BUG: unable to handle kernel paging request at 00000001e400e300
[6311512.009592] IP: [<ffffffffc073ff22>] skb_ring_handler+0x9e2/0x1d10 [pf_ring]
[6311512.009623] PGD 0
[6311512.009633] Oops: 0000 [#1] SMP
[6311512.009663] Modules linked in: cdc_ether usbnet mii joydev vfat fat uas usb_storage ipmi_si dell_rbu binfmt_misc vxlan ip6_udp_tunnel udp_tunnel uio pf_ring(OE) ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas
[6311512.010023]  ipmi_ssif mei_me pcspkr mei sg lpc_ich ipmi_devintf ipmi_msghandler shpchp wmi acpi_power_meter tpm_crb ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common i40e crc32c_intel ahci libahci igb tg3 libata megaraid_sas dca i2c_algo_bit ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_si]
[6311512.010203] CPU: 24 PID: 0 Comm: swapper/24 Tainted: G           OE  ------------   3.10.0-693.5.2.el7.x86_64 #1
[6311512.010236] Hardware name: Dell Inc. PowerEdge R530/0CN7X8, BIOS 2.3.4 11/08/2016
[6311512.010260] task: ffff880174542f70 ti: ffff88017456c000 task.ti: ffff88017456c000
[6311512.010285] RIP: 0010:[<ffffffffc073ff22>]  [<ffffffffc073ff22>] skb_ring_handler+0x9e2/0x1d10 [pf_ring]
[6311512.010319] RSP: 0018:ffff88085e903a08  EFLAGS: 00010206
[6311512.010337] RAX: fffffffffffffff8 RBX: 0000000080000000 RCX: 000000000000000a
[6311512.010394] RDX: 00000000fffffff8 RSI: 0000000000000001 RDI: 000000000000000b
[6311512.010435] RBP: ffff88085e903c48 R08: 00000001e400e000 R09: 000000000000a7ca
[6311512.010459] R10: 000000000000a7b3 R11: 0000000000000006 R12: ffff880aa8663400
[6311512.010482] R13: ffff880a53ed9000 R14: 000000000000000a R15: 0000000000000001
[6311512.010506] FS:  0000000000000000(0000) GS:ffff88085e900000(0000) knlGS:0000000000000000
[6311512.010532] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6311512.010551] CR2: 00000001e400e300 CR3: 00000000019f2000 CR4: 00000000003407e0
[6311512.010575] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[6311512.010613] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[6311512.010636] Stack:
[6311512.010645]  ffff880800000001 ffff88085e903b0c 0000000000000001 000000000000009c
[6311512.010671]  000000000000009c ffff88017456fdb8 ffff88085e903a68 ffffffff81130c15
[6311512.010715]  0000000034e60980 9d0be71dfcc3e6f6 ffff88085856e100 ffff88085856e184
[6311512.010743] Call Trace:
[6311512.010767]  <IRQ>
[6311512.010777]
[6311512.010787]  [<ffffffff81130c15>] ? handle_irq_event_percpu+0x55/0x80
[6311512.010805]  [<ffffffff81130c87>] ? handle_irq_event+0x47/0x60
[6311512.010827]  [<ffffffff81133907>] ? handle_edge_irq+0x77/0x130
[6311512.010850]  [<ffffffff816b7601>] ? stub32_clone+0x1/0x10
[6311512.010870]  [<ffffffff816ac2ad>] ? common_interrupt+0x6d/0x6d
[6311512.010908]  [<ffffffffc07412e5>] packet_rcv+0x95/0xb0 [pf_ring]
[6311512.011860]  [<ffffffff81586c88>] __netif_receive_skb_core+0x2d8/0x7c0
[6311512.012841]  [<ffffffff810bd514>] ? __wake_up+0x44/0x50
[6311512.013857]  [<ffffffff81587188>] __netif_receive_skb+0x18/0x60
[6311512.014685]  [<ffffffff81587210>] netif_receive_skb_internal+0x40/0xc0
[6311512.015507]  [<ffffffff81588318>] napi_gro_receive+0xd8/0x130
[6311512.016459]  [<ffffffffc02a8693>] i40e_clean_rx_irq+0x3e3/0x960 [i40e]
[6311512.017390]  [<ffffffffc02a8ed5>] i40e_napi_poll+0x2c5/0x700 [i40e]
[6311512.018222]  [<ffffffff8158799d>] net_rx_action+0x16d/0x380
[6311512.019107]  [<ffffffff81090b4f>] __do_softirq+0xef/0x280
[6311512.019888]  [<ffffffff816b6b1c>] call_softirq+0x1c/0x30
[6311512.020917]  [<ffffffff8102d3c5>] do_softirq+0x65/0xa0
[6311512.021720]  [<ffffffff81090ed5>] irq_exit+0x105/0x110
[6311512.022478]  [<ffffffff816b76b6>] do_IRQ+0x56/0xe0
[6311512.023212]  [<ffffffff816ac2ad>] common_interrupt+0x6d/0x6d
[6311512.023890]  <EOI>
[6311512.023897]
[6311512.024670]  [<ffffffff81527ad2>] ? cpuidle_enter_state+0x52/0xc0
[6311512.025418]  [<ffffffff81527c18>] cpuidle_idle_call+0xd8/0x210
[6311512.026232]  [<ffffffff81034fee>] arch_cpu_idle+0xe/0x30
[6311512.027025]  [<ffffffff810e7bda>] cpu_startup_entry+0x14a/0x1c0
[6311512.027661]  [<ffffffff81051af6>] start_secondary+0x1b6/0x230
[6311512.028358] Code: 00 41 83 7c 24 04 01 0f 85 ac 01 00 00 8d 42 01 83 c6 01 99 f7 f9 39 fe 0f 84 9b 01 00 00 48 63 c2 4d 8b 44 c4 10 4d 85 c0 74 d6 <49> 8b 80 00 03 00 00 48 85 c0 74 ca 4d 8b 4d 20 4d 8b 99 e8 03
[6311512.029698] RIP  [<ffffffffc073ff22>] skb_ring_handler+0x9e2/0x1d10 [pf_ring]
[6311512.030368]  RSP <ffff88085e903a08>
[6311512.031044] CR2: 00000001e400e300

I can see the assembly instruction that was executing:

crash> dis -l (skb_ring_handler+0x9e2)
0xffffffffc073ff22 <skb_ring_handler+2530>:     mov    0x300(%r8),%rax

I also see the kernel is tainted (12288 = 4096 + 8192), which means: 4096 - an out-of-tree module has been loaded; 8192 - an unsigned module has been loaded in a kernel supporting module signatures.

[root@hostB ~]# cat /proc/sys/kernel/tainted
12288

Here is the backtrace, where I again see the exception RIP and packet_rcv just before the crash:

crash> bt
PID: 0      TASK: ffff880174542f70  CPU: 24  COMMAND: "swapper/24"
 #0 [ffff88085e9036a8] machine_kexec at ffffffff8105c4cb
 #1 [ffff88085e903708] __crash_kexec at ffffffff81104a42
 #2 [ffff88085e9037d8] crash_kexec at ffffffff81104b30
 #3 [ffff88085e9037f0] oops_end at ffffffff816ad338
 #4 [ffff88085e903818] no_context at ffffffff8169d35a
 #5 [ffff88085e903868] __bad_area_nosemaphore at ffffffff8169d3f0
 #6 [ffff88085e9038b0] bad_area_nosemaphore at ffffffff8169d55a
 #7 [ffff88085e9038c0] __do_page_fault at ffffffff816b01fe
 #8 [ffff88085e903920] do_page_fault at ffffffff816b03a5
 #9 [ffff88085e903950] page_fault at ffffffff816ac5c8
    [exception RIP: skb_ring_handler+2530]
    RIP: ffffffffc073ff22  RSP: ffff88085e903a08  RFLAGS: 00010206
    RAX: fffffffffffffff8  RBX: 0000000080000000  RCX: 000000000000000a
    RDX: 00000000fffffff8  RSI: 0000000000000001  RDI: 000000000000000b
    RBP: ffff88085e903c48   R8: 00000001e400e000   R9: 000000000000a7ca
    R10: 000000000000a7b3  R11: 0000000000000006  R12: ffff880aa8663400
    R13: ffff880a53ed9000  R14: 000000000000000a  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff88085e903c50] packet_rcv at ffffffffc07412e5 [pf_ring]
#11 [ffff88085e903c80] __netif_receive_skb_core at ffffffff81586c88
#12 [ffff88085e903cf0] __netif_receive_skb at ffffffff81587188
#13 [ffff88085e903d10] netif_receive_skb_internal at ffffffff81587210
#14 [ffff88085e903d40] napi_gro_receive at ffffffff81588318
#15 [ffff88085e903d68] i40e_clean_rx_irq at ffffffffc02a8693 [i40e]
#16 [ffff88085e903dd0] i40e_napi_poll at ffffffffc02a8ed5 [i40e]
#17 [ffff88085e903e38] net_rx_action at ffffffff8158799d
#18 [ffff88085e903eb8] __do_softirq at ffffffff81090b4f
#19 [ffff88085e903f28] call_softirq at ffffffff816b6b1c
#20 [ffff88085e903f40] do_softirq at ffffffff8102d3c5
#21 [ffff88085e903f60] irq_exit at ffffffff81090ed5
#22 [ffff88085e903f78] do_IRQ at ffffffff816b76b6
--- <IRQ stack> ---
#23 [ffff88017456fdb8] ret_from_intr at ffffffff816ac2ad
    [exception RIP: cpuidle_enter_state+82]
    RIP: ffffffff81527ad2  RSP: ffff88017456fe60  RFLAGS: 00000206
    RAX: 00166c0fc5def9ee  RBX: ffff88017456fe40  RCX: 0000000000000017
    RDX: 0000000225c17d03  RSI: ffff88017456ffd8  RDI: 00166c0fc5def9ee
    RBP: ffff88017456fe88   R8: 0000000000014f9c   R9: 0000000000000018
    R10: 000000000000044b  R11: 00166c738878f400  R12: 0000000000000018
    R13: ffff88085e90f960  R14: ffffffff810b4165  R15: ffff88017456fde0
    ORIG_RAX: ffffffffffffff15  CS: 0010  SS: 0018
#24 [ffff88017456fe90] cpuidle_idle_call at ffffffff81527c18
#25 [ffff88017456fed0] arch_cpu_idle at ffffffff81034fee
#26 [ffff88017456fee0] cpu_startup_entry at ffffffff810e7bda
#27 [ffff88017456ff28] start_secondary at ffffffff81051af6

This function is found in pf_ring.c here:

[root@hostB pfring-7.0.0]# fgrep -rin skb_ring_handler .

./pf_ring.c:337:static int skb_ring_handler(struct sk_buff *skb, u_char recv_packet,
./pf_ring.c:3725:static int skb_ring_handler(struct sk_buff *skb,
./pf_ring.c:4020:  return(skb_ring_handler(&skb, 1, 0 /* fake skb */,
./pf_ring.c:4036:    rc = skb_ring_handler(skb,
./pf_ring.c:7545:  .ring_handler = skb_ring_handler,

The PF_RING module version loaded is:

[root@hostB pfring-7.0.0]# modinfo pf_ring

filename:       /lib/modules/3.10.0-693.5.2.el7.x86_64/extra/pf_ring.ko
alias:          net-pf-27
version:        7.0.0
description:    Packet capture acceleration and analysis
author:         ntop.org
license:        GPL
rhelversion:    7.4
srcversion:     9145B13B0227BDF4B9137B7
depends:
vermagic:       3.10.0-693.5.2.el7.x86_64 SMP mod_unload modversions
parm:           min_num_slots:Min number of ring slots (uint)
parm:           perfect_rules_hash_size:Perfect rules hash size (uint)
parm:           enable_tx_capture:Set to 1 to capture outgoing packets (uint)
parm:           enable_frag_coherence:Set to 1 to handle fragments (flow coherence) in clusters (uint)
parm:           enable_ip_defrag:Set to 1 to enable IP defragmentation(only rx traffic is defragmentead) (uint)
parm:           quick_mode:Set to 1 to run at full speed but with upto one socket per interface (uint)
parm:           force_ring_lock:Set to 1 to force ring locking (automatically enable with rss) (uint)
parm:           enable_debug:Set to 1 to enable PF_RING debug tracing into the syslog, 2 for more verbosity (uint)
parm:           transparent_mode:(deprecated) (uint)

I'm a bit afraid of upgrading the system and/or PF_RING in case it breaks something and the system stops working. What do you think? Both hosts handle the same SPAN traffic, about 5 Gbps on average each. I'm thinking it may be a bug that is triggered when a certain traffic pattern shows up.

Many thanks in advance,

Best regards

cardigliano commented 6 years ago

Hi @jkingcis, thank you for the detailed information you provided. From your description, mainly the timing, it seems this is related to some traffic pattern (e.g. fragments), as you said. I am trying to isolate the code affected by this and have a couple of requests:

  1. What is the exact pf_ring revision you are using on Host B? Please provide the output of cat /proc/net/pf_ring/info. I usually recommend updating to the latest code to stay in sync and rule out an old issue; however, since it is unlikely you would be able to reproduce this in the short term, that is not really an option here.
  2. I guess you are using kernel clustering (a common configuration with Suricata), correct? Thank you.
jkingcis commented 6 years ago

Hi @cardigliano, thanks for your quick answer. The exact pf_ring revision we are using on each host is:

[root@hostA ~]#  cat /proc/net/pf_ring/info
PF_RING Version          : 6.6.0 (6.6.0-stable:c3961efb76c70754d7c0a17f8496b8c2c35e3fb4)
Total rings              : 21

Standard (non ZC) Options
Ring slots               : 4096
Slot version             : 16
Capture TX               : Yes [RX+TX]
IP Defragment            : No
Socket Mode              : Standard
Cluster Fragment Queue   : 10558
Cluster Fragment Discard : 0
[root@hostB ~]# cat /proc/net/pf_ring/info
PF_RING Version          : 7.0.0 (7.0.0-stable:83595f31baa6a4daa556801c2f1e261e144618a8)
Total rings              : 20

Standard (non ZC) Options
Ring slots               : 4096
Slot version             : 16
Capture TX               : Yes [RX+TX]
IP Defragment            : No
Socket Mode              : Standard
Cluster Fragment Queue   : 0
Cluster Fragment Discard : 0

As for the term kernel clustering, I'm unsure what it refers to. In the suricata.yaml config file, I have this configuration regarding clusters (it is the same on both hosts):

# PF_RING configuration. for use with native PF_RING support
# for more info see http://www.ntop.org/products/pf_ring/
pfring:
  - interface: p2p1
    # Number of receive threads (>1 will enable experimental flow pinned
    # runmode)
    threads: 10

    # Default clusterid.  PF_RING will load balance packets based on flow.
    # All threads/processes that will participate need to have the same
    # clusterid.
    cluster-id: 99

    # Default PF_RING cluster type. PF_RING can load balance per flow.
    # Possible values are cluster_flow or cluster_round_robin.
    cluster-type: cluster_flow
    # bpf filter for this interface
    #bpf-filter: tcp
    # Choose checksum verification mode for the interface. At the moment
    # of the capture, some packets may be with an invalid checksum due to
    # offloading to the network card of the checksum computation.
    # Possible values are:
    #  - rxonly: only compute checksum for packets received by network card.
    #  - yes: checksum validation is forced
    #  - no: checksum validation is disabled
    #  - auto: suricata uses a statistical approach to detect when
    #  checksum off-loading is used. (default)
    # Warning: 'checksum-validation' must be set to yes to have any validation
    #checksum-checks: auto
  # Second interface
  - interface: p2p2
    threads: 10
    cluster-id: 93
    cluster-type: cluster_flow

Should I check the kernel clustering setup anywhere else?

Thank you for your time

cardigliano commented 6 years ago

@jkingcis did you install pf_ring from packages or from source code? (i.e. do you have the PF_RING folder with the source code used to compile pf_ring in your home directory?) As for clustering, it seems to be enabled in Suricata; you can double-check with: cat /proc/net/pf_ring/*-* | grep "Cluster Id"

jkingcis commented 6 years ago

@cardigliano I did install PF_RING from the ntop repo, but Suricata was compiled from source with the --enable-pfring flag during the configure process:

# PF_RING installation
wget http://packages.ntop.org/centos-stable/ntop.repo -O /etc/yum.repos.d/ntop.repo
yum install pfring pfring-dkms
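For completeness, the Suricata configure step mentioned above typically looks roughly like this (a sketch; the include/library paths are assumptions, not necessarily the ones used here):

./configure --enable-pfring \
            --with-libpfring-includes=/usr/local/include \
            --with-libpfring-libraries=/usr/local/lib
make && make install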

Yes, it seems clustering is enabled in Suricata:

[root@hostA ~]# cat /proc/net/pf_ring/*  | grep "Cluster Id"
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93

[root@hostB ~]# cat /proc/net/pf_ring/*  | grep "Cluster Id"
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 99
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
Cluster Id         : 93
cardigliano commented 6 years ago

@jkingcis, after static code analysis, driven by your input, I found something that seems to be related and I pushed a patch. Non-IP packets had a chance of being recognized as IP fragments during hash computation when the fragment cache was enabled (which is the default). I recommend you upgrade to the latest 7.2 stable (new packages will be available in <1h). Thank you.
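(A hedged aside for readers who cannot upgrade right away: judging only from the description above, disabling the cluster fragment cache via the enable_frag_coherence module parameter listed in the modinfo output might avoid the affected code path, at the cost of losing fragment/flow coherence in the cluster. This is an assumption, not an official recommendation; upgrading remains the proper fix.)

# hypothetical stop-gap only, see the caveat above
options pf_ring enable_frag_coherence=0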

jkingcis commented 6 years ago

@cardigliano Yesterday I yum-updated Host B, including pfring, pfring-dkms, the kernel and glibc. After reinstalling the pf_ring kernel module once the system had booted into the new kernel (yum reinstall pfring-dkms after reboot), everything seems to be working fine. The current setup is:

Host B:

[root@hostB ~]#  cat /proc/net/pf_ring/info
PF_RING Version          : 7.2.0 (7.2.0-stable:758f764cf6243c81d61d7b6d7cd1ccee04aaa1b6)
Total rings              : 20

Standard (non ZC) Options
Ring slots               : 4096
Slot version             : 17
Capture TX               : Yes [RX+TX]
IP Defragment            : No
Socket Mode              : Standard
Cluster Fragment Queue   : 647
Cluster Fragment Discard : 0
[root@hostB ~]#  modinfo pf_ring
filename:       /lib/modules/3.10.0-862.11.6.el7.x86_64/extra/pf_ring.ko.xz
alias:          net-pf-27
version:        7.2.0
description:    Packet capture acceleration and analysis
author:         ntop.org
license:        GPL
retpoline:      Y
rhelversion:    7.5
srcversion:     0B168ED2BB628A0718E64F9
depends:
vermagic:       3.10.0-862.11.6.el7.x86_64 SMP mod_unload modversions
parm:           min_num_slots:Min number of ring slots (uint)
parm:           perfect_rules_hash_size:Perfect rules hash size (uint)
parm:           enable_tx_capture:Set to 1 to capture outgoing packets (uint)
parm:           enable_frag_coherence:Set to 1 to handle fragments (flow coherence) in clusters (uint)
parm:           enable_ip_defrag:Set to 1 to enable IP defragmentation(only rx traffic is defragmentead) (uint)
parm:           quick_mode:Set to 1 to run at full speed but with upto one socket per interface (uint)
parm:           force_ring_lock:Set to 1 to force ring locking (automatically enable with rss) (uint)
parm:           enable_debug:Set to 1 to enable PF_RING debug tracing into the syslog, 2 for more verbosity (uint)
parm:           transparent_mode:(deprecated) (uint)
[root@hostB ~]# systemctl status pf_ring
● pf_ring.service - Start/stop pfring service
   Loaded: loaded (/usr/lib/systemd/system/pf_ring.service; enabled; vendor preset: disabled)
   Active: active (exited) since jue 2018-08-23 10:01:41 CEST; 22h ago
  Process: 15912 ExecStop=/usr/local/bin/pf_ringctl stop (code=exited, status=0/SUCCESS)
  Process: 15933 ExecStart=/usr/local/bin/pf_ringctl start (code=exited, status=0/SUCCESS)
 Main PID: 15933 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/pf_ring.service

ago 23 10:01:32 hostB systemd[1]: Starting Start/stop pfring service...
ago 23 10:01:41 hostB pf_ringctl[15933]: Starting PF_RING module: [  OK  ]
ago 23 10:01:41 hostB systemd[1]: Started Start/stop pfring service.

I'm considering waiting about two weeks before updating Host A, so that if the traffic pattern appears again we can check whether Host A panics while Host B does not. What do you think?

Many, MANY thanks for your time @cardigliano

cardigliano commented 6 years ago

Hi @jkingcis, sounds great, please keep us posted.

jkingcis commented 6 years ago

Hi @cardigliano, I just yum-upgraded Host A and yum-reinstalled pfring-dkms after rebooting into the new kernel, but Suricata won't start, reporting "/usr/local/bin/suricata: error while loading shared libraries: libpfring.so: cannot open shared object file: No such file or directory". I have tried reinstalling the pfring package again, but it doesn't work either. Do you know what's going on?

[root@hostA ~]# systemctl status suricata
● suricata.service - Suricata Intrusion Detection Service
   Loaded: loaded (/usr/lib/systemd/system/suricata.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since mar 2018-08-28 17:37:19 CEST; 5min ago
  Process: 4620 ExecStartPost=/usr/bin/chmod g+w /var/run/suricata/suricata-command.socket (code=exited, status=1/FAILURE)
  Process: 4456 ExecStartPost=/usr/bin/sleep 120 (code=exited, status=0/SUCCESS)
  Process: 4455 ExecStart=/usr/local/bin/suricata -c /etc/suricata/suricata.yaml $OPTIONS (code=exited, status=127)
  Process: 4451 ExecStartPre=/usr/bin/chown suricata.suricata /var/run/suricata (code=exited, status=0/SUCCESS)
  Process: 4449 ExecStartPre=/usr/bin/mkdir /var/run/suricata (code=exited, status=1/FAILURE)
 Main PID: 4455 (code=exited, status=127)

ago 28 17:35:19 hostA systemd[1]: Starting Suricata Intrusion Detection Service...
ago 28 17:35:19 hostA mkdir[4449]: /usr/bin/mkdir: cannot create directory «/var/run/suricata»: File exists
ago 28 17:35:19 hostA suricata[4455]: /usr/local/bin/suricata: error while loading shared libraries: libpfring.so: cannot open shared object file: No such file or directory
ago 28 17:35:19 hostA systemd[1]: suricata.service: main process exited, code=exited, status=127/n/a
ago 28 17:37:19 hostA chmod[4620]: /usr/bin/chmod: cannot access «/var/run/suricata/suricata-command.socket»: No such file or directory
ago 28 17:37:19 hostA systemd[1]: suricata.service: control process exited, code=exited status=1
ago 28 17:37:19 hostA systemd[1]: Failed to start Suricata Intrusion Detection Service.
ago 28 17:37:19 hostA systemd[1]: Unit suricata.service entered failed state.
ago 28 17:37:19 hostA systemd[1]: suricata.service failed.
[root@hostA ~]# systemctl status pf_ring
● pf_ring.service - Start/stop pfring service
   Loaded: loaded (/usr/lib/systemd/system/pf_ring.service; enabled; vendor preset: disabled)
   Active: active (exited) since mar 2018-08-28 17:34:56 CEST; 7min ago
  Process: 4028 ExecStop=/usr/local/bin/pf_ringctl stop (code=exited, status=0/SUCCESS)
  Process: 4097 ExecStart=/usr/local/bin/pf_ringctl start (code=exited, status=0/SUCCESS)
 Main PID: 4097 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/pf_ring.service

ago 28 17:34:51 hostA systemd[1]: Starting Start/stop pfring service...
ago 28 17:34:56 hostA pf_ringctl[4097]: Starting PF_RING module: [  OK  ]
ago 28 17:34:56 hostA systemd[1]: Started Start/stop pfring service.
[root@hostA ~]# modinfo pf_ring
filename:       /lib/modules/3.10.0-862.11.6.el7.x86_64/extra/pf_ring.ko.xz
alias:          net-pf-27
version:        7.2.0
description:    Packet capture acceleration and analysis
author:         ntop.org
license:        GPL
retpoline:      Y
rhelversion:    7.5
srcversion:     5FB5FFB600582E9D7D045AF
depends:
vermagic:       3.10.0-862.11.6.el7.x86_64 SMP mod_unload modversions
parm:           min_num_slots:Min number of ring slots (uint)
parm:           perfect_rules_hash_size:Perfect rules hash size (uint)
parm:           enable_tx_capture:Set to 1 to capture outgoing packets (uint)
parm:           enable_frag_coherence:Set to 1 to handle fragments (flow coherence) in clusters (uint)
parm:           enable_ip_defrag:Set to 1 to enable IP defragmentation(only rx traffic is defragmentead) (uint)
parm:           quick_mode:Set to 1 to run at full speed but with upto one socket per interface (uint)
parm:           force_ring_lock:Set to 1 to force ring locking (automatically enable with rss) (uint)
parm:           enable_debug:Set to 1 to enable PF_RING debug tracing into the syslog, 2 for more verbosity (uint)
parm:           transparent_mode:(deprecated) (uint)
[root@hostA ~]# cat /proc/net/pf_ring/info
PF_RING Version          : 7.2.0 (7.2.0-stable:e0a75608922f751f7280bb7ae03d3be41226955b)
Total rings              : 0

Standard (non ZC) Options
Ring slots               : 4096
Slot version             : 17
Capture TX               : Yes [RX+TX]
IP Defragment            : No
Socket Mode              : Standard
Cluster Fragment Queue   : 0
Cluster Fragment Discard : 0

Many thanks

jkingcis commented 6 years ago

I have tried comparing the libpfring.so libraries from Host A and Host B and they share the same MD5 hash, so it seems to be the same library on both hosts, but I don't get why Host A won't start Suricata.
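A few standard checks that could narrow this down (the paths are assumptions; the library may live elsewhere on these hosts):

ldd /usr/local/bin/suricata | grep pfring     # which libpfring.so the binary is looking for
ldconfig -p | grep libpfring                  # what the dynamic linker cache currently knows about
ls -l /usr/lib64/libpfring.so*                # whether the pfring package actually installed the library there
ldconfig                                      # refresh the cache if the file exists but is not listed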

jkingcis commented 6 years ago

Hi @cardigliano, I worked around the problem by copying the Suricata binary from Host B to Host A, and now it seems to be working fine. My guess at the root cause is that Host A was upgraded directly from PF_RING 6.6.0 to 7.2.0 (across a major release), whereas Host B went from 7.0.0 to 7.2.0, so there was some problem with the library path or something related. I have added a Suricata recompilation step for every PF_RING major release to my upgrade procedure documentation, so it seems to be solved :)

cardigliano commented 6 years ago

Hi @jkingcis, great, please reopen in case you experience a similar issue again. Thank you.