Kernel panic with pf_ring 6.4.1

russagit commented 6 years ago

It looks like a broken packet (?) may cause Kernel Panic with pf_ring. We are receiving a mirroring traffic from some switch. At the same time when that switch experienced some failure we got following panic.

[45293230.194107] kernel BUG at include/linux/skbuff.h:1666!
[45293230.194132] invalid opcode: 0000 [#1] SMP 
[45293230.194151] Modules linked in: ipmi_devintf ext4 mbcache jbd2 xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_
ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack bridge dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop vxlan ip6_udp_tunnel udp_tunnel uio pf
_ring(OE) fuse binfmt_misc 8021q garp stp mrp llc bonding raid10 dm_raid raid456 async_raid6_recov async_memcpy async_pq intel_powerclamp raid6_pq async_xor coretemp xor 
async_tx intel_rapl kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper iTCO_wdt ablk_helper iTCO_vendor_support sb_edac cryptd edac_core 
ses enclosure lpc_ich pcspkr sg mfd_core ioatdma shpchp wmi i2c_i801 ipmi_ssif mei_me mei acpi_pad ipmi_si ipmi_msghandler acpi_power_meter
[45293230.194495]  nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common ast crc32c_in
tel syscopyarea sysfillrect sysimgblt drm_kms_helper igb ttm ptp drm pps_core ahci dca libahci i2c_algo_bit libata megaraid_sas i2c_core dm_mirror dm_region_hash dm_log d
m_mod
[45293230.194699] CPU: 21 PID: 0 Comm: swapper/21 Tainted: G          IOE  ------------   3.10.0-327.28.3.el7.x86_64 #1
[45293230.194734] Hardware name: DEPO Computers Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
[45293230.194762] task: ffff882028de3980 ti: ffff88102968c000 task.ti: ffff88102968c000
[45293230.194788] RIP: 0010:[<ffffffff81638f59>]  [<ffffffff81638f59>] __skb_pull.part.47+0x4/0x6
[45293230.194822] RSP: 0018:ffff881040143a28  EFLAGS: 00010287
[45293230.194842] RAX: 000000000000010c RBX: 0000000000000001 RCX: 0000000000000000
[45293230.194867] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8801348bc200
[45293230.194901] RBP: ffff881040143a28 R08: 00000000000195e0 R09: ffffffff8151b819
[45293230.194928] R10: 0000000000000000 R11: ffff8801348bce00 R12: 0000000000000001
[45293230.194960] R13: ffff8801348bc200 R14: 0000000000000000 R15: ffff881040143c77
[45293230.194985] FS:  0000000000000000(0000) GS:ffff881040140000(0000) knlGS:0000000000000000
[45293230.195014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45293230.195034] CR2: 00007fe7d0c3b000 CR3: 0000001806aa1000 CR4: 00000000001407e0
[45293230.195060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[45293230.195085] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[45293230.195110] Stack:
[45293230.195119]  ffff881040143a38 ffffffff8151a223 ffff881040143c58 ffffffffa066f728
[45293230.195152]  ffff881000000001 ffff881040143b5c 0000002000000003 0000000000000001
[45293230.195185]  e8bacd1bb77efc93 ffff8801348bce00 ffff88004742da00 ffff88010cc60c00
[45293230.195217] Call Trace:
[45293230.195228]  <IRQ> 
[45293230.195238] 
[45293230.195249]  [<ffffffff8151a223>] skb_pull+0x33/0x40
[45293230.195267]  [<ffffffffa066f728>] skb_ring_handler+0x1e98/0x1ed0 [pf_ring]
[45293230.195294]  [<ffffffff8158b51a>] ? tcp_v4_do_rcv+0x10a/0x340
[45293230.195318]  [<ffffffff81285946>] ? security_sock_rcv_skb+0x16/0x20
[45293230.195342]  [<ffffffff8158cce5>] ? tcp_v4_rcv+0x7b5/0x980
[45293230.195366]  [<ffffffffa066f7f5>] packet_rcv+0x95/0xb0 [pf_ring]
[45293230.195389]  [<ffffffff8152b352>] __netif_receive_skb_core+0x582/0x7d0
[45293230.196557]  [<ffffffff8101c859>] ? read_tsc+0x9/0x10
[45293230.197558]  [<ffffffff8152b5b8>] __netif_receive_skb+0x18/0x60
[45293230.198788]  [<ffffffff8152b640>] netif_receive_skb+0x40/0xc0
[45293230.199972]  [<ffffffff8152c1b0>] napi_gro_receive+0x80/0xb0
[45293230.201097]  [<ffffffffa016ac84>] igb_clean_rx_irq+0x384/0x7a0 [igb]
[45293230.202099]  [<ffffffffa016b41f>] igb_poll+0x37f/0x770 [igb]
[45293230.203072]  [<ffffffff812fc7a4>] ? timerqueue_del+0x24/0x70
[45293230.204016]  [<ffffffff812fc710>] ? timerqueue_add+0x60/0xb0
[45293230.204909]  [<ffffffff8152ba72>] net_rx_action+0x152/0x240
[45293230.206078]  [<ffffffff81084b4f>] __do_softirq+0xef/0x280
[45293230.207002]  [<ffffffff8164845c>] call_softirq+0x1c/0x30
[45293230.207885]  [<ffffffff81016fc5>] do_softirq+0x65/0xa0
[45293230.208752]  [<ffffffff81084ee5>] irq_exit+0x115/0x120
[45293230.209555]  [<ffffffff81648ff8>] do_IRQ+0x58/0xf0
[45293230.210339]  [<ffffffff8163e32d>] common_interrupt+0x6d/0x6d
[45293230.211167]  <EOI> 
[45293230.211177] 
[45293230.211925]  [<ffffffff814d4f92>] ? cpuidle_enter_state+0x52/0xc0
[45293230.212936]  [<ffffffff814d50d9>] cpuidle_idle_call+0xd9/0x210
[45293230.213759]  [<ffffffff8101e4ee>] arch_cpu_idle+0xe/0x30
[45293230.214459]  [<ffffffff810d6485>] cpu_startup_entry+0x245/0x290
[45293230.215136]  [<ffffffff8104768a>] start_secondary+0x1ba/0x230
[45293230.215753] Code: c7 c7 20 2c 8e 81 48 89 04 24 31 c0 e8 eb 6e ff ff 0f 0b 55 48 8b 47 30 48 89 e5 48 8b 17 5d 80 e6 80 48 0f 44 c7 c3 55 48 89 e5 <0f> 0b 55 48 89 e5 0f 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 
[45293230.217123] RIP  [<ffffffff81638f59>] __skb_pull.part.47+0x4/0x6
[45293230.217783]  RSP <ffff881040143a28>

From skbuff.h:
   1663 static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
   1664 {
   1665         skb->len -= len;
   1666         BUG_ON(skb->len < skb->data_len);
   1667         return skb->data += len;
   1668 }

crash> print *(struct sk_buff *) 0xffff8801348bc200
$1 = {
  next = 0x0, 
  prev = 0x0, 
  {
    tstamp = {
      tv64 = 1524390519915673969
    }, 
    skb_mstamp = {
      {
        v64 = 1524390519915673969, 
        {
          stamp_us = 4002150769, 
          stamp_jiffies = 354924825
        }
      }
    }
  }, 
  sk = 0x0, 
  dev = 0xffff8810245b4000, 
  cb = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
  _skb_refdst = 0, 
  sp = 0x0, 
  len = 268, 
  data_len = 272, 
  mac_len = 14, 
  hdr_len = 0, 
  {
    csum = 0, 
    {
      csum_start = 0, 
      csum_offset = 0
    }
  }, 
  priority = 0, 
  ignore_df = 0 '\000', 
  cloned = 1 '\001', 
  ip_summed = 0 '\000', 
  nohdr = 0 '\000', 
  nfctinfo = 0 '\000', 
  pkt_type = 3 '\003', 
  fclone = 0 '\000', 
  ipvs_property = 0 '\000', 
  peeked = 0 '\000', 
  nf_trace = 0 '\000', 
  protocol = 18312, 
  destructor = 0x0, 
  nfct = 0x0, 
  nf_bridge = 0x0, 
  skb_iif = 5, 
  {
    hash = 0, 
    __UNIQUE_ID_rh_kabi_hide8 = {
      rxhash = 0
    }, 
    {<No data fields>}
  }, 
  vlan_proto = 0, 
  vlan_tci = 0, 
  tc_index = 0, 
  tc_verd = 0, 
  queue_mapping = 1, 
  ndisc_nodetype = 0 '\000', 
  pfmemalloc = 0 '\000', 
  ooo_okay = 0 '\000', 
  l4_hash = 0 '\000', 
  wifi_acked_valid = 0 '\000', 
  wifi_acked = 0 '\000', 
  no_fcs = 0 '\000', 
  head_frag = 1 '\001', 
  encapsulation = 0 '\000', 
  encap_hdr_csum = 0 '\000', 
  csum_valid = 0 '\000', 
  csum_complete_sw = 0 '\000', 
  xmit_more = 0 '\000', 
  inner_protocol_type = 0 '\000', 
  remcsum_offload = 0 '\000', 
  {
    napi_id = 0, 
    rh_reserved_dma_cookie = 0
  }, 
  secmark = 0, 
  {
    mark = 0, 
    dropcount = 0, 
    reserved_tailroom = 0
  }, 
  {
    inner_protocol = 0, 
    inner_ipproto = 0 '\000'
  }, 
  inner_transport_header = 0, 
  inner_network_header = 0, 
  inner_mac_header = 0, 
  transport_header = 78, 
  network_header = 82, 
  mac_header = 64, 
  csum_level = 0 '\000', 
  rh_csum_pad = 0 '\000', 
  csum_bad = 0 '\000', 
  rh_reserved1 = 0, 
  rh_reserved2 = 0, 
  rh_reserved3 = 0, 
  rh_reserved4 = 0, 
  tail = 78, 
  end = 320, 
  head = 0xffff88000ebcd580 "\266\215\377\372\b\020mtp3.opc\330\342\001\020mtp3.dpc\320\342\001\016mtp3.ni\006\020mtp3.sls\030\032mtp3.priority", 
  data = 0xffff88000ebcd5ce "\b\003", 
  truesize = 2944, 
  users = {
    counter = 1
  }
}

# cat /proc/net/pf_ring/info 
PF_RING Version          : 6.4.1 (6.4.1-stable:05c307d2efc654404642d97a489cae03d929b343)
Total rings              : 3

Standard (non ZC) Options
Ring slots               : 4096
Slot version             : 16
Capture TX               : Yes [RX+TX]
IP Defragment            : Yes
Socket Mode              : Standard
Total plugins            : 0
Cluster Fragment Queue   : 0
Cluster Fragment Discard : 0

cardigliano commented 6 years ago

Are you able to deterministically reproduce this and try with latest code from github? (6.4.1 is an old version and it could be this issue has been fixed already). Thank you.

russagit commented 6 years ago

I went through all the commits of igb_main.c file since version 6.4.1 and frankly haven't seen any changes that might prevent that condition skb->len < skb->data_len from happening. Any ideas?

cardigliano commented 6 years ago

Hi @russagit, I assume you are using the igb driver distributed with pf_ring. Since I am not able to reproduce this, I am statically analysing the code to figure out what happened. You mentioned a "broken packet", are you able to provide it perhaps? Any additional info including the driver configuration (ethtool -k) is helpful. Thank you.

russagit commented 6 years ago

Sorry, I was wrong. We use a vanilla igb driver. All we currently have is a kernel core dump and the fact that switch failure coincides with a kernel panic (that's why a guess about broken packet, but it's only a guess). Do you think it's possible to recover a packet somehow from the core dump? Thanks

# ethtool -k ipbb02
Features for ipbb02:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: on
    tx-checksum-ip-generic: off [fixed]
    tx-checksum-ipv6: on
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: on
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
busy-poll: off [fixed]

cardigliano commented 6 years ago

@russagit are you able to reproduce this "switch failure" to trigger the crash? You should be able to access skb->data from the dump.

russagit commented 6 years ago

No, switch failure is not reproducible. Here is a hex skb->data dump

> x/272bx 0xffff88000ebcd5ce
0xffff88000ebcd5ce:     0x08    0x03    0x00    0xb8    0x46    0xdf    0x00    0x00
0xffff88000ebcd5d6:     0xff    0x84    0xf5    0x70    0x0a    0x32    0x97    0x02
0xffff88000ebcd5de:     0x0a    0x32    0xd3    0x09    0x17    0xd6    0x6a    0xa5
0xffff88000ebcd5e6:     0x49    0x67    0xec    0xfd    0x68    0xa8    0x73    0x11
0xffff88000ebcd5ee:     0x03    0x00    0x00    0x10    0x3e    0xb9    0x6c    0x51
0xffff88000ebcd5f6:     0x00    0x03    0x1c    0xe0    0x00    0x00    0x00    0x00
0xffff88000ebcd5fe:     0x00    0x03    0x00    0x44    0xa1    0xb1    0x82    0xac
0xffff88000ebcd606:     0x00    0x04    0x4c    0xd1    0x00    0x00    0x00    0x03
0xffff88000ebcd60e:     0x01    0x00    0x01    0x01    0x00    0x00    0x00    0x34
0xffff88000ebcd616:     0x02    0x10    0x00    0x29    0x00    0x00    0x38    0xa4
0xffff88000ebcd61e:     0x00    0x00    0x38    0xab    0x03    0x03    0x00    0x04
0xffff88000ebcd626:     0x06    0x07    0x2b    0x4a    0x00    0x01    0x12    0x00
0xffff88000ebcd62e:     0x10    0x53    0x07    0x02    0x06    0x00    0x0a    0x09
0xffff88000ebcd636:     0xff    0xd4    0x75    0xe2    0xac    0xf6    0xb7    0xd7
0xffff88000ebcd63e:     0x22    0x00    0x00    0x00    0x00    0x03    0x00    0x44
0xffff88000ebcd646:     0xa1    0xb1    0x82    0xad    0x00    0x04    0x4c    0xd2
0xffff88000ebcd64e:     0x00    0x00    0x00    0x03    0x01    0x00    0x01    0x01
0xffff88000ebcd656:     0x00    0x00    0x00    0x34    0x02    0x10    0x00    0x29
0xffff88000ebcd65e:     0x00    0x00    0x38    0xa4    0x00    0x00    0x38    0xab
0xffff88000ebcd666:     0x03    0x03    0x00    0x04    0x06    0x06    0xdb    0xc5
0xffff88000ebcd66e:     0x00    0x01    0x12    0x00    0x10    0x53    0x07    0x02
0xffff88000ebcd676:     0x06    0x00    0x0a    0x09    0xff    0x8e    0xf2    0xc5
0xffff88000ebcd67e:     0x57    0xe3    0xa4    0x24    0xd4    0x00    0x00    0x00
0xffff88000ebcd686:     0xa1    0x81    0x8e    0x01    0x70    0x2e    0x70    0x72
0xffff88000ebcd68e:     0x6f    0x63    0x65    0x64    0x75    0x72    0x65    0x43
0xffff88000ebcd696:     0x6c    0x61    0x73    0x73    0x02    0x00    0x00    0x00
0xffff88000ebcd69e:     0x00    0x00    0x02    0x18    0x1e    0x73    0x63    0x74
0xffff88000ebcd6a6:     0x70    0x2e    0x63    0x68    0x75    0x6e    0x6b    0x5f
0xffff88000ebcd6ae:     0x74    0x79    0x70    0x65    0x08    0x64    0x61    0x74
0xffff88000ebcd6b6:     0x61    0x12    0x6d    0x33    0x75    0x61    0x2e    0x74
0xffff88000ebcd6be:     0x79    0x70    0x01    0x00    0x00    0x00    0x00    0x00
0xffff88000ebcd6c6:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xffff88000ebcd6ce:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0xffff88000ebcd6d6:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00

cardigliano commented 6 years ago

@russagit I tried to reproduce this with the corrupted packet with no luck, however it seems that the driver is creating a malformed skbuff, causing that skb_pull assert failure..

russagit commented 6 years ago

Thanks a lot for the analysis. I think we can close this issue as not relevant to pf_ring.

ntop / PF_RING

Kernel panic with pf_ring 6.4.1 #338