Open GoogleCodeExporter opened 9 years ago
I have an Intel 82599 10GE NIC with 4 hardware queues enabled, running Debian 7 Wheezy.

cat /proc/interrupts |grep eth4
74: 808231153 0 0 0 PCI-MSI-edge eth4-TxRx-0
75: 469 906003549 0 0 PCI-MSI-edge eth4-TxRx-1
76: 427 0 817517321 0 PCI-MSI-edge eth4-TxRx-2
77: 289 0 0 1341880240 PCI-MSI-edge eth4-TxRx-3
78: 5 102 0 0 PCI-MSI-edge eth4

When I try to open an invalid ring id with netmap@eth4-5, I get a bunch of kernel errors:

[79384.241993] ixgbe 0000:0a:00.0: eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[79384.425352] ixgbe 0000:0d:00.0: eth6: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[80285.103007] 360.585717 [1758] netmap_interp_ringid invalid ring id 4
[80285.301151] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598
[80285.301198] IP: [<ffffffffa03be12f>] mbq_safe_dequeue+0x54/0x54 [netmap]
[80285.301233] PGD 85d3b9067 PUD 85db31067 PMD 0
[80285.301267] Oops: 0000 [#1] SMP
[80285.301296] CPU 0
[80285.301302] Modules linked in: ixgbe(O) mdio netmap(O) bridge stp cpufreq_userspace cpufreq_conservative cpufreq_powersave cpufreq_stats binfmt_misc loop snd_pcm snd_page_alloc snd_timer snd iTCO_wdt coretemp crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 aes_generic cryptd soundcore hpwdt iTCO_vendor_support sb_edac hpilo psmouse serio_raw joydev pcspkr button container ioatdma acpi_power_meter edac_core evdev ext4 crc16 jbd2 mbcache dm_mod mperf 3w_9xxx 3w_xxxx raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 md_mod ahci libahci sata_nv sata_sil sata_via libata usbhid hid sg sd_mod crc_t10dif uhci_hcd aacraid scsi_mod thermal ehci_hcd usbcore usb_common igb(O) ptp pps_core dca processor thermal_sys [last unloaded: ixgbe]
[80285.309688]
[80285.309708] Pid: 4155, comm: kipfw Tainted: G O 3.2.0-4-amd64 #1 Debian 3.2.63-2+deb7u1 HP ProLiant DL380e Gen8
[80285.309756] RIP: 0010:[<ffffffffa03be12f>] [<ffffffffa03be12f>] mbq_safe_dequeue+0x54/0x54 [netmap]
[80285.309800] RSP: 0018:ffff880857c63e40 EFLAGS: 00010246
[80285.309822] RAX: ffff88085af76780 RBX: 0000000000000598 RCX: 00000000c0000100
[80285.309847] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000598
[80285.309872] RBP: 0000000000000000 R08: ffff880857c62000 R09: ffff880859450000
[80285.309896] R10: ffffffff81600000 R11: ffff880859450000 R12: ffff88085c94ad80
[80285.309921] R13: ffff88085c94ad80 R14: ffff88087e6540c0 R15: ffff88085ac16bd0
[80285.309946] FS: 00007f75c07cb700(0000) GS:ffff88087ee00000(0000) knlGS:0000000000000000
[80285.309983] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[80285.310006] CR2: 0000000000000598 CR3: 000000083ee35000 CR4: 00000000000406f0
[80285.310031] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[80285.310055] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[80285.310080] Process kipfw (pid: 4155, threadinfo ffff880857c62000, task ffff88085af76780)
[80285.310117] Stack:
[80285.310135] ffffffffa03be170 0000000000000001 ffff88085b00e800 0000000000000598
[80285.310187] ffffffffa03c4726 ffff88085db054c0 ffff88085db054c0 ffff88085b00e800
[80285.310238] ffffffffa03c3df1 ffffffff81036618 ffffffff8134f64c 0000000000000246
[80285.310289] Call Trace:
[80285.310311] [<ffffffffa03be170>] ? __mbq_purge+0x1b/0x2e [netmap]
[80285.310338] [<ffffffffa03c4726>] ? netmap_hw_krings_delete+0x23/0x36 [netmap]
[80285.310376] [<ffffffffa03c3df1>] ? netmap_do_unregif+0x7b/0x100 [netmap]
[80285.310404] [<ffffffff81036618>] ? should_resched+0x5/0x23
[80285.310431] [<ffffffff8134f64c>] ? _cond_resched+0x7/0x1c
[80285.310456] [<ffffffffa03c6786>] ? netmap_dtor_locked+0xf/0x1e [netmap]
[80285.310482] [<ffffffffa03c67af>] ? netmap_dtor+0x1a/0x47 [netmap]
[80285.310508] [<ffffffffa03c700d>] ? linux_nm_vi_change_mtu+0x3/0x3 [netmap]
[80285.310534] [<ffffffffa03c701f>] ? linux_netmap_release+0x12/0x16 [netmap]
[80285.310563] [<ffffffff810fbf45>] ? fput+0xf9/0x1a1
[80285.310586] [<ffffffff810f9c70>] ? filp_close+0x62/0x6a
[80285.310609] [<ffffffff810f9d06>] ? sys_close+0x8e/0xcb
[80285.310635] [<ffffffff81355a92>] ? system_call_fastpath+0x16/0x1b
[80285.310658] Code: 45 00 75 08 48 c7 45 08 00 00 00 00 ff 4d 10 49 c7 04 24 00 00 00 00 48 8b 75 20 48 89 df e8 e2 28 f9 e0 5b 5d 4c 89 e0 41 5c c3 <48> 8b 07 48 85 c0 74 1d 48 8b 10 48 85 d2 48 89 17 75 08 48 c7
[80285.310973] RIP [<ffffffffa03be12f>] mbq_safe_dequeue+0x54/0x54 [netmap]
[80285.311003] RSP <ffff880857c63e40>
[80285.311022] CR2: 0000000000000598
[80285.311427] ---[ end trace b39b220e2fbeae09 ]---
[80285.399608] ixgbe 0000:0a:00.0: eth4: detected SFP+: 3
[80286.208247] ixgbe 0000:0a:00.0: eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX

After this the server started misbehaving and I had to reboot it. Please add a check on the number of rings in user space.
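For illustration, the kind of user-space check requested here might look like the sketch below. It is not netmap code and not the fix from the 'next' branch: `parse_ring_id` and `ring_id_ok` are hypothetical helpers, and the port-name format is assumed from the `netmap@eth4-5` example above. Ring indices are assumed to be zero-based, matching `eth4-TxRx-0` through `eth4-TxRx-3` in the interrupt listing. In a real program the ring count would have to come from netmap itself (for example, the ring counts that `NIOCGINFO` reports in `struct nmreq`), not from a constant.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of netmap): extract the ring index from a
 * port name such as "netmap@eth4-5".
 * Returns the ring index (>= 0), -1 if the name has no "-<ring>" suffix
 * (meaning "all rings"), or -2 if the suffix is not a plain decimal number.
 * Limitation: an interface name that itself contains '-' is treated as
 * malformed unless it also carries a numeric ring suffix.
 */
static long parse_ring_id(const char *port)
{
    const char *dash = strrchr(port, '-');
    char *end = NULL;
    long id;

    if (dash == NULL)
        return -1;                      /* no ring suffix at all */
    if (!isdigit((unsigned char)dash[1]))
        return -2;                      /* empty or non-numeric suffix */

    errno = 0;
    id = strtol(dash + 1, &end, 10);
    if (errno != 0 || *end != '\0')
        return -2;                      /* overflow or trailing garbage */
    return id;
}

/*
 * Hypothetical check: returns 1 if the port name refers to a ring that
 * exists on an interface with num_rings rings, 0 otherwise.
 */
static int ring_id_ok(const char *port, unsigned int num_rings)
{
    long id = parse_ring_id(port);

    if (id == -1)
        return 1;                       /* all rings requested: always valid */
    if (id < 0)
        return 0;                       /* malformed suffix */
    return (unsigned long)id < num_rings;
}
```

With 4 rings, `ring_id_ok("netmap@eth4-3", 4)` accepts the port and `ring_id_ok("netmap@eth4-5", 4)` rejects it, so the application can refuse the name before it ever reaches the kernel module.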
Original issue reported on code.google.com by pavel.odintsov on 5 Mar 2015 at 9:21
pavel.odintsov
This is fixed in the 'next' branch. If you cannot or do not want to switch to the newer code, you can apply the attached patch.
Original comment by giuseppe.lettieri73 on 5 Mar 2015 at 1:54
giuseppe.lettieri73
Attachments: