stefano-garzarella / ptnetmap

Kernel modules crash when trying to attach a physical ixgbe interface to the VALE switch #3

Closed xfxian closed 8 years ago

xfxian commented 8 years ago

Hi, I'm trying to connect my VM's netmap-aware service to the network through the host's physical 10 GbE interface using ptnetmap.

I'm running Ubuntu 14.04.3 with kernel 3.13.0-66-generic. With the newer kernels available for this release (3.16 and 3.19), ptnetmap fails to compile.

I can reproduce the attached stacktrace by loading ptnetmap and attaching the interface like this:

$ cd /usr/src/ptnetmap/netmap/LINUX/
$ insmod netmap.ko
$ rmmod ixgbe
$ insmod ixgbe/ixgbe.ko
$ cd ../examples/
$ ./vale-ctl -a vale0:eth0
$ ip link set dev eth0 up

This is where the system crashes. The same crash occurs if I start qemu with -netdev netmap,ifname=netmap:eth0, connecting directly to the netmap interface, and then run ip link set dev eth0 up in the guest.

With vanilla netmap, I can perform all of the previous steps successfully and send packets to the network with pkt-gen.

Stacktrace:

Oct 28 19:19:03 host kernel: [ 2799.120583] BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
Oct 28 19:19:03 host kernel: [ 2799.124563] IP: [<ffffffffa02ab863>] ixgbe_configure_rx_ring+0x463/0x680 [ixgbe]
Oct 28 19:19:03 host kernel: [ 2799.132290] PGD 63e11067 PUD 63e12067 PMD 0 
Oct 28 19:19:03 host kernel: [ 2799.132290] Oops: 0000 [#1] SMP 
Oct 28 19:19:03 host kernel: [ 2799.132290] Modules linked in: ixgbe(OX) netmap(OX) openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich dcdbas radeon coretemp kvm_intel lpc_ich kvm ttm drm_kms_helper serio_raw i5000_edac drm i2c_algo_bit edac_core i5k_amb ipmi_si shpchp lp parport mac_hid ses enclosure hid_generic usbhid dca psmouse hid ptp pata_acpi pps_core mdio megaraid_sas bnx2 [last unloaded: netmap]
Oct 28 19:19:03 host kernel: [ 2799.132290] CPU: 2 PID: 2120 Comm: vale-ctl Tainted: G           OX 3.13.0-66-generic #108-Ubuntu
Oct 28 19:19:03 host kernel: [ 2799.132290] Hardware name: Dell Inc. PowerEdge 1950/0UR033, BIOS 2.5.0 09/12/2008
Oct 28 19:19:03 host kernel: [ 2799.132290] task: ffff8800bf72c800 ti: ffff88006751a000 task.ti: ffff88006751a000
Oct 28 19:19:03 host kernel: [ 2799.132290] RIP: 0010:[<ffffffffa02ab863>]  [<ffffffffa02ab863>] ixgbe_configure_rx_ring+0x463/0x680 [ixgbe]
Oct 28 19:19:03 host kernel: [ 2799.132290] RSP: 0018:ffff88006751bae8  EFLAGS: 00010246
Oct 28 19:19:03 host kernel: [ 2799.132290] RAX: 0000000000000000 RBX: ffff8800c5566000 RCX: 0000000000000200
Oct 28 19:19:03 host kernel: [ 2799.132290] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000001ff
Oct 28 19:19:03 host kernel: [ 2799.132290] RBP: ffff88006751bb48 R08: ffffc90011755000 R09: 0000000000000000
Oct 28 19:19:03 host kernel: [ 2799.132290] R10: 0000000000000002 R11: ffff8800cab64a00 R12: ffff880036370880
Oct 28 19:19:03 host kernel: [ 2799.132290] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000100
Oct 28 19:19:03 host kernel: [ 2799.132290] FS:  00007f34f302d740(0000) GS:ffff88012fc80000(0000) knlGS:0000000000000000
Oct 28 19:19:03 host kernel: [ 2799.132290] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 28 19:19:03 host kernel: [ 2799.132290] CR2: 0000000000000100 CR3: 0000000063e14000 CR4: 00000000000027e0
Oct 28 19:19:03 host kernel: [ 2799.132290] Stack:
Oct 28 19:19:03 host kernel: [ 2799.132290]  ffff880000000002 ffff8800cab64a00 ffff8800c5566000 ffff880000000002
Oct 28 19:19:03 host kernel: [ 2799.132290]  00000000cab61d80 0000000056311197 000000000005dd01 ffff880036370880
Oct 28 19:19:03 host kernel: [ 2799.132290]  ffff880036371680 0000000000000000 0000000000000001 ffff880036370ce0
Oct 28 19:19:03 host kernel: [ 2799.132290] Call Trace:
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffffa02ac613>] ixgbe_configure+0x883/0xa60 [ixgbe]
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffffa02ad658>] ixgbe_netmap_reg+0xb8/0x120 [ixgbe]
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffffa0290fcc>] netmap_hw_register+0x1c/0x30 [netmap]
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffffa0286b3d>] netmap_bwrap_register+0x6d/0x1c0 [netmap]
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffffa02930b7>] netmap_do_regif+0x2f7/0x310 [netmap]
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffff811a3099>] ? __kmalloc+0x1e9/0x230
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffffa02879dc>] netmap_bwrap_bdg_ctl+0x1ac/0x2d0 [netmap]
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffffa028a112>] netmap_bdg_ctl+0x3f2/0x5b0 [netmap]
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffffa02942ea>] netmap_ioctl+0x4aa/0x780 [netmap]
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffff81176789>] ? __do_fault+0x429/0x530
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffffa029535b>] linux_netmap_ioctl+0xab/0x140 [netmap]
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffff81730234>] ? __do_page_fault+0x204/0x570
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffff811d16a0>] do_vfs_ioctl+0x2e0/0x4c0
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffff811ce2b2>] ? final_putname+0x22/0x50
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffff811d1901>] SyS_ioctl+0x81/0xa0
Oct 28 19:19:03 host kernel: [ 2799.132290]  [<ffffffff81734c5d>] system_call_fastpath+0x1a/0x1f
Oct 28 19:19:03 host kernel: [ 2799.132290] Code: 44 03 4a 24 8b 4a 20 44 89 ca 0f 88 c0 00 00 00 45 89 c8 41 29 c8 44 39 c9 41 0f 4e d0 48 63 d2 4c 8b 83 00 01 00 00 48 c1 e2 04 <41> 8b 14 17 39 93 08 01 00 00 76 a1 48 c1 e2 04 49 8b 4c 10 08 
Oct 28 19:19:03 host kernel: [ 2799.132290] RIP  [<ffffffffa02ab863>] ixgbe_configure_rx_ring+0x463/0x680 [ixgbe]
Oct 28 19:19:03 host kernel: [ 2799.132290]  RSP <ffff88006751bae8>
Oct 28 19:19:03 host kernel: [ 2799.132290] CR2: 0000000000000100
Oct 28 19:19:03 host kernel: [ 2799.548088] ---[ end trace 0830df42049f48c5 ]---
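
The trace above is easier to read once the faulting symbol and the reliable call chain are pulled out of the syslog noise. The following is a minimal sketch, assuming the oops is saved in syslog format as shown; the function names `oops_fault_symbol` and `oops_call_trace` are hypothetical helpers, not part of netmap or any kernel tool.

```shell
# Print the faulting symbol taken from the "IP:" line of an oops on stdin.
oops_fault_symbol() {
    sed -n 's|.* IP: \[<[0-9a-f]*>\] \([A-Za-z0-9_.]*\)+0x.*|\1|p' | head -n 1
}

# Print the call chain, one "function [module]" per line.
# Frames marked with "?" (unreliable stack entries) are skipped.
oops_call_trace() {
    sed -n '/Call Trace:/,/ Code: /p' |
        sed -n 's|.*\[<[0-9a-f]*>\] \([A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]*\(.*\)$|\1\2|p'
}

# Usage (the file name is an assumption):
#   oops_fault_symbol < oops.log
#   oops_call_trace   < oops.log
```

Applied to the log above, this should report ixgbe_configure_rx_ring as the faulting function and show that it was reached from netmap_bwrap_register, in the context of the vale-ctl process.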

stefano-garzarella commented 8 years ago

Hi, are you using multiple queues on ixgbe?

There is a problem with multiple queues (I'll try to fix it next week). Can you configure a single queue through ethtool?

Cheers, Stefano
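
To check the current queue count before changing it, `ethtool -l` lists the channel configuration and `ethtool -L <dev> combined <n>` sets it. The sketch below only parses the `ethtool -l` output; the helper name `current_combined` is hypothetical, and the exact output layout is assumed to follow the usual "Pre-set maximums" / "Current hardware settings" format.

```shell
# Read `ethtool -l <dev>` output on stdin and print the number of
# combined channels (queues) currently configured.
current_combined() {
    awk '
        /^Current hardware settings/ { cur = 1 }
        cur && /^Combined:/          { print $2; exit }
    '
}

# Usage (needs root and a real interface, so it is left commented out):
#   ethtool -l eth0 | current_combined   # show the current queue count
#   ethtool -L eth0 combined 1           # switch to a single queue
```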

xfxian commented 8 years ago

Hi Stefano,

it seems it was defaulting to 8 queues, so I tried to set it to one:

root@host:~# ethtool -L net_d combined 1
root@host:~# ip link set dev eth0 down
root@host:~# ip link set dev eth0 up
root@host:~# ./vale-ctl -g eth0
bdg_ctl [165] eth0: 1 queues.
root@host:~# ./vale-ctl -a vale0:eth0

At this point it crashes again with a stack trace. Could this somehow be related to the IOMMU settings? Right now I am booting with "intel_iommu=on". Do I also need to set "iommu=pt" or some other value? Thank you in advance.

Best regards, Jacob

jmmlmendes commented 8 years ago

Hi,

Are you using version d4bf89c of netmap (the one the ptnetmap repo points to)? I tested it, and it does cause a kernel panic on the host when you try to attach a physical NIC to VALE.

The problem seems to be already fixed in more recent versions of netmap. I tried with 579b43 and it worked fine.

After that, in the guest, you can use either an updated version of netmap or the version the ptnetmap repo points to. At least for me, both versions worked fine in the guest with both the e1000 and virtio drivers; the issue only occurs when using VALE with a physical NIC.

Cheers, Jose
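
One way to answer the revision question is to compare the checked-out commit against the abbreviated hash mentioned above. This is a minimal sketch: `rev_matches` is a hypothetical helper, and it assumes the netmap tree under /usr/src/ptnetmap/netmap (the path used earlier in the thread) is a git checkout.

```shell
# Succeed if the full commit hash $2 starts with the abbreviated hash $1.
rev_matches() {
    [ -n "$1" ] || return 1      # an empty prefix would match everything
    case "$2" in
        "$1"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage (assumes a git checkout at this path):
#   head_rev=$(git -C /usr/src/ptnetmap/netmap rev-parse HEAD)
#   rev_matches d4bf89c "$head_rev" && echo "netmap is at the revision reported to crash"
```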