ntop / PF_RING

High-speed packet processing framework
http://www.ntop.org
GNU Lesser General Public License v2.1

Kernel panic on Redhat 2.6.32-504.12.2.el6.x86_64 running docker #25

Closed. rm-star closed this issue 9 years ago

rm-star commented 9 years ago

I have a development environment running on ESX vSphere. The VM in question runs Red Hat 6 with kernel 2.6.32-504.12.2.el6.x86_64.

We run Docker in this environment and have found that pf_ring appears to cause a kernel panic. It happens when we ask Docker to create about 15 containers at the same time.

We are using version 5.6.1 of pf_ring, but I have seen the same problem in 6.0.2.

Pid: 15, comm: netns Tainted: G W --------------- 2.6.32-504.12.2.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[] [] remove_proc_entry+0x7c/0x1b0
RSP: 0018:ffff88043a2f5c00 EFLAGS: 00010282
RAX: 0000000000000004 RBX: 0001000000000000 RCX: 0000000000000000
RDX: ffff8803e2372a10 RSI: 000000000000002f RDI: ffffffffa0506f18
RBP: ffff88043a2f5c50 R08: 0000000000000004 R09: 0000000000000000
R10: 000000000000000f R11: 000000000000000f R12: ffffffffa0506f18
R13: ffffffffa0506f18 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88002c300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f0e0f23c000 CR3: 0000000001a85000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process netns (pid: 15, threadinfo ffff88043a2f4000, task ffff88043a278040)
Stack:
0000000000000018 ffff88043a2f5c60 ffff88043a2f5c20 ffff8803e23729c0
ffffffff00000018 ffffffffa0506f18 ffff8803e23729c0 ffff880046806c40
ffff880046806c78 00000000ffffffec ffff88043a2f5c70 ffffffffa04f284f
Call Trace:
[] remove_device_from_ring_list+0xff/0x130 [pf_ring]
[] ring_notifier+0x12f/0x310 [pf_ring]
[] notifier_call_chain+0x55/0x80
[] raw_notifier_call_chain+0x16/0x20
[] call_netdevice_notifiers+0x1b/0x20
[] rollback_registered_many+0x154/0x280
[] rollback_registered+0x38/0x50
[] ? default_wake_function+0x0/0x20
[] unregister_netdevice_queue+0x58/0xa0
[] unregister_netdevice+0x10/0x20
[] veth_dellink+0x25/0x50 [veth]
[] default_device_exit+0x9a/0x100
[] ? wakeme_after_rcu+0x0/0x20
[] ? cleanup_net+0x0/0xa0
[] cleanup_net+0x6e/0xa0
[] worker_thread+0x170/0x2a0
[] ? autoremove_wake_function+0x0/0x40
[] ? worker_thread+0x0/0x2a0
[] kthread+0x9e/0xc0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xc0
[] ? child_rip+0x0/0x20

The exact disassembly at the IP is

/usr/src/debug/kernel-2.6.32-504.12.2.el6/linux-2.6.32-504.12.2.el6.x86_64/fs/proc/generic.c: 31
0xffffffff811ffe9c : movzwl 0x4(%rbx),%ecx

which points at a de-referencing of a pointer which is contained in a linked list. It looks like this is happening when a device is being unregistered.

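One possible reading of the register dump, offered as a sketch and not a confirmed diagnosis: the faulting instruction reads from RBX + 0x4, and RBX holds 0001000000000000. Assuming x86_64 with 48-bit virtual addresses, an address is canonical only when bits 63 through 47 are all equal, and an access to a non-canonical address raises a general protection fault instead of a page fault. The helper below checks that property. The name `is_canonical` is hypothetical, and the function assumes 16-digit lowercase hex input as printed in the oops.

```shell
#!/bin/sh
# Sketch: classify a 64-bit address as canonical or non-canonical,
# assuming 48-bit virtual addressing (bits 63..47 must all be equal).
# Input: exactly 16 lowercase hex digits, with or without a 0x prefix.
is_canonical() {
    addr=${1#0x}
    top4=$(printf '%s' "$addr" | cut -c1-4)   # bits 63..48
    fifth=$(printf '%s' "$addr" | cut -c5)    # bits 47..44; bit 47 is set when the digit is >= 8
    case "$top4:$fifth" in
        0000:[0-7])   echo canonical ;;       # bits 63..47 all zero
        ffff:[89a-f]) echo canonical ;;       # bits 63..47 all one
        *)            echo non-canonical ;;
    esac
}

# RBX from the oops plus the 0x4 displacement of "movzwl 0x4(%rbx),%ecx"
is_canonical 0001000000000004
# RDX from the same oops, for comparison
is_canonical ffff8803e2372a10
```

A non-canonical value in RBX would suggest that the list entry being walked no longer holds a valid pointer, for instance because it was already freed or overwritten. The check alone cannot tell which.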
<4>[PF_RING] packet_notifier(22) [eth0][1]
<4>[PF_RING] packet_notifier(eth0): unhandled message [msg=22][pfring_ptr=(null)]
<4>[PF_RING] packet_notifier(9) [veth4a07dc5][1]
<4>[PF_RING] packet_notifier(veth4a07dc5): unhandled message [msg=9][pfring_ptr=ffffffffa0507ee0]
<6>docker0: port 2(veth4a07dc5) entering disabled state
<4>[PF_RING] packet_notifier(2) [veth4a07dc5][1]
<6>device veth4a07dc5 left promiscuous mode
<6>docker0: port 2(veth4a07dc5) entering disabled state
<4>[PF_RING] packet_notifier(11) [docker0][1]
<4>[PF_RING] packet_notifier(docker0): unhandled message [msg=11][pfring_ptr=ffffffffa0507ee0]
<4>[PF_RING] packet_notifier(6) [veth4a07dc5][1]
<4>[PF_RING] packet_notifier(veth4a07dc5) [UNREGISTER][pfring_ptr=ffffffffa0507ee0] [pfring_ptr=ffff8803e2320200]
<4>[PF_RING] packet_notifier(22) [veth4a07dc5][1]
<4>[PF_RING] packet_notifier(veth4a07dc5): unhandled message [msg=22][pfring_ptr=(null)]
<4>[PF_RING] packet_notifier(9) [lo][772]
<4>[PF_RING] packet_notifier(lo): skipping non ethernet device
<4>[PF_RING] packet_notifier(2) [lo][772]
<4>[PF_RING] packet_notifier(lo): skipping non ethernet device
<4>[PF_RING] packet_notifier(6) [lo][772]
<4>[PF_RING] packet_notifier(lo): skipping non ethernet device
<4>[PF_RING] packet_notifier(22) [lo][772]
<4>[PF_RING] packet_notifier(lo): skipping non ethernet device
<6>docker0: port 3(veth701d196) entering forwarding state
<4>[PF_RING] packet_notifier(9) [eth0][1]
<4>[PF_RING] packet_notifier(eth0): unhandled message [msg=9][pfring_ptr=ffffffffa0507ee0]
<4>[PF_RING] packet_notifier(2) [eth0][1]
<4>[PF_RING] packet_notifier(6) [eth0][1]
<4>[PF_RING] packet_notifier(eth0) [UNREGISTER][pfring_ptr=ffffffffa0507ee0]
<4>general protection fault: 0000 [#1] SMP

Do you have any advice? Thanks

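When reading a longer capture of this log, it can help to list only the devices for which PF_RING logged an UNREGISTER notification, since the fault follows one of them. The filter below is a minimal sketch: the function name `unregistered_devices` is hypothetical, and it assumes one log message per line in the format shown in this report.

```shell
#!/bin/sh
# Sketch: print the device name from every PF_RING UNREGISTER log line.
# Reads kernel log text on stdin (for example, the output of dmesg).
unregistered_devices() {
    sed -n 's/.*packet_notifier(\([^)]*\)) \[UNREGISTER\].*/\1/p'
}

# Sample lines copied from the log in this report.
unregistered_devices <<'EOF'
<4>[PF_RING] packet_notifier(6) [veth4a07dc5][1]
<4>[PF_RING] packet_notifier(veth4a07dc5) [UNREGISTER][pfring_ptr=ffffffffa0507ee0] [pfring_ptr=ffff8803e2320200]
<4>[PF_RING] packet_notifier(6) [eth0][1]
<4>[PF_RING] packet_notifier(eth0) [UNREGISTER][pfring_ptr=ffffffffa0507ee0]
<4>general protection fault: 0000 [#1] SMP
EOF
```

On a live system the same function could be fed with `dmesg | unregistered_devices`.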
rm-star commented 9 years ago

Here is a test script that reproduces the issue. Your kernel needs to support network namespaces (ip netns).

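#!/bin/bash
# Reproducer: create and delete veth pairs across network namespaces in parallel.
# Needs bash ($RANDOM, local) and root privileges.

set -x

# Create a namespace and a veth pair, move one end into the namespace,
# rename it to eth0, bring it up, then delete it.
makeDevs() {
    local PID=$RANDOM
    ip netns add $PID

    local DEV1=$RANDOM
    local DEV2=$RANDOM

    ip link add veth$DEV1 type veth peer name veth$DEV2
    ip link set veth$DEV1 up

    ip link set veth$DEV2 netns $PID
    ip netns exec $PID ip link set dev veth$DEV2 name eth0
    ip netns exec $PID ip link set eth0 up
    ip netns exec $PID ip link delete eth0
}

# Launch 100 instances concurrently, then wait for all of them to finish.
for i in $(seq 1 100); do
    makeDevs &
done
wait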

lucaderi commented 9 years ago

This problem has been fixed in the code that is currently in git.