Open tbarbette opened 7 years ago
I think this is the backtrace of the part that refuses to die:
[ 695.433491] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 12, t=99783 jiffies, g=3456, c=3455, q=210526)
[ 695.444916] sending NMI to all CPUs:
[ 695.448537] NMI backtrace for cpu 0
[ 695.452081] CPU: 0 PID: 2370 Comm: kclick Tainted: G W O 3.16.0-4-amd64 #1 Debian 3.16.43-2+deb8u2
[ 695.461886] Hardware name: PRIMINFO X99-A/X99-A, BIOS 2101 11/26/2015
[ 695.468381] task: ffff88081dfd0b60 ti: ffff88081d860000 task.ti: ffff88081d860000
[ 695.475927] RIP: 0010:[
Any ideas? Maybe @peterhurley could point me in the right direction?
This is on 3.16, but I had a similar issue on 3.2 and 4.4. I recompiled in single-thread mode, without any other compile options.
My guess would be one of two things. Either there is a broken data structure (maybe due to releasing the lock too early in Task::complete_schedule) and you're getting into an infinite loop in Task::add_pending, or there was a schedule-while-atomic in some other piece of code that might hold the lock in Task::add_pending.
Have you tried turning on lock checking in the kernel?
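In case it helps, here is a sketch of the kind of `.config` fragment I have in mind. These are standard kernel debugging options (usually found under "Kernel hacking"), but the exact set available depends on your kernel version, so treat the list as a starting point to check against your tree rather than a definitive recipe:

```
# Lock debugging options to enable before rebuilding the kernel.
# Assumption: all of these exist in your tree; verify with menuconfig.
CONFIG_PROVE_LOCKING=y        # lockdep: reports lock-ordering violations
CONFIG_DEBUG_SPINLOCK=y       # catches uninitialized or misused spinlocks
CONFIG_DEBUG_MUTEXES=y        # checks mutex semantics
CONFIG_DEBUG_LOCK_ALLOC=y     # detects freeing of memory that holds a live lock
CONFIG_DEBUG_ATOMIC_SLEEP=y   # warns when code sleeps in atomic context
```

With these on, a schedule-while-atomic or a lock released too early should show up as a warning in dmesg before the RCU stall, which would help tell the two guesses apart. Expect the kernel to run noticeably slower while they are enabled.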
Thanks for the idea. I got a little more info:
In the same stack, I now get this line on top of it, which looks like an RCU problem:
[ 1524.679257] [
And a newcomer:
[ 1507.725237] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1507.730771] 0-...: (3 GPs behind) idle=73b/140000000000001/0 softirq=44151/44151 fqs=5248
[ 1507.739187] (detected by 12, t=5255 jiffies, g=41736, c=41735, q=11605)
[ 1507.745951] Task dump for CPU 0:
[ 1507.749190] kclick R running task 0 24596 2 0x00000008
[ 1507.756311] 0000000000000001 ffff88081f20df40 ffff8805f1d37ca0 ffffffff810ff4de
[ 1507.763869] 0000000000000046 03e5a5b51ce63457 ffff8805f1d37cf0 0000000000000282
[ 1507.771402] ffff8805f1d37cd0 ffffffff810ff5b2 ffffffff810ff515 ffff8805f00bcc00
[ 1507.778916] Call Trace:
[ 1507.781392] [
Hi all, I tried multiple kernel versions (3.2, 3.16 and 4.4) and multiple configuration options, but I always get this error when unloading the Click module. The whole router is unstable and ends up crashing when loaded with enough packets.
Any idea?