xairy closed this issue 8 years ago
Do you see this warning:
285 /*
286 * Can deadlock when called with interrupts disabled.
287 * We allow cpu's that are not yet online though, as no one else can
288 * send smp call function interrupt to this cpu and as such deadlocks
289 * can't happen.
290 */
291 WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
292 && !oops_in_progress);
from smp_call_function_single?
No.
Does it dump stacks on other CPUs? There must be a second part to the deadlock. If we see both stacks, we may be able to figure out what is happening.
No, I haven't seen stacks from other CPUs. I'll post more reports.
INFO: rcu_sched detected stalls on CPUs/tasks:
2: (1983 ticks this GP) idle=dcd/140000000000000/0 softirq=33018/33018 fqs=944
(detected by 1, t=21013 jiffies, g=15551, c=15550, q=1584)
Task dump for CPU 2:
trinity-c8 R running task 12776 3877 3838 0x0000000c
ffffffff8272ae18 0000000000000003 ffffea0007c0f41c ffff88023c92fd90
ffffffff811d3b51 ffff88023c92fd90 00000000ffffffff ffffffff81255e49
ffff88023c9b6a70 ffffffff812505a6 0000000000000093 ffff88024d5fb000
Call Trace:
[< inlined >] atomic_dec_and_test ./arch/x86/include/asm/atomic.h:151
[< inlined >] put_page_testzero include/linux/mm.h:345
[<ffffffff811d3b51>] ? __free_pages+0x31/0x90 mm/page_alloc.c:3288
[<ffffffff81255e49>] ? kt_sync_ensure_created+0x29/0xb0 mm/ktsan/sync.c:13
[<ffffffff812505a6>] ? kt_spin_unlock+0x16/0x20 mm/ktsan/spinlock.c:20
[<ffffffff8125620b>] ? kt_tab_access+0x11b/0x1b0 mm/ktsan/tab.c:129
[<ffffffff8124f954>] ? kt_memblock_remove_sync+0x14/0x80 mm/ktsan/memblock.c:70
[<ffffffff81255f4b>] ? kt_sync_drop_and_free+0x3b/0x60 mm/ktsan/sync.c:64
[<ffffffff8124b4cf>] ? ktsan_sync_drop+0x7f/0xc0 mm/ktsan/ktsan.c:347 (discriminator 7)
[<ffffffff811d3b5c>] ? __free_pages+0x3c/0x90 mm/page_alloc.c:3289
[<ffffffff81222956>] ? __vunmap+0x96/0x160 mm/vmalloc.c:1485 (discriminator 2)
[<ffffffff81222ac1>] ? vfree+0x41/0xb0 mm/vmalloc.c:1525
[< inlined >] copy_module_from_user kernel/module.c:2675
[< inlined >] SYSC_init_module kernel/module.c:3592
[<ffffffff8113b229>] ? SyS_init_module+0x129/0x150 kernel/module.c:3579
[<ffffffff81eb0d6e>] ? entry_SYSCALL_64_fastpath+0x12/0x71 arch/x86/entry/entry_64.S:186
INFO: rcu_sched detected stalls on CPUs/tasks:
3: (1674 ticks this GP) idle=fa7/140000000000000/0 softirq=33477/33477 fqs=5024
(detected by 1, t=21005 jiffies, g=15552, c=15551, q=3880)
Task dump for CPU 3:
trinity-c2 R running task 12536 3940 3838 0x0000000c
ffffea0002971300 ffffffff811d3b5c 00000000000068da ffff8802149aa740
ffff8802149aa768 ffff8802149aa760 ffff88020b87be78 ffffffff81222956
0000000000000206 ffffc900242cc000 000000001ed50000 00007f901100d000
Call Trace:
[<ffffffff811d3b5c>] ? __free_pages+0x3c/0x90 mm/page_alloc.c:3289
[<ffffffff81222956>] ? __vunmap+0x96/0x160 mm/vmalloc.c:1485 (discriminator 2)
[<ffffffff81222ac1>] ? vfree+0x41/0xb0 mm/vmalloc.c:1525
[< inlined >] copy_module_from_user kernel/module.c:2675
[< inlined >] SYSC_init_module kernel/module.c:3592
[<ffffffff8113b229>] ? SyS_init_module+0x129/0x150 kernel/module.c:3579
[<ffffffff81eb0d6e>] ? entry_SYSCALL_64_fastpath+0x12/0x71 arch/x86/entry/entry_64.S:186
The lockups were most probably caused by a deadlock between kt_sync_drop and kt_memblock_free. This was fixed in #65.
Seeing more lockups:
INFO: rcu_sched self-detected stall on CPU
3: (20998 ticks this GP) idle=ec9/140000000000001/0 softirq=14965/14965 fqs=6967
INFO: rcu_sched detected stalls on CPUs/tasks:
0: (1 GPs behind) idle=d3d/140000000000000/0 softirq=15777/15778 fqs=6967
3: (20998 ticks this GP) idle=ec9/140000000000001/0 softirq=14965/14965 fqs=6967
(detected by 1, t=21002 jiffies, g=7900, c=7899, q=271)
Task dump for CPU 0:
trinity-c4 R running task 8 2861 2527 0x10000008
0000070200000200 ffff880200800000 ffff88028c750f44 00000000de1be1c8
0000000000000296 ffff8802de1be1c8 0000000000000286 ffff88028c750f40
ffffffff82021910 ffff8802de1be1c8 ffff8800bb905b30 0000000000000296
Call Trace:
[<ffffffff81384c1e>] ? __ext4_journal_stop+0x5e/0x110 fs/ext4/ext4_jbd2.c:88
[<ffffffff81554baa>] ? iov_iter_advance+0x11a/0x310 lib/iov_iter.c:512
[<ffffffff8124e75e>] ? kt_spin_unlock+0xe/0x20 mm/ktsan/spinlock.c:20
[<ffffffff8125221b>] ? kt_tab_access+0x11b/0x1b0 mm/ktsan/tab.c:129
[<ffffffff8124e75e>] ? kt_spin_unlock+0xe/0x20 mm/ktsan/spinlock.c:20
[<ffffffff81251e90>] ? kt_sync_acquire+0x90/0x160 mm/ktsan/sync.c:93
[<ffffffff8124cb7c>] ? ktsan_atomic32_compare_exchange+0xcc/0xf0 mm/ktsan/ktsan.c:709
[< inline >] ? atomic_cmpxchg ./arch/x86/include/asm/atomic.h:235
[< inline >] ? virt_queued_spin_lock ./arch/x86/include/asm/qspinlock.h:49
[<ffffffff810dd899>] ? queued_spin_lock_slowpath+0x39/0x2c0 kernel/locking/qspinlock.c:292
[<ffffffff8124cb7c>] ? ktsan_atomic32_compare_exchange+0xcc/0xf0 mm/ktsan/ktsan.c:709
[< inline >] ? __raw_spin_lock include/asm-generic/qspinlock.h:83
[<ffffffff81ea8286>] ? _raw_spin_lock+0x66/0x70 kernel/locking/spinlock.c:151
[< inline >] ? spin_lock include/linux/spinlock.h:312
[< inline >] ? lock_pte_protection mm/mprotect.c:49
[< inline >] ? change_pte_range mm/mprotect.c:71
[< inline >] ? change_pmd_range mm/mprotect.c:180
[< inline >] ? change_pud_range mm/mprotect.c:206
[< inline >] ? change_protection_range mm/mprotect.c:231
[<ffffffff81218f29>] ? change_protection+0x619/0x760 mm/mprotect.c:252
[<ffffffff812191ed>] ? mprotect_fixup+0x17d/0x2f0 mm/mprotect.c:324
[< inline >] ? SYSC_mprotect mm/mprotect.c:425
[<ffffffff8121953b>] ? SyS_mprotect+0x1db/0x310 mm/mprotect.c:346
[<ffffffff8101d078>] ? syscall_trace_leave+0x88/0x160 arch/x86/kernel/ptrace.c:1634
[<ffffffff81ea89fd>] ? tracesys_phase2+0x84/0x89 arch/x86/entry/entry_64.S:268
Task dump for CPU 3:
trinity-c14 R running task 16 2983 2527 0x10000008
ffff8800bae1e3b0 ffff880281d58840 ffffea000a03a600 ffff8800bae1e3b0
ffff88027ea05680 0000000000000702 ffff880281ed7c68 ffffffff8121bc6c
0000000000000292 0000000000000292 ffff8800bae1e408 ffff8800bae1e3b0
Call Trace:
[<ffffffff8121bc6c>] ? try_to_unmap_one+0xbc/0x4f0 mm/rmap.c:1223
[<ffffffff8121bbb0>] ? page_remove_rmap+0x40/0x40 mm/rmap.c:1183
[<ffffffff8121bbb0>] ? page_remove_rmap+0x40/0x40 mm/rmap.c:1183
[< inline >] ? rmap_walk_file mm/rmap.c:1540
[<ffffffff8121d247>] ? rmap_walk+0x147/0x450 mm/rmap.c:1559
[<ffffffff8121d90b>] ? try_to_unmap+0xbb/0x110 mm/rmap.c:1388
[<ffffffff8121bbb0>] ? page_remove_rmap+0x40/0x40 mm/rmap.c:1183
[<ffffffff8121adf0>] ? anon_vma_ctor+0x60/0x60 mm/rmap.c:423
[<ffffffff8121c330>] ? page_get_anon_vma+0xd0/0xd0 ./arch/x86/include/asm/atomic.h:264
[< inline >] ? __unmap_and_move mm/migrate.c:882
[< inline >] ? unmap_and_move mm/migrate.c:941
[<ffffffff81248226>] ? migrate_pages+0x326/0x9f0 mm/migrate.c:1130
[<ffffffff8123ac80>] ? alloc_pages_vma+0x220/0x220 mm/mempolicy.c:1775
[<ffffffff8123b615>] ? do_mbind+0x5b5/0x640 mm/mempolicy.c:1217
[<ffffffff8124e75e>] ? kt_spin_unlock+0xe/0x20 mm/ktsan/spinlock.c:20
[<ffffffff81237e77>] ? get_nodes+0x107/0x140 mm/mempolicy.c:1249
[< inline >] ? SYSC_mbind mm/mempolicy.c:1319
[<ffffffff8123b86a>] ? SyS_mbind+0xaa/0xb0 mm/mempolicy.c:1301
[<ffffffff8101d078>] ? syscall_trace_leave+0x88/0x160 arch/x86/kernel/ptrace.c:1634
[<ffffffff81ea89fd>] ? tracesys_phase2+0x84/0x89 arch/x86/entry/entry_64.S:268
(t=21114 jiffies g=7900 c=7899 q=271)
Task dump for CPU 0:
trinity-c4 R running task 8 2861 2527 0x10000008
0000070200000200 ffff880200800000 ffff88028c750f44 00000000de1be1c8
0000000000000296 ffff8802de1be1c8 0000000000000286 ffff88028c750f40
ffffffff82021910 ffff8802de1be1c8 ffff8800bb905b30 0000000000000296
Call Trace:
[<ffffffff81384c1e>] ? __ext4_journal_stop+0x5e/0x110 fs/ext4/ext4_jbd2.c:88
[<ffffffff81554baa>] ? iov_iter_advance+0x11a/0x310 lib/iov_iter.c:512
[<ffffffff8124e75e>] ? kt_spin_unlock+0xe/0x20 mm/ktsan/spinlock.c:20
[<ffffffff8125221b>] ? kt_tab_access+0x11b/0x1b0 mm/ktsan/tab.c:129
[<ffffffff8124e75e>] ? kt_spin_unlock+0xe/0x20 mm/ktsan/spinlock.c:20
[<ffffffff81251e90>] ? kt_sync_acquire+0x90/0x160 mm/ktsan/sync.c:93
[<ffffffff8124cadf>] ? ktsan_atomic32_compare_exchange+0x2f/0xf0 mm/ktsan/ktsan.c:704
[< inline >] ? atomic_cmpxchg ./arch/x86/include/asm/atomic.h:235
[< inline >] ? virt_queued_spin_lock ./arch/x86/include/asm/qspinlock.h:49
[<ffffffff810dd899>] ? queued_spin_lock_slowpath+0x39/0x2c0 kernel/locking/qspinlock.c:292
[<ffffffff8124cb7c>] ? ktsan_atomic32_compare_exchange+0xcc/0xf0 mm/ktsan/ktsan.c:709
[< inline >] ? __raw_spin_lock include/asm-generic/qspinlock.h:83
[<ffffffff81ea8286>] ? _raw_spin_lock+0x66/0x70 kernel/locking/spinlock.c:151
[< inline >] ? spin_lock include/linux/spinlock.h:312
[< inline >] ? lock_pte_protection mm/mprotect.c:49
[< inline >] ? change_pte_range mm/mprotect.c:71
[< inline >] ? change_pmd_range mm/mprotect.c:180
[< inline >] ? change_pud_range mm/mprotect.c:206
[< inline >] ? change_protection_range mm/mprotect.c:231
[<ffffffff81218f29>] ? change_protection+0x619/0x760 mm/mprotect.c:252
[<ffffffff812191ed>] ? mprotect_fixup+0x17d/0x2f0 mm/mprotect.c:324
[< inline >] ? SYSC_mprotect mm/mprotect.c:425
[<ffffffff8121953b>] ? SyS_mprotect+0x1db/0x310 mm/mprotect.c:346
[<ffffffff8101d078>] ? syscall_trace_leave+0x88/0x160 arch/x86/kernel/ptrace.c:1634
[<ffffffff81ea89fd>] ? tracesys_phase2+0x84/0x89 arch/x86/entry/entry_64.S:268
Task dump for CPU 3:
trinity-c14 R running task 16 2983 2527 0x10000008
0000000000000000 0000000000000010 ffff88028dd9c680 ffff88063fd83c98
ffffffff810c8aa0 0000000640000000 0000000000000003 0000000000000003
ffffffff82440e80 ffff88063fd83cb8 ffffffff810cb158 ffffffff82440e80
Call Trace:
<IRQ> [<ffffffff810c8aa0>] sched_show_task+0xf0/0x150 kernel/sched/core.c:4837
[<ffffffff810cb158>] dump_cpu_task+0x38/0x40 kernel/sched/core.c:8484
[<ffffffff81e95c8c>] rcu_dump_cpu_stacks+0xc3/0x113 kernel/rcu/tree.c:1201
[< inline >] print_cpu_stall kernel/rcu/tree.c:1308
[< inline >] check_cpu_stall kernel/rcu/tree.c:1372
[< inline >] __rcu_pending kernel/rcu/tree.c:3474
[< inline >] rcu_pending kernel/rcu/tree.c:3538
[<ffffffff81107d0f>] rcu_check_callbacks+0x5ef/0xca0 kernel/rcu/tree.c:2760
[<ffffffff81251e90>] ? kt_sync_acquire+0x90/0x160 mm/ktsan/sync.c:93
[<ffffffff8124e75e>] ? kt_spin_unlock+0xe/0x20 mm/ktsan/spinlock.c:20
[<ffffffff81249f8b>] ? kt_cache_alloc+0x2b/0x40 mm/ktsan/cache.c:53
[<ffffffff8110f8de>] update_process_times+0x3e/0x70 kernel/time/timer.c:1397
[<ffffffff81128b1e>] tick_sched_handle.isra.15+0x7e/0xa0 kernel/time/tick-sched.c:151
[<ffffffff8112971e>] tick_sched_timer+0x5e/0xb0 kernel/time/tick-sched.c:1081
[< inline >] __run_hrtimer kernel/time/hrtimer.c:1223
[<ffffffff81110e87>] __hrtimer_run_queues+0x237/0x450 kernel/time/hrtimer.c:1287
[<ffffffff811296c0>] ? tick_sched_do_timer+0x70/0x70 kernel/time/tick-sched.c:132
[<ffffffff8111167e>] hrtimer_interrupt+0xfe/0x290 kernel/time/hrtimer.c:1321
[<ffffffff8105d9b7>] local_apic_timer_interrupt+0x77/0xc0 arch/x86/kernel/apic/apic.c:890
[<ffffffff8105dd06>] smp_apic_timer_interrupt+0x76/0xb0 arch/x86/kernel/apic/apic.c:914
[<ffffffff81ea978a>] apic_timer_interrupt+0x8a/0xa0 arch/x86/entry/entry_64.S:790
<EOI> [< inline >] ? arch_local_irq_restore ./arch/x86/include/asm/paravirt.h:836
<EOI> [<ffffffff8124b95e>] ? ktsan_atomic32_load+0x9e/0xc0 mm/ktsan/ktsan.c:591
[< inline >] ? rep_nop ./arch/x86/include/asm/processor.h:567
[< inline >] ? cpu_relax ./arch/x86/include/asm/processor.h:572
[<ffffffff81130d6d>] ? csd_lock_wait+0x1d/0x40 kernel/smp.c:111
[<ffffffff81080a70>] ? do_flush_tlb_all+0x70/0x70 arch/x86/mm/tlb.c:248
[<ffffffff81130d7a>] ? csd_lock_wait+0x2a/0x40 kernel/smp.c:110
[<ffffffff81131019>] ? smp_call_function_single+0xb9/0x190 kernel/smp.c:303
[<ffffffff81080a70>] ? do_flush_tlb_all+0x70/0x70 arch/x86/mm/tlb.c:248
[<ffffffff81131748>] ? smp_call_function_many+0x368/0x3a0 kernel/smp.c:435
[<ffffffff81080a70>] ? do_flush_tlb_all+0x70/0x70 arch/x86/mm/tlb.c:248
[<ffffffff81080d4b>] ? native_flush_tlb_others+0x6b/0x80 arch/x86/mm/tlb.c:153
[< inline >] ? flush_tlb_others ./arch/x86/include/asm/paravirt.h:362
[<ffffffff81081151>] ? flush_tlb_page+0x91/0xf0 arch/x86/mm/tlb.c:237
[<ffffffff81223aec>] ? ptep_clear_flush+0x8c/0xb0 mm/pgtable-generic.c:117
[<ffffffff8121bc6c>] ? try_to_unmap_one+0xbc/0x4f0 mm/rmap.c:1223
[<ffffffff8121bbb0>] ? page_remove_rmap+0x40/0x40 mm/rmap.c:1183
[<ffffffff8121bbb0>] ? page_remove_rmap+0x40/0x40 mm/rmap.c:1183
[< inline >] ? rmap_walk_file mm/rmap.c:1540
[<ffffffff8121d247>] ? rmap_walk+0x147/0x450 mm/rmap.c:1559
[<ffffffff8121d90b>] ? try_to_unmap+0xbb/0x110 mm/rmap.c:1388
[<ffffffff8121bbb0>] ? page_remove_rmap+0x40/0x40 mm/rmap.c:1183
[<ffffffff8121adf0>] ? anon_vma_ctor+0x60/0x60 mm/rmap.c:423
[<ffffffff8121c330>] ? page_get_anon_vma+0xd0/0xd0 ./arch/x86/include/asm/atomic.h:264
[< inline >] ? __unmap_and_move mm/migrate.c:882
[< inline >] ? unmap_and_move mm/migrate.c:941
[<ffffffff81248226>] ? migrate_pages+0x326/0x9f0 mm/migrate.c:1130
[<ffffffff8123ac80>] ? alloc_pages_vma+0x220/0x220 mm/mempolicy.c:1775
[<ffffffff8123b615>] ? do_mbind+0x5b5/0x640 mm/mempolicy.c:1217
[<ffffffff8124e75e>] ? kt_spin_unlock+0xe/0x20 mm/ktsan/spinlock.c:20
[<ffffffff81237e77>] ? get_nodes+0x107/0x140 mm/mempolicy.c:1249
[< inline >] ? SYSC_mbind mm/mempolicy.c:1319
[<ffffffff8123b86a>] ? SyS_mbind+0xaa/0xb0 mm/mempolicy.c:1301
[<ffffffff8101d078>] ? syscall_trace_leave+0x88/0x160 arch/x86/kernel/ptrace.c:1634
[<ffffffff81ea89fd>] ? tracesys_phase2+0x84/0x89 arch/x86/entry/entry_64.S:268
INFO: rcu_sched detected stalls on CPUs/tasks:
(detected by 1, t=21003 jiffies, g=12860, c=12859, q=470)
All QSes seen, last rcu_sched kthread activity 21003 (4295121529-4295100526), jiffies_till_next_fqs=3, root ->qsmask 0x0
trinity-c5 S ffff88063fc95240 12960 3846 3564 0x00000000
0000000000000000 00000000000032a0 ffff88028df547c0 ffff88063fc83d08
ffffffff810c8aa0 000000010002086e 0000000100025a79 0000000000000000
ffffffff82440e80 ffff88063fc83dd8 ffffffff8110839f 0000000000000000
Call Trace:
<IRQ> [<ffffffff810c8aa0>] sched_show_task+0xf0/0x150 kernel/sched/core.c:4837
[< inline >] print_other_cpu_stall kernel/rcu/tree.c:1272
[< inline >] check_cpu_stall kernel/rcu/tree.c:1378
[< inline >] __rcu_pending kernel/rcu/tree.c:3474
[< inline >] rcu_pending kernel/rcu/tree.c:3538
[<ffffffff8110839f>] rcu_check_callbacks+0xc7f/0xca0 kernel/rcu/tree.c:2760
[<ffffffff81251e60>] ? kt_sync_acquire+0x90/0x160 mm/ktsan/sync.c:93
[<ffffffff8110f8de>] update_process_times+0x3e/0x70 kernel/time/timer.c:1397
[<ffffffff81128b1e>] tick_sched_handle.isra.15+0x7e/0xa0 kernel/time/tick-sched.c:151
[<ffffffff8112971e>] tick_sched_timer+0x5e/0xb0 kernel/time/tick-sched.c:1081
[< inline >] __run_hrtimer kernel/time/hrtimer.c:1223
[<ffffffff81110e87>] __hrtimer_run_queues+0x237/0x450 kernel/time/hrtimer.c:1287
[<ffffffff811296c0>] ? tick_sched_do_timer+0x70/0x70 kernel/time/tick-sched.c:132
[<ffffffff8111167e>] hrtimer_interrupt+0xfe/0x290 kernel/time/hrtimer.c:1321
[<ffffffff8105d9b7>] local_apic_timer_interrupt+0x77/0xc0 arch/x86/kernel/apic/apic.c:890
[<ffffffff8105ddce>] smp_trace_apic_timer_interrupt+0x8e/0x1a0 arch/x86/kernel/apic/apic.c:934
[< inline >] ? __raw_read_unlock include/linux/rwlock_api_smp.h:254
[<ffffffff8108ed24>] ? do_wait+0x324/0x3f0 kernel/exit.c:1499
[<ffffffff81ea992a>] trace_apic_timer_interrupt+0x8a/0xa0 arch/x86/entry/entry_64.S:790
<EOI> [< inline >] ? __raw_read_unlock include/linux/rwlock_api_smp.h:254
<EOI> [<ffffffff8108ed24>] ? do_wait+0x324/0x3f0 kernel/exit.c:1499
[< inline >] ? arch_local_irq_restore ./arch/x86/include/asm/paravirt.h:836
[<ffffffff8124bb97>] ? ktsan_mtx_post_unlock+0x97/0xc0 mm/ktsan/ktsan.c:468
[< inline >] ? __raw_read_unlock include/linux/rwlock_api_smp.h:254
[<ffffffff8108ed24>] ? do_wait+0x324/0x3f0 kernel/exit.c:1499
[< inline >] ? SYSC_wait4 kernel/exit.c:1615
[<ffffffff8108f335>] ? SyS_wait4+0xe5/0x160 kernel/exit.c:1584
[<ffffffff8108b070>] ? task_stopped_code+0x90/0x90 kernel/exit.c:1127
[<ffffffff81ea89d1>] ? entry_SYSCALL_64_fastpath+0x31/0x95 arch/x86/entry/entry_64.S:188
rcu_sched kthread starved for 21024 jiffies! g12860 c12859 f0x2
INFO: rcu_sched detected stalls on CPUs/tasks:
(detected by 1, t=84008 jiffies, g=12860, c=12859, q=478)
All QSes seen, last rcu_sched kthread activity 84008 (4295184534-4295100526), jiffies_till_next_fqs=3, root ->qsmask 0x0
trinity-c5 R running task 12960 3846 3564 0x00000000
0000000000000000 00000000000032a0 ffff88028df547c0 ffff88063fc83d08
ffffffff810c8aa0 000000010002086e 0000000100035096 0000000000000000
ffffffff82440e80 ffff88063fc83dd8 ffffffff8110839f 0000000000000000
Call Trace:
<IRQ> [<ffffffff810c8aa0>] sched_show_task+0xf0/0x150 kernel/sched/core.c:4837
[< inline >] print_other_cpu_stall kernel/rcu/tree.c:1272
[< inline >] check_cpu_stall kernel/rcu/tree.c:1378
[< inline >] __rcu_pending kernel/rcu/tree.c:3474
[< inline >] rcu_pending kernel/rcu/tree.c:3538
[<ffffffff8110839f>] rcu_check_callbacks+0xc7f/0xca0 kernel/rcu/tree.c:2760
[<ffffffff81251e60>] ? kt_sync_acquire+0x90/0x160 mm/ktsan/sync.c:93
[<ffffffff8110f8de>] update_process_times+0x3e/0x70 kernel/time/timer.c:1397
[<ffffffff81128b1e>] tick_sched_handle.isra.15+0x7e/0xa0 kernel/time/tick-sched.c:151
[<ffffffff8112971e>] tick_sched_timer+0x5e/0xb0 kernel/time/tick-sched.c:1081
[< inline >] __run_hrtimer kernel/time/hrtimer.c:1223
[<ffffffff81110e87>] __hrtimer_run_queues+0x237/0x450 kernel/time/hrtimer.c:1287
[<ffffffff811296c0>] ? tick_sched_do_timer+0x70/0x70 kernel/time/tick-sched.c:132
[<ffffffff8111167e>] hrtimer_interrupt+0xfe/0x290 kernel/time/hrtimer.c:1321
[<ffffffff8105d9b7>] local_apic_timer_interrupt+0x77/0xc0 arch/x86/kernel/apic/apic.c:890
[<ffffffff8105ddce>] smp_trace_apic_timer_interrupt+0x8e/0x1a0 arch/x86/kernel/apic/apic.c:934
[<ffffffff81ea992a>] trace_apic_timer_interrupt+0x8a/0xa0 arch/x86/entry/entry_64.S:790
<EOI> [<ffffffff8124e72e>] ? kt_spin_unlock+0xe/0x20 mm/ktsan/spinlock.c:20
[< inline >] ? arch_local_irq_restore ./arch/x86/include/asm/paravirt.h:836
[< inline >] ? __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:179
[<ffffffff81ea7fac>] ? _raw_spin_unlock_irqrestore+0x4c/0x60 kernel/locking/spinlock.c:191
[< inline >] spin_unlock_irqrestore include/linux/spinlock.h:372
[<ffffffff810da76f>] add_wait_queue+0x3f/0x50 kernel/sched/wait.c:30
[<ffffffff8108eabc>] do_wait+0xbc/0x3f0 kernel/exit.c:1471
[< inline >] SYSC_wait4 kernel/exit.c:1615
[<ffffffff8108f335>] SyS_wait4+0xe5/0x160 kernel/exit.c:1584
[<ffffffff8108b070>] ? task_stopped_code+0x90/0x90 kernel/exit.c:1127
[<ffffffff81ea89d1>] entry_SYSCALL_64_fastpath+0x31/0x95 arch/x86/entry/entry_64.S:188
rcu_sched kthread starved for 84029 jiffies! g12860 c12859 f0x2
INFO: rcu_sched detected stalls on CPUs/tasks:
(detected by 3, t=21002 jiffies, g=8535, c=8534, q=749)
All QSes seen, last rcu_sched kthread activity 21000 (4295905729-4295884729), jiffies_till_next_fqs=3, root ->qsmask 0x0
swapper/3 R running task 14800 0 1 0x00000000
0000000000000000 00000000000039d0 ffff88028e734180 ffff88063fd83d18
ffffffff810c8aa0 00000001000dffb9 00000001000e51c1 0000000000000000
ffffffff82440e80 ffff88063fd83de8 ffffffff8110839f 0000000000000000
Call Trace:
<IRQ> [<ffffffff810c8aa0>] sched_show_task+0xf0/0x150 kernel/sched/core.c:4837
[< inline >] print_other_cpu_stall kernel/rcu/tree.c:1272
[< inline >] check_cpu_stall kernel/rcu/tree.c:1378
[< inline >] __rcu_pending kernel/rcu/tree.c:3474
[< inline >] rcu_pending kernel/rcu/tree.c:3538
[<ffffffff8110839f>] rcu_check_callbacks+0xc7f/0xca0 kernel/rcu/tree.c:2760
[<ffffffff8110f8de>] update_process_times+0x3e/0x70 kernel/time/timer.c:1397
[<ffffffff81128b1e>] tick_sched_handle.isra.15+0x7e/0xa0 kernel/time/tick-sched.c:151
[<ffffffff8112971e>] tick_sched_timer+0x5e/0xb0 kernel/time/tick-sched.c:1081
[< inline >] __run_hrtimer kernel/time/hrtimer.c:1223
[<ffffffff81110e87>] __hrtimer_run_queues+0x237/0x450 kernel/time/hrtimer.c:1287
[<ffffffff811296c0>] ? tick_sched_do_timer+0x70/0x70 kernel/time/tick-sched.c:132
[<ffffffff8111167e>] hrtimer_interrupt+0xfe/0x290 kernel/time/hrtimer.c:1321
[<ffffffff8105d9b7>] local_apic_timer_interrupt+0x77/0xc0 arch/x86/kernel/apic/apic.c:890
[<ffffffff8105dd06>] smp_apic_timer_interrupt+0x76/0xb0 arch/x86/kernel/apic/apic.c:914
[<ffffffff81eaa30a>] apic_timer_interrupt+0x8a/0xa0 arch/x86/entry/entry_64.S:790
<EOI> [<ffffffff8106fd36>] ? native_safe_halt+0x6/0x10 ./arch/x86/include/asm/irqflags.h:49
[< inline >] arch_safe_halt ./arch/x86/include/asm/paravirt.h:111
[<ffffffff81015c51>] default_idle+0x41/0x160 arch/x86/kernel/process.c:301
[<ffffffff8101679f>] arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:292
[<ffffffff810db233>] default_idle_call+0x43/0x60 kernel/sched/idle.c:89
[< inline >] cpuidle_idle_call kernel/sched/idle.c:157
[< inline >] cpu_idle_loop kernel/sched/idle.c:253
[<ffffffff810db665>] cpu_startup_entry+0x375/0x3e0 kernel/sched/idle.c:301
[<ffffffff8105a8ff>] start_secondary+0x22f/0x240 arch/x86/kernel/smpboot.c:261
I wonder why it hangs in safe_halt... did we forget to enable interrupts somewhere in the runtime?
I've been experiencing this since the 4.* kernels; rolling back to 3.19 makes it go away.
@cirrusUK can you please elaborate? What did you roll back? And what is gone?
@cirrusUK Are you using CONFIG_KTSAN? Did you backport it to 3.19?
@dvyukov Humble apologies, it would appear I posted in the wrong thread; I will close/delete the issue. I'm sorry for the noise.