liyi-ibm / linux

Linux kernel source tree

rcu stalls #10

Open liyi-ibm opened 5 years ago

liyi-ibm commented 5 years ago

RCU stalls occur on P9 and cause the system to reboot.

Dec  6 14:47:07 tdw-9-10-25-239 kernel: NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 3 timed out
Dec  6 14:47:07 tdw-9-10-25-239 kernel: ------------[ cut here ]------------
Dec  6 14:47:07 tdw-9-10-25-239 kernel: WARNING: CPU: 136 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x35c/0x370
Dec  6 14:47:07 tdw-9-10-25-239 kernel: Modules linked in: dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag i2c_dev joydev ixgbe ptp at24 pps_core mdio ofpart opal_prd powernv_flash ipmi_powernv ipmi_devintf ipmi_msghandler mtd i2c_opal nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc binfmt_misc usb_storage ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mpt3sas drm raid_class scsi_transport_sas i2c_core
Dec  6 14:47:07 tdw-9-10-25-239 kernel: CPU: 136 PID: 0 Comm: swapper/136 Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:47:07 tdw-9-10-25-239 kernel: task: c000201cb7440000 task.stack: c000201cb74cc000
Dec  6 14:47:07 tdw-9-10-25-239 kernel: NIP:  c000000000989d8c LR: c000000000989d88 CTR: 0000000000000000
Dec  6 14:47:07 tdw-9-10-25-239 kernel: REGS: c000201cb74cf4d0 TRAP: 0700   Tainted: G        W        (4.14.49-3.ppc64le)
Dec  6 14:47:07 tdw-9-10-25-239 kernel: MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004822  XER: 20040000
Dec  6 14:47:07 tdw-9-10-25-239 kernel: CFAR: c000000000172f18 SOFTE: 1 #012GPR00: c000000000989d88 c000201cb74cf750 c0000000013d3000 0000000000000039 #012GPR04: c000201cc755abd0 c000201cc7571410 0000000000000001 c000201cc6b50000 #012GPR08: 0000000000000000 c000000000f3126c 0000201cc6630000 0000000000000006 #012GPR12: 0000000000004000 c000000007d9d800 c000201cb74cff90 0000000000000000 #012GPR16: 0000000000200042 0000000112ea5201 c000201cb74cc000 0000000000000000 #012GPR20: c000000000f44f80 c000000001403b00 c000000000f44f80 000000000000000a #012GPR24: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000088 #012GPR28: 0000000000000004 c000000001403b00 c000201ca04c0000 0000000000000003 
Dec  6 14:47:07 tdw-9-10-25-239 kernel: NIP [c000000000989d8c] dev_watchdog+0x35c/0x370
Dec  6 14:47:07 tdw-9-10-25-239 kernel: LR [c000000000989d88] dev_watchdog+0x358/0x370
Dec  6 14:47:07 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf750] [c000000000989d88] dev_watchdog+0x358/0x370 (unreliable)
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf7f0] [c000000000193bc0] call_timer_fn+0x60/0x1d0
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf880] [c000000000193eb0] expire_timers+0x140/0x1e0
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf8f0] [c000000000194028] run_timer_softirq+0xd8/0x230
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf980] [c000000000aec96c] __do_softirq+0x15c/0x3a4
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfa70] [c000000000104288] irq_exit+0x118/0x130
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfa90] [c000000000023d6c] timer_interrupt+0xac/0xe0
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfac0] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:47:07 tdw-9-10-25-239 kernel: --- interrupt: 901 at replay_interrupt_return+0x0/0x4#012    LR = arch_local_irq_restore+0x74/0x90
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfdb0] [c000201cb74cfe30] 0xc000201cb74cfe30 (unreliable)
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfdd0] [c0000000008d0e10] cpuidle_enter_state+0x110/0x3f0
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfe30] [c00000000015bd3c] call_cpuidle+0x4c/0x80
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfe50] [c00000000015c130] do_idle+0x2b0/0x350
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfec0] [c00000000015c3b8] cpu_startup_entry+0x38/0x40
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfef0] [c000000000048894] start_secondary+0x4e4/0x530
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cff90] [c00000000000b26c] start_secondary_prolog+0x10/0x14
Dec  6 14:47:07 tdw-9-10-25-239 kernel: Instruction dump:
Dec  6 14:47:07 tdw-9-10-25-239 kernel: 3d02fff3 7fc3f378 99282650 4bfc6171 60000000 7fc4f378 7fe6fb78 7c651b78 
Dec  6 14:47:07 tdw-9-10-25-239 kernel: 3c62ff9e 3863f838 4b7e914d 60000000 <0fe00000> 4bffff84 60000000 60000000 
Dec  6 14:47:07 tdw-9-10-25-239 kernel: ---[ end trace 823c30e96f862b4a ]---
Dec  6 14:47:07 tdw-9-10-25-239 kernel: ixgbe 0034:01:00.1 eth1: initiating reset due to tx timeout
Dec  6 14:47:07 tdw-9-10-25-239 kernel: ixgbe 0034:01:00.1 eth1: Reset adapter
Dec  6 14:47:50 tdw-9-10-25-239 kernel: INFO: rcu_sched self-detected stall on CPU
Dec  6 14:47:50 tdw-9-10-25-239 kernel: #01188-...: (6001 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=2973 
Dec  6 14:47:50 tdw-9-10-25-239 kernel: #011 (t=6001 jiffies g=68660684 c=68660683 q=26406)
Dec  6 14:47:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88
Dec  6 14:47:50 tdw-9-10-25-239 kernel: CPU: 88 PID: 95334 Comm: drop_cache.sh Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:47:50 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff2d0] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff310] [c000000000ad4ac4] nmi_cpu_backtrace+0x1a4/0x210
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff3a0] [c000000000ad4d0c] nmi_trigger_cpumask_backtrace+0x1dc/0x220
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff440] [c00000000002e5b8] arch_trigger_cpumask_backtrace+0x28/0x40
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff460] [c00000000018a9b4] rcu_dump_cpu_stacks+0xfc/0x158
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff4b0] [c000000000189df8] rcu_check_callbacks+0x898/0xaa0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff5e0] [c000000000195334] update_process_times+0x44/0x90
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff610] [c0000000001abf4c] tick_sched_handle.isra.13+0x4c/0x80
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff630] [c0000000001abfe0] tick_sched_timer+0x60/0xc0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff670] [c000000000195f38] __hrtimer_run_queues+0xf8/0x330
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff6f0] [c000000000196cfc] hrtimer_interrupt+0xec/0x290
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff7b0] [c000000000023668] __timer_interrupt+0x98/0x280
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff800] [c000000000023d68] timer_interrupt+0xa8/0xe0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff830] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:47:50 tdw-9-10-25-239 kernel: --- interrupt: 901 at _raw_spin_lock+0x40/0xc0#012    LR = drop_pagecache_sb+0xac/0x1d0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffb20] [0000000000000000]           (null) (unreliable)
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffb50] [c0000000003e96bc] drop_pagecache_sb+0xac/0x1d0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffbb0] [c000000000363898] iterate_supers+0x1b8/0x1f0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffc20] [c0000000003e9890] drop_caches_sysctl_handler+0xb0/0x170
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffc90] [c00000000040e608] proc_sys_call_handler+0x108/0x130
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffd00] [c00000000035ec98] __vfs_write+0x48/0x1f0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffd90] [c00000000035f070] vfs_write+0xd0/0x240
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffde0] [c00000000035f3b8] SyS_write+0x68/0x110
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffe30] [c00000000000b9e0] system_call+0x58/0x6c
Dec  6 14:50:50 tdw-9-10-25-239 kernel: INFO: rcu_sched self-detected stall on CPU
Dec  6 14:50:50 tdw-9-10-25-239 kernel: #01188-...: (24004 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=11973 
Dec  6 14:50:50 tdw-9-10-25-239 kernel: #011 (t=24004 jiffies g=68660684 c=68660683 q=29221)
Dec  6 14:50:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88
Dec  6 14:50:50 tdw-9-10-25-239 kernel: CPU: 88 PID: 95334 Comm: drop_cache.sh Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:50:50 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff2d0] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff310] [c000000000ad4ac4] nmi_cpu_backtrace+0x1a4/0x210
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff3a0] [c000000000ad4d0c] nmi_trigger_cpumask_backtrace+0x1dc/0x220
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff440] [c00000000002e5b8] arch_trigger_cpumask_backtrace+0x28/0x40
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff460] [c00000000018a9b4] rcu_dump_cpu_stacks+0xfc/0x158
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff4b0] [c000000000189df8] rcu_check_callbacks+0x898/0xaa0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff5e0] [c000000000195334] update_process_times+0x44/0x90
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff610] [c0000000001abf4c] tick_sched_handle.isra.13+0x4c/0x80
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff630] [c0000000001abfe0] tick_sched_timer+0x60/0xc0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff670] [c000000000195f38] __hrtimer_run_queues+0xf8/0x330
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff6f0] [c000000000196cfc] hrtimer_interrupt+0xec/0x290
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff7b0] [c000000000023668] __timer_interrupt+0x98/0x280
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff800] [c000000000023d68] timer_interrupt+0xa8/0xe0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff830] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:50:50 tdw-9-10-25-239 kernel: --- interrupt: 901 at _raw_spin_lock+0x48/0xc0#012    LR = drop_pagecache_sb+0xac/0x1d0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffb20] [0000000000000000]           (null) (unreliable)
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffb50] [c0000000003e96bc] drop_pagecache_sb+0xac/0x1d0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffbb0] [c000000000363898] iterate_supers+0x1b8/0x1f0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffc20] [c0000000003e9890] drop_caches_sysctl_handler+0xb0/0x170
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffc90] [c00000000040e608] proc_sys_call_handler+0x108/0x130
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffd00] [c00000000035ec98] __vfs_write+0x48/0x1f0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffd90] [c00000000035f070] vfs_write+0xd0/0x240
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffde0] [c00000000035f3b8] SyS_write+0x68/0x110
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffe30] [c00000000000b9e0] system_call+0x58/0x6c
Dec  6 14:53:50 tdw-9-10-25-239 kernel: INFO: rcu_sched self-detected stall on CPU
Dec  6 14:53:50 tdw-9-10-25-239 kernel: #01188-...: (42008 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=20924 
Dec  6 14:53:50 tdw-9-10-25-239 kernel: #011 (t=42008 jiffies g=68660684 c=68660683 q=31790)
Dec  6 14:53:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88
Dec  6 14:53:50 tdw-9-10-25-239 kernel: CPU: 88 PID: 95334 Comm: drop_cache.sh Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:53:50 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff2d0] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff310] [c000000000ad4ac4] nmi_cpu_backtrace+0x1a4/0x210
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff3a0] [c000000000ad4d0c] nmi_trigger_cpumask_backtrace+0x1dc/0x220
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff440] [c00000000002e5b8] arch_trigger_cpumask_backtrace+0x28/0x40
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff460] [c00000000018a9b4] rcu_dump_cpu_stacks+0xfc/0x158
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff4b0] [c000000000189df8] rcu_check_callbacks+0x898/0xaa0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff5e0] [c000000000195334] update_process_times+0x44/0x90
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff610] [c0000000001abf4c] tick_sched_handle.isra.13+0x4c/0x80
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff630] [c0000000001abfe0] tick_sched_timer+0x60/0xc0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff670] [c000000000195f38] __hrtimer_run_queues+0xf8/0x330
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff6f0] [c000000000196cfc] hrtimer_interrupt+0xec/0x290
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff7b0] [c000000000023668] __timer_interrupt+0x98/0x280
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff800] [c000000000023d68] timer_interrupt+0xa8/0xe0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff830] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:53:50 tdw-9-10-25-239 kernel: --- interrupt: 901 at _raw_spin_lock+0x40/0xc0#012    LR = drop_pagecache_sb+0xac/0x1d0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffb20] [0000000000000000]           (null) (unreliable)
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffb50] [c0000000003e96bc] drop_pagecache_sb+0xac/0x1d0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffbb0] [c000000000363898] iterate_supers+0x1b8/0x1f0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffc20] [c0000000003e9890] drop_caches_sysctl_handler+0xb0/0x170
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffc90] [c00000000040e608] proc_sys_call_handler+0x108/0x130
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffd00] [c00000000035ec98] __vfs_write+0x48/0x1f0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffd90] [c00000000035f070] vfs_write+0xd0/0x240
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffde0] [c00000000035f3b8] SyS_write+0x68/0x110
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffe30] [c00000000000b9e0] system_call+0x58/0x6c
Dec  6 14:56:50 tdw-9-10-25-239 kernel: INFO: rcu_sched self-detected stall on CPU
Dec  6 14:56:50 tdw-9-10-25-239 kernel: #01188-...: (60012 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=29914 
Dec  6 14:56:50 tdw-9-10-25-239 kernel: #011 (t=60012 jiffies g=68660684 c=68660683 q=39139)
Dec  6 14:56:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88
Dec  6 14:56:50 tdw-9-10-25-239 kernel: CPU: 88 PID: 95334 Comm: drop_cache.sh Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:56:50 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff2d0] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff310] [c000000000ad4ac4] nmi_cpu_backtrace+0x1a4/0x210
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff3a0] [c000000000ad4d0c] nmi_trigger_cpumask_backtrace+0x1dc/0x220
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff440] [c00000000002e5b8] arch_trigger_cpumask_backtrace+0x28/0x40
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff460] [c00000000018a9b4] rcu_dump_cpu_stacks+0xfc/0x158
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff4b0] [c000000000189df8] rcu_check_callbacks+0x898/0xaa0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff5e0] [c000000000195334] update_process_times+0x44/0x90
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff610] [c0000000001abf4c] tick_sched_handle.isra.13+0x4c/0x80
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff630] [c0000000001abfe0] tick_sched_timer+0x60/0xc0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff670] [c000000000195f38] __hrtimer_run_queues+0xf8/0x330
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff6f0] [c000000000196cfc] hrtimer_interrupt+0xec/0x290
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff7b0] [c000000000023668] __timer_interrupt+0x98/0x280
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff800] [c000000000023d68] timer_interrupt+0xa8/0xe0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff830] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:56:50 tdw-9-10-25-239 kernel: --- interrupt: 901 at _raw_spin_lock+0x30/0xc0#012    LR = drop_pagecache_sb+0xac/0x1d0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffb20] [0000000000000000]           (null) (unreliable)
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffb50] [c0000000003e96bc] drop_pagecache_sb+0xac/0x1d0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffbb0] [c000000000363898] iterate_supers+0x1b8/0x1f0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffc20] [c0000000003e9890] drop_caches_sysctl_handler+0xb0/0x170
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffc90] [c00000000040e608] proc_sys_call_handler+0x108/0x130
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffd00] [c00000000035ec98] __vfs_write+0x48/0x1f0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffd90] [c00000000035f070] vfs_write+0xd0/0x240
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffde0] [c00000000035f3b8] SyS_write+0x68/0x110
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffe30] [c00000000000b9e0] system_call+0x58/0x6c
Dec  6 15:11:31 tdw-9-10-25-239 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="3731" x-info="http://www.rsyslog.com"] start
Dec  6 23:11:02 tdw-9-10-25-239 journal: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 127.3G available → current limit 4.0G).
Dec  6 23:11:02 tdw-9-10-25-239 journal: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 127.3G available → current limit 4.0G).
Dec  6 23:11:02 tdw-9-10-25-239 kernel: opal: OPAL detected !
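
The repeated stall reports in the log above are easier to compare once reduced to one record per report. The following is a minimal triage sketch, not part of the original report: the function name `summarize_stalls` and its `hz` parameter are hypothetical, and the default `hz=100` is only inferred from the log (roughly 18003 jiffies elapse between reports that are 180 seconds apart), not confirmed from the kernel config.

```python
import re

# Second line of an rcu_sched stall report, e.g.
#   "... kernel: #011 (t=6001 jiffies g=68660684 c=68660683 q=26406)"
STALL_RE = re.compile(r"\(t=(\d+) jiffies g=(\d+) c=(\d+) q=(\d+)\)")
# Line naming the CPU whose stack is dumped, e.g. "NMI backtrace for cpu 88"
CPU_RE = re.compile(r"NMI backtrace for cpu (\d+)")


def summarize_stalls(lines, hz=100):
    """Return one dict per stall report found in syslog-style lines.

    hz is an assumed tick rate used only to convert jiffies to seconds.
    """
    reports = []
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            t, g, c, q = map(int, m.groups())
            reports.append(
                {"cpu": None, "jiffies": t, "seconds": t / hz, "g": g, "c": c, "q": q}
            )
            continue
        m = CPU_RE.search(line)
        # Attach the CPU number to the most recent report that lacks one.
        if m and reports and reports[-1]["cpu"] is None:
            reports[-1]["cpu"] = int(m.group(1))
    return reports


if __name__ == "__main__":
    sample = [
        "Dec  6 14:47:50 tdw-9-10-25-239 kernel: #011 (t=6001 jiffies g=68660684 c=68660683 q=26406)",
        "Dec  6 14:47:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88",
        "Dec  6 14:50:50 tdw-9-10-25-239 kernel: #011 (t=24004 jiffies g=68660684 c=68660683 q=29221)",
        "Dec  6 14:50:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88",
    ]
    for report in summarize_stalls(sample):
        print(report)
```

Run against the full log, this should show the same CPU and the same `g` value in every report, with `t` growing each time. That matches a single grace period that never completes while `drop_cache.sh` spins in `_raw_spin_lock` under `drop_pagecache_sb`.
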
liyi-ibm commented 5 years ago

The RCU stall also happens on P8; see kern.log.txt.

liyi-ibm commented 5 years ago

This patch might fix it: https://patchwork.kernel.org/patch/10716303/