liyi-ibm / linux

Linux kernel source tree

CPU hard lockup caused by OOM output on the console #14

Open liyi-ibm opened 5 years ago

liyi-ibm commented 5 years ago
The overnight run hit one hard lockup, but it still appears to have been caused by a memory issue (OOM).
[Wed Dec 12 01:08:56 2018] Watchdog CPU:137 Hard LOCKUP
[Wed Dec 12 01:08:56 2018] Call Trace:
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f300] [c00000000009b1a0] opal_put_chars+0x190/0x2d0 (unreliable)
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f370] [c00000000066c340] hvc_console_print+0x160/0x1e0
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f400] [c000000000173c8c] console_unlock+0x5ac/0x7a0
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f4f0] [c00000000017419c] vprintk_emit+0x31c/0x400
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f560] [c000000000175ed4] vprintk_func+0x64/0xd0
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f580] [c000000000175628] printk+0x40/0x54
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f5a0] [c000000000287264] dump_header+0x220/0x28c
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f680] [c000000000285f1c] oom_kill_process+0x50c/0x610
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f740] [c000000000286a84] out_of_memory+0x154/0x670
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f7e0] [c00000000028e850] __alloc_pages_nodemask+0xe80/0x1080
[Wed Dec 12 01:08:56 2018] [c0002019a0f5f9d0] [c000000000317140] alloc_pages_current+0xa0/0x130
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fa10] [c00000000027c5a8] __page_cache_alloc+0xf8/0x170
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fa40] [c00000000027db10] pagecache_get_page+0xc0/0x3c0
[Wed Dec 12 01:08:56 2018] [c0002019a0f5faa0] [c00000000027f538] grab_cache_page_write_begin+0x38/0x60
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fad0] [c000000000458c54] ext4_da_write_begin+0x114/0x550
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fb90] [c00000000027f7a8] generic_perform_write+0xf8/0x260
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fc20] [c000000000282be0] __generic_file_write_iter+0x200/0x240
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fc80] [c00000000044129c] ext4_file_write_iter+0x17c/0x490
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fd00] [c000000000369dc0] __vfs_write+0x130/0x1f0
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fd90] [c00000000036a0b0] vfs_write+0xd0/0x240
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fde0] [c00000000036a3f8] SyS_write+0x68/0x110
[Wed Dec 12 01:08:56 2018] [c0002019a0f5fe30] [c00000000000bfec] system_call+0x58/0x6c
liyi-ibm commented 5 years ago

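N: That hard lockup is a latency issue in the powerpc console driver code. The OOM report prints a large number of characters to the console (in the trace above, dump_header calls printk, which reaches opal_put_chars through console_unlock and hvc_console_print), and that slow console output causes interrupt latencies high enough to trip the hard lockup watchdog.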

We've fixed that upstream. There are a number of patches we could provide if Tencent wants to fix this, but during normal operation it should not be much of a problem.
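
To check whether another lockup report matches the same pattern (console output triggered from the OOM path), the call trace can be scanned mechanically. The following is a minimal sketch, not a tool from this repository: it assumes the `dmesg -T` style frame format shown in the trace above, and the two function-name sets are illustrative choices taken from this one trace rather than an exhaustive list.

```python
import re

# Matches one stack frame such as:
# [Wed Dec 12 01:08:56 2018] [c0002019a0f5f300] [c00000000009b1a0] opal_put_chars+0x190/0x2d0 (unreliable)
# The timestamp bracket is skipped because it is not 16 hex digits.
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9a-f]{16})\]\s+\[(?P<ip>[0-9a-f]{16})\]\s+"
    r"(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Assumption: names picked from the trace in this issue, not a complete list.
OOM_FUNCS = {"out_of_memory", "oom_kill_process", "dump_header"}
CONSOLE_FUNCS = {"console_unlock", "hvc_console_print", "opal_put_chars"}


def parse_frames(log_text):
    """Return function names in trace order (innermost frame first)."""
    return [m.group("func") for m in FRAME_RE.finditer(log_text)]


def is_oom_console_lockup(log_text):
    """True if a console function was reached from a frame in the OOM path."""
    frames = parse_frames(log_text)
    console_idx = [i for i, f in enumerate(frames) if f in CONSOLE_FUNCS]
    oom_idx = [i for i, f in enumerate(frames) if f in OOM_FUNCS]
    # Frames are innermost first, so a console frame called from the OOM
    # path appears before (above) at least one OOM frame.
    return bool(console_idx and oom_idx) and min(console_idx) < max(oom_idx)
```

Passing the trace text from the first comment to `is_oom_console_lockup` should classify it as an OOM-driven console lockup, since `opal_put_chars` sits above `dump_header` and `out_of_memory` in the stack.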