liyi-ibm / linux

Linux kernel source tree

khugepaged deadlock #17

Open liyi-ibm opened 5 years ago

liyi-ibm commented 5 years ago

See bug 169701.

The RCU stall is reported on only one CPU, and it looks like a deadlock in khugepaged's call to __split_huge_pmd().
[Thu Jul 12 09:20:19 2018]      173-...: (1 GPs behind) idle=04e/140000000000001/0 softirq=31330298/31330300 fqs=704364
[Thu Jul 12 09:20:19 2018]      (detected by 170, t=1446404 jiffies, g=12950989, c=12950988, q=16164202)
[Thu Jul 12 09:20:19 2018] Sending NMI from CPU 170 to CPUs 173:
[Thu Jul 12 09:20:19 2018] NMI backtrace for cpu 173
[Thu Jul 12 09:20:19 2018] CPU: 173 PID: 900 Comm: khugepaged Tainted: G      D         4.14.49-1 #1
[Thu Jul 12 09:20:19 2018] task: c000003fe138a800 task.stack: c000003fe1420000
[Thu Jul 12 09:20:19 2018] NIP:  c000000000aebc98 LR: c000000000331f34 CTR: c0000000002e52c0
[Thu Jul 12 09:20:19 2018] REGS: c000003fe1422f60 TRAP: 0e81   Tainted: G      D          (4.14.49-1)
[Thu Jul 12 09:20:19 2018] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 22444944  XER: 00000000
[Thu Jul 12 09:20:19 2018] CFAR: c000000000aebcb0 SOFTE: 1
GPR00: c000000000331f34 c000003fe14231e0 c0000000013d3000 c0002002af580064
GPR04: ffffffffffe00000 00007fff4d000000 0000000000000001 c00a000800bcf800
GPR08: 0000000000000001 000000008000008c 0000000000000000 00000000000001ff
GPR12: 0000000042444944 c000000007d56f00
[Thu Jul 12 09:20:19 2018] NIP [c000000000aebc98] _raw_spin_lock+0x68/0xc0
[Thu Jul 12 09:20:19 2018] LR [c000000000331f34] __split_huge_pmd+0xb4/0x1120
[Thu Jul 12 09:20:19 2018] Call Trace:
[Thu Jul 12 09:20:19 2018] [c000003fe14231e0] [c000003fe14235d0] 0xc000003fe14235d0 (unreliable)
[Thu Jul 12 09:20:19 2018] [c000003fe1423210] [c000000000331f34] __split_huge_pmd+0xb4/0x1120
[Thu Jul 12 09:20:19 2018] [c000003fe14232e0] [c0000000002e5a64] try_to_unmap_one+0x7a4/0x9c0
[Thu Jul 12 09:20:19 2018] [c000003fe14233f0] [c0000000002e3df4] rmap_walk_anon+0x1b4/0x3f0
[Thu Jul 12 09:20:19 2018] [c000003fe1423460] [c0000000002e6f64] try_to_unmap+0xb4/0x1a0
[Thu Jul 12 09:20:19 2018] [c000003fe14234c0] [c000000000335204] split_huge_page_to_list+0x184/0xca0
[Thu Jul 12 09:20:19 2018] [c000003fe14235c0] [c000000000335f60] deferred_split_scan+0x240/0x390
[Thu Jul 12 09:20:19 2018] [c000003fe1423650] [c0000000002976e0] shrink_slab+0x2d0/0x520
[Thu Jul 12 09:20:19 2018] [c000003fe14237a0] [c00000000029d564] shrink_node+0x2c4/0x410
[Thu Jul 12 09:20:19 2018] [c000003fe1423860] [c00000000029db78] do_try_to_free_pages+0x128/0x4b0
[Thu Jul 12 09:20:19 2018] [c000003fe1423900] [c00000000029e02c] try_to_free_pages+0x12c/0x2b0
[Thu Jul 12 09:20:19 2018] [c000003fe1423990] [c0000000002845e4] __alloc_pages_nodemask+0x714/0x1080
[Thu Jul 12 09:20:19 2018] [c000003fe1423b80] [c0000000003382ac] khugepaged_alloc_page+0x8c/0x140
[Thu Jul 12 09:20:19 2018] [c000003fe1423bb0] [c00000000033a7ec] khugepaged+0x9dc/0x2b60
[Thu Jul 12 09:20:19 2018] [c000003fe1423dc0] [c000000000128aa8] kthread+0x168/0x1b0
[Thu Jul 12 09:20:19 2018] [c000003fe1423e30] [c00000000000bdd0] ret_from_kernel_thread+0x5c/0x8c
[Thu Jul 12 09:20:19 2018] Instruction dump:
[Thu Jul 12 09:20:19 2018] 40c20010 7d40192d 40c2fff0 7c2004ac 2fa90000 40de0018 38210030 e8010010
[Thu Jul 12 09:20:19 2018] ebe1fff8 7c0803a6 4e800020 7c210b78 <e92d0000> 89290009 792affe3 4082003c
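
To read the call path at a glance, the trace above can be reduced to its symbol names. The snippet below is only a sketch: `FRAME_RE` and `call_chain` are hypothetical helpers written for this report, and the frame layout they expect (two bracketed 16-digit hex addresses followed by `symbol+offset/size`) is an assumption taken from this particular ppc64 log, not a general dmesg format. The sample is a shortened excerpt of the trace above.

```python
import re

# Assumed frame layout, taken from the log in this report:
#   [c000003fe1423210] [c000000000331f34] __split_huge_pmd+0xb4/0x1120
# Frames without a symbol (e.g. the "(unreliable)" one) are skipped.
FRAME_RE = re.compile(
    r"\[[0-9a-f]{16}\]\s+\[[0-9a-f]{16}\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_chain(dmesg_text):
    """Return symbol names from call-trace frames, innermost frame first."""
    return [m.group("symbol") for m in FRAME_RE.finditer(dmesg_text)]


# Shortened excerpt of the trace in this report.
sample = """\
[Thu Jul 12 09:20:19 2018] [c000003fe14231e0] [c000003fe14235d0] 0xc000003fe14235d0 (unreliable)
[Thu Jul 12 09:20:19 2018] [c000003fe1423210] [c000000000331f34] __split_huge_pmd+0xb4/0x1120
[Thu Jul 12 09:20:19 2018] [c000003fe14232e0] [c0000000002e5a64] try_to_unmap_one+0x7a4/0x9c0
[Thu Jul 12 09:20:19 2018] [c000003fe14235c0] [c000000000335f60] deferred_split_scan+0x240/0x390
[Thu Jul 12 09:20:19 2018] [c000003fe1423b80] [c0000000003382ac] khugepaged_alloc_page+0x8c/0x140
"""

chain = call_chain(sample)
print(" <- ".join(chain))
```

Run on the full trace, this makes the path easier to follow: khugepaged allocates a huge page, the allocation enters direct reclaim, the deferred-split shrinker runs, and the CPU ends up spinning in _raw_spin_lock under __split_huge_pmd().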

The system is still responsive, but we cannot access one disk, /data3 (maybe because khugepaged holds some file system mutex). khugepaged occupies 100% CPU.

liyi-ibm commented 5 years ago

Possible fix:

commit 675d995297d42f69484100516cd30a71d25f4c7c
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Mon Apr 16 16:57:24 2018 +0530

powerpc/book3s64: Enable split pmd ptlock.