open-power-host-os / linux

Linux kernel source tree

Memory hotplug/hotunplug continuously hit with Call Trace in the VM #19

Closed MalleshKoti closed 6 years ago

MalleshKoti commented 6 years ago
Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=160740

---ISSUE---
Repeated memory hotplug and hotunplug produces a continuous stream of Call Traces inside the guest.

[ 271.020588] WARNING: CPU: 3 PID: 6 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x38/0x1a0
[ 271.021669] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables virtio_balloon virtio_net virtio_scsi
[ 271.026247] CPU: 3 PID: 6 Comm: kworker/u64:0 Tainted: G B W 4.13.0-4.rel.git49564cb.el7.centos.ppc64le #1
[ 271.027489] Workqueue: pseries hotplug workque pseries_hp_work_fn
[ 271.028204] task: c000000e37444200 task.stack: c000000e3748c000
[ 271.028901] NIP: c0000000000675d8 LR: c00000000007354c CTR: 0000000000000000
[ 271.029720] REGS: c000000e3748f4f0 TRAP: 0700 Tainted: G B W (4.13.0-4.rel.git49564cb.el7.centos.ppc64le)
[ 271.030990] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 271.030995] CR: 48002044 XER: 00000000
[ 271.032204] CFAR: c000000000073548 SOFTE: 1
GPR00: c00000000007354c c000000e3748f770 c000000001397a00 c000000001341c50
GPR04: 0000000000000001 c000001cc00010f8 8e01e0ff1c000080 8e01e0ff1c000080
GPR08: 000000008000001c 06000000000000c0 8e01e0ff1c0000c0 0000000000000003
GPR12: 0000000000002000 c00000000fd81e00 c000000000124348 c000000e3b220e40
GPR16: 0000000000000000 0000000000000010 c000001d3fffec28 0000000000000000
GPR20: c000001d3ffcf800 c0000000015465f0 c00000000153ad58 c000001d3ffff000
GPR24: c000001cc0000100 c000001cffe00000 800000000000018e 0000000000000ff8
GPR28: c00000000153ad68 0000000000200000 c000000001341c50 c000001cffe00000
[ 271.041048] NIP [c0000000000675d8] set_pte_at+0x38/0x1a0
[ 271.041733] LR [c00000000007354c] radix__map_kernel_page+0x27c/0x670
[ 271.042559] Call Trace:
[ 271.042875] [c000000e3748f770] [c000000e3748f7b0] 0xc000000e3748f7b0 (unreliable)
[ 271.043842] [c000000e3748f790] [c00000000007354c] radix__map_kernel_page+0x27c/0x670
[ 271.044845] [c000000e3748f800] [c000000000ae9aa4] create_physical_mapping+0x188/0x20c
[ 271.045861] [c000000e3748f8a0] [c000000000072334] create_section_mapping+0x24/0x60
[ 271.046843] [c000000e3748f8c0] [c000000000067108] arch_add_memory+0x78/0xf0
[ 271.047757] [c000000e3748f950] [c0000000003223cc] add_memory_resource+0x15c/0x2c0
[ 271.048734] [c000000e3748f9e0] [c0000000003225fc] add_memory+0xcc/0x1d0
[ 271.049598] [c000000e3748fa60] [c0000000000be7b8] dlpar_add_lmb+0x248/0x420
[ 271.050501] [c000000e3748fb40] [c0000000000bfcc0] dlpar_memory+0xc80/0xd80
[ 271.051394] [c000000e3748fbf0] [c0000000000b7638] handle_dlpar_errorlog+0xf8/0x160
[ 271.052373] [c000000e3748fc60] [c0000000000b7734] pseries_hp_work_fn+0x94/0xa0
[ 271.053314] [c000000e3748fc90] [c00000000011bc00] process_one_work+0x1a0/0x490
[ 271.054251] [c000000e3748fd30] [c00000000011bf88] worker_thread+0x98/0x520
[ 271.055140] [c000000e3748fdc0] [c0000000001244a8] kthread+0x168/0x1b0
[ 271.055976] [c000000e3748fe30] [c00000000000bc60] ret_from_kernel_thread+0x5c/0x7c
[ 271.056955] Instruction dump:
[ 271.057343] 7c0802a6 f8010010 f821ffe1 e9450000 7944cfe3 41820024 3d200700 792907c6
[ 271.058350] 612900c0 7d494838 2ba900c0 419e000c <0fe00000> 60420000 78c70022 54ca403e
[ 271.059379] ---[ end trace 6da919e9ea1c5e99 ]---

---Steps to reproduce---
1. Boot a guest with 2 NUMA nodes, defined as:
```
```
2. Hotplug memory to NUMA node 0 four times in a row:
```
12582912 0
```
3. Hotplug memory to NUMA node 1 five times in a row:
```
12582912 1
```
4. Try to hotunplug; the hotunplug may fail.
5. Reboot the guest.
6. Hotunplug memory from NUMA node 1 five times in a row; the hotunplug works fine.
7. On the sixth hotunplug from NUMA node 1, the guest hits a continuous stream of Call Traces.

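The device XML used in steps 2 and 3 did not survive in this report; only the values `12582912 0` and `12582912 1` remain. As a rough sketch, assuming those values are the size and target NUMA node of a libvirt `<memory model='dimm'>` device and that the size is in KiB (libvirt's default unit), the hotplug sequence could be prepared as below. The domain name `guest` and the XML file names are hypothetical, and the `virsh attach-device` commands are only listed, not executed.

```python
from xml.etree import ElementTree as ET


def dimm_xml(size_kib: int, node: int) -> str:
    """Build a libvirt DIMM memory device description (assumed format)."""
    memory = ET.Element("memory", {"model": "dimm"})
    target = ET.SubElement(memory, "target")
    # Assumption: the size in the report is in KiB.
    ET.SubElement(target, "size", {"unit": "KiB"}).text = str(size_kib)
    ET.SubElement(target, "node").text = str(node)
    return ET.tostring(memory, encoding="unicode")


def hotplug_plan(domain: str, counts: list) -> list:
    """List the virsh commands for the hotplug steps without running them.

    counts is a list of (numa_node, repetitions) pairs.
    """
    plan = []
    for node, times in counts:
        xml_file = f"dimm-node{node}.xml"  # hypothetical file holding dimm_xml(...)
        for _ in range(times):
            plan.append(["virsh", "attach-device", domain, xml_file, "--live"])
    return plan


if __name__ == "__main__":
    print(dimm_xml(12582912, 0))
    # Steps 2 and 3: four hotplugs to node 0, then five to node 1.
    for cmd in hotplug_plan("guest", [(0, 4), (1, 5)]):
        print(" ".join(cmd))
```

Listing the commands instead of running them keeps the sketch safe to try on a machine without libvirt; the same XML file would be passed to `virsh detach-device` for the hotunplug steps.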
cdeadmin commented 6 years ago

------- Comment (attachment only) From magadagi@in.ibm.com 2017-10-30 03:27:26 EDT-------

cdeadmin commented 6 years ago

------- Comment From viparash@in.ibm.com 2017-11-08 07:14:15 EDT-------

[ 271.019914] ------------[ cut here ]------------
[ 271.020588] WARNING: CPU: 3 PID: 6 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x38/0x1a0
[ 271.021669] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables virtio_balloon virtio_net virtio_scsi
[ 271.026247] CPU: 3 PID: 6 Comm: kworker/u64:0 Tainted: G B W 4.13.0-4.rel.git49564cb.el7.centos.ppc64le #1
[ 271.027489] Workqueue: pseries hotplug workque pseries_hp_work_fn
[ 271.028204] task: c000000e37444200 task.stack: c000000e3748c000
[ 271.028901] NIP: c0000000000675d8 LR: c00000000007354c CTR: 0000000000000000
[ 271.029720] REGS: c000000e3748f4f0 TRAP: 0700 Tainted: G B W (4.13.0-4.rel.git49564cb.el7.centos.ppc64le)
[ 271.030990] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 271.030995] CR: 48002044 XER: 00000000
[ 271.032204] CFAR: c000000000073548 SOFTE: 1
GPR00: c00000000007354c c000000e3748f770 c000000001397a00 c000000001341c50
GPR04: 0000000000000001 c000001cc00010f8 8e01e0ff1c000080 8e01e0ff1c000080
GPR08: 000000008000001c 06000000000000c0 8e01e0ff1c0000c0 0000000000000003
GPR12: 0000000000002000 c00000000fd81e00 c000000000124348 c000000e3b220e40
GPR16: 0000000000000000 0000000000000010 c000001d3fffec28 0000000000000000
GPR20: c000001d3ffcf800 c0000000015465f0 c00000000153ad58 c000001d3ffff000
GPR24: c000001cc0000100 c000001cffe00000 800000000000018e 0000000000000ff8
GPR28: c00000000153ad68 0000000000200000 c000000001341c50 c000001cffe00000
[ 271.041048] NIP [c0000000000675d8] set_pte_at+0x38/0x1a0
[ 271.041733] LR [c00000000007354c] radix__map_kernel_page+0x27c/0x670
[ 271.042559] Call Trace:
[ 271.042875] [c000000e3748f770] [c000000e3748f7b0] 0xc000000e3748f7b0 (unreliable)
[ 271.043842] [c000000e3748f790] [c00000000007354c] radix__map_kernel_page+0x27c/0x670
[ 271.044845] [c000000e3748f800] [c000000000ae9aa4] create_physical_mapping+0x188/0x20c
[ 271.045861] [c000000e3748f8a0] [c000000000072334] create_section_mapping+0x24/0x60
[ 271.046843] [c000000e3748f8c0] [c000000000067108] arch_add_memory+0x78/0xf0
[ 271.047757] [c000000e3748f950] [c0000000003223cc] add_memory_resource+0x15c/0x2c0
[ 271.048734] [c000000e3748f9e0] [c0000000003225fc] add_memory+0xcc/0x1d0
[ 271.049598] [c000000e3748fa60] [c0000000000be7b8] dlpar_add_lmb+0x248/0x420
[ 271.050501] [c000000e3748fb40] [c0000000000bfcc0] dlpar_memory+0xc80/0xd80
[ 271.051394] [c000000e3748fbf0] [c0000000000b7638] handle_dlpar_errorlog+0xf8/0x160
[ 271.052373] [c000000e3748fc60] [c0000000000b7734] pseries_hp_work_fn+0x94/0xa0
[ 271.053314] [c000000e3748fc90] [c00000000011bc00] process_one_work+0x1a0/0x490
[ 271.054251] [c000000e3748fd30] [c00000000011bf88] worker_thread+0x98/0x520
[ 271.055140] [c000000e3748fdc0] [c0000000001244a8] kthread+0x168/0x1b0
[ 271.055976] [c000000e3748fe30] [c00000000000bc60] ret_from_kernel_thread+0x5c/0x7c
[ 271.056955] Instruction dump:
[ 271.057343] 7c0802a6 f8010010 f821ffe1 e9450000 7944cfe3 41820024 3d200700 792907c6
[ 271.058350] 612900c0 7d494838 2ba900c0 419e000c <0fe00000> 60420000 78c70022 54ca403e
[ 271.059379] ---[ end trace 6da919e9ea1c5e99 ]---

The Linux log is filled with the above trace. The trace comes from set_pte_at() at arch/powerpc/mm/pgtable.c:194.
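To check how much of a saved guest log is taken up by this one warning, the `WARNING:` header lines can be counted per source location. This is a minimal sketch that assumes the log has been captured to text (for example from `dmesg`) and that the header lines follow the format seen in the trace above; the function name is illustrative.

```python
import re
from collections import Counter

# Matches e.g. "WARNING: CPU: 3 PID: 6 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x38/0x1a0"
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<file>\S+):(?P<line>\d+) (?P<func>[^\s+]+)"
)


def count_warning_sites(log_text: str) -> Counter:
    """Count kernel WARNING headers, keyed by (file, line, function)."""
    sites = Counter()
    for match in WARNING_RE.finditer(log_text):
        key = (match["file"], int(match["line"]), match["func"])
        sites[key] += 1
    return sites


if __name__ == "__main__":
    sample = (
        "[ 271.020588] WARNING: CPU: 3 PID: 6 at "
        "arch/powerpc/mm/pgtable.c:194 set_pte_at+0x38/0x1a0\n"
    )
    for site, hits in count_warning_sites(sample * 3).most_common():
        print(site, hits)
```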

$ git blame arch/powerpc/mm/pgtable.c | grep VM_WARN_ON
c7d54842 (Aneesh Kumar K.V 2016-04-29 23:25:30 +1000 194) VM_WARN_ON(pte_present(*ptep) && !pte_protnone(*ptep));

This WARN_ON was added by commit c7d54842de in kernel version 4.7

c7d54842de -- powerpc/mm: Use _PAGE_READ to indicate Read access

cdeadmin commented 6 years ago

------- Comment From seg@us.ibm.com 2018-09-14 12:47:45 EDT------- No longer making plans for future hostos-specific bugs.