open-power-host-os / linux

Linux kernel source tree

Memory hotplug/hotunplug: Memory hotunplug fails with kernel alert messages while stress is running inside the guest. #23

Closed MalleshKoti closed 6 years ago

MalleshKoti commented 6 years ago
Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=160904

---Issue---

Seeing the kernel messages below while trying to hotunplug memory with stress running inside the guest:

    [root@localhost ~]# dmesg -T --level=alert,crit,err,warn
    [Thu Nov 2 03:28:34 2017] crashkernel: memory value expected
    [Thu Nov 2 03:28:34 2017] Failed to allocate transformation for 'xts(aes)': -2
    [Thu Nov 2 03:28:34 2017] alg: skcipher: Failed to load transform for p8_aes_xts: -2
    [Thu Nov 2 03:28:34 2017] Warning: unable to open an initial console.
    [Thu Nov 2 03:28:34 2017] This architecture does not have kernel memory protection.
    [Thu Nov 2 03:28:34 2017] synth uevent: /devices/vio: failed to send uevent
    [Thu Nov 2 03:28:34 2017] vio vio: uevent: failed to send synthetic uevent
    [Thu Nov 2 03:28:36 2017] systemd: 25 output lines suppressed due to ratelimiting
    [Thu Nov 2 03:28:37 2017] synth uevent: /devices/vio: failed to send uevent
    [Thu Nov 2 03:28:37 2017] vio vio: uevent: failed to send synthetic uevent
    [Thu Nov 2 03:31:38 2017] failed to isolate pfn 2400c
    [Thu Nov 2 03:31:38 2017] raw: 01bffff000000000 0000000000000000 0000000000000000 00000011ffffffff
    [Thu Nov 2 03:31:38 2017] raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000 0000000000000000
    [Thu Nov 2 03:31:38 2017] page dumped because: isolation failed
    [Thu Nov 2 03:31:38 2017] failed to isolate pfn 2400c
    [Thu Nov 2 03:31:38 2017] raw: 01bffff000000000 0000000000000000 0000000000000000 00000011ffffffff
    [Thu Nov 2 03:31:38 2017] raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000 0000000000000000
    [Thu Nov 2 03:31:38 2017] page dumped because: isolation failed
    [Thu Nov 2 03:31:38 2017] failed to isolate pfn 2400c
    [Thu Nov 2 03:31:38 2017] raw: 01bffff000000000 0000000000000000 0000000000000000 00000011ffffffff
    [Thu Nov 2 03:31:38 2017] raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000 0000000000000000
    [Thu Nov 2 03:31:38 2017] page dumped because: isolation failed
    [Thu Nov 2 03:31:38 2017] failed to isolate pfn 2400c
    [Thu Nov 2 03:31:38 2017] raw: 01bffff000000000 0000000000000000 0000000000000000 00000011ffffffff
    [Thu Nov 2 03:31:38 2017] raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000 0000000000000000
    [Thu Nov 2 03:31:38 2017] page dumped because: isolation failed
    [Thu Nov 2 03:31:38 2017] failed to isolate pfn 2400c
    [Thu Nov 2 03:31:38 2017] raw: 01bffff000000000 0000000000000000 0000000000000000 00000011ffffffff
    [Thu Nov 2 03:31:38 2017] raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000 0000000000000000
    [Thu Nov 2 03:31:38 2017] page dumped because: isolation failed
    [Thu Nov 2 03:31:38 2017] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs

---Steps to recreate---

1. Boot into the guest.
2. Start stress inside the guest as: "stress --cpu 10 --io 10 --vm 10 --vm-bytes 256M --vm-stride 4096 --vm-hang 10 --timeout 500s"
3. Hotplug memory to the guest.
4. Try to hotunplug memory from the guest.

I see the following logs in the VM's ssh session:

```
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:page:c00a000000900300 count:17 mapcount:0 mapping: (null) index:0x0
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:flags: 0x1bffff000000000()
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:page:c00a000000900300 count:17 mapcount:0 mapping: (null) index:0x0
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:flags: 0x1bffff000000000()
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:page:c00a000000900300 count:17 mapcount:0 mapping: (null) index:0x0
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:flags: 0x1bffff000000000()
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:page:c00a000000900300 count:17 mapcount:0 mapping: (null) index:0x0
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:flags: 0x1bffff000000000()
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:page:c00a000000900300 count:17 mapcount:0 mapping: (null) index:0x0
Message from syslogd@localhost at Nov 2 03:31:39 ... kernel:flags: 0x1bffff000000000()
```
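
For triage, the repeated isolation failures in the dmesg output can be tallied per PFN. The following is a minimal sketch in Python, assuming the log has been saved as text from `dmesg -T`; the function name `count_isolation_failures` and the shortened `sample` string are illustrative and not part of the original report.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   [Thu Nov 2 03:31:38 2017] failed to isolate pfn 2400c
# The PFN is printed in hexadecimal without a 0x prefix.
ISOLATE_RE = re.compile(r"failed to isolate pfn ([0-9a-fA-F]+)")


def count_isolation_failures(log_text):
    """Return a Counter mapping each PFN (as an int) to its failure count.

    Hypothetical helper for illustration; it only looks for the
    'failed to isolate pfn' message and ignores every other line.
    """
    return Counter(int(m.group(1), 16) for m in ISOLATE_RE.finditer(log_text))


# Shortened sample taken from the log in this report (two of the repeats).
sample = """\
[Thu Nov 2 03:31:38 2017] failed to isolate pfn 2400c
[Thu Nov 2 03:31:38 2017] page dumped because: isolation failed
[Thu Nov 2 03:31:38 2017] failed to isolate pfn 2400c
[Thu Nov 2 03:31:38 2017] page dumped because: isolation failed
[Thu Nov 2 03:31:38 2017] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs
"""

failures = count_isolation_failures(sample)
for pfn, n in sorted(failures.items()):
    print(f"pfn {pfn:#x}: {n} isolation failure(s)")
```

Run against the full dmesg output in this report, such a tally would show that every isolation failure refers to the same PFN, 2400c.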
cdeadmin commented 6 years ago

------- Comment on attachment From pmac@au1.ibm.com 2017-11-07 00:52:02 EDT-------

Hot-unplugging memory from a guest will trigger HPT resizing for the guest if the guest has a sufficiently recent kernel (e.g. RHEL or CentOS 7.4). There is a bug in the host handling of HPT resizing which can lead to host crashes and possibly memory corruption. This patch should fix that bug, and so might help with this BZ.