open-power-host-os / linux

Linux kernel source tree

Host CPU hard lockup and unstuck observed sometimes during guest boot #32

Closed sathnaga closed 6 years ago

sathnaga commented 6 years ago
Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=169389

Env: HostOS CI P9

Host:
HW: P9-Boston
Kernel: 4.17.0-1.dev.git5ce3eac.el7.ppc64le
Qemu: qemu-system-ppc-2.12.0-2.dev.gitd36f3ee.el7.ppc64le
Libvirt: libvirt-4.3.0-1.dev.git3096ff1.el7.ppc64le

Guest: HostOS (4.17.0-1.dev.git5ce3eac.el7.ppc64le)

Test: Guest boot through libvirt. Only the first guest boot test hit this issue; later boot tests and other tests did not.

Test log link: https://ltc-jenkins.aus.stglabs.ibm.com/job/HostOS_CI_P9/10/artifact/avocado-fvt-wrapper/results/job-2018-07-01T15.50-ca32c56/test-results/001-guest_sanity.cpu.import.qemu.qcow2.virtio_scsi.smp2.virtio_net.Guest.HostOS.ppc64le.powerkvm-qemu.unattended_install.import.import.default_install.aio_native

Testcase failure output:
```
15:51:30 DEBUG| make_create_command() setting up command for nic: {'netdst': 'virbr0', 'ip': None, 'nic_name': 'nic1', 'mac': '52:54:00:6d:6e:6f', 'nettype': 'bridge', 'nic_model': 'virtio', 'g_nic_name': None}
15:51:30 DEBUG| vm.make_create_command.add_nic returning: --network=bridge=virbr0,model=virtio,mac=52:54:00:6d:6e:6f
15:51:30 INFO | Running libvirt command (reformatted):
15:51:30 INFO | /usr/bin/virt-install
15:51:30 INFO | --connect=qemu:///system
15:51:30 INFO | --hvm
15:51:30 INFO | --accelerate
15:51:30 INFO | --name 'virt-tests-vm1'
15:51:30 INFO | --machine pseries
15:51:30 INFO | --memory=32768
15:51:30 INFO | --vcpu=32,sockets=1,cores=32,threads=1
15:51:30 INFO | --import
15:51:30 INFO | --nographics
15:51:30 INFO | --serial pty
15:51:30 INFO | --memballoon model=virtio
15:51:30 INFO | --controller type=scsi,model=virtio-scsi
15:51:30 INFO | --disk path=/home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2,bus=scsi,size=10,format=qcow2
15:51:30 INFO | --network=bridge=virbr0,model=virtio,mac=52:54:00:6d:6e:6f
15:51:30 INFO | --noautoconsole
15:51:30 INFO | Running '/usr/bin/virt-install --connect=qemu:///system --hvm --accelerate --name 'virt-tests-vm1' --machine pseries --memory=32768 --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics --serial pty --memballoon model=virtio --controller type=scsi,model=virtio-scsi --disk path=/home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2,bus=scsi,size=10,format=qcow2 --network=bridge=virbr0,model=virtio,mac=52:54:00:6d:6e:6f --noautoconsole'
15:51:31 DEBUG| [stderr] WARNING No operating system detected, VM performance may suffer. Specify an OS with --os-variant for optimal results.
15:51:33 DEBUG| [stdout]
15:51:33 DEBUG| [stdout] Starting install...
15:51:33 DEBUG| [stdout] Domain creation completed.
15:51:34 INFO | Command '/usr/bin/virt-install --connect=qemu:///system --hvm --accelerate --name 'virt-tests-vm1' --machine pseries --memory=32768 --vcpu=32,sockets=1,cores=32,threads=1 --import --nographics --serial pty --memballoon model=virtio --controller type=scsi,model=virtio-scsi --disk path=/home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2,bus=scsi,size=10,format=qcow2 --network=bridge=virbr0,model=virtio,mac=52:54:00:6d:6e:6f --noautoconsole' finished with 0 after 3.39120984077s
15:51:34 DEBUG| waiting for domain virt-tests-vm1 to start (0.000012 secs)
15:51:34 INFO | Waiting for installation to finish. Timeout set to 180 s (3 min)
15:51:34 DEBUG| Monitoring serial console log for completion message: /home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/results/job-2018-07-01T15.50-ca32c56/test-results/001-guest_sanity.cpu.import.qemu.qcow2.virtio_scsi.smp2.virtio_net.Guest.HostOS.ppc64le.powerkvm-qemu.unattended_install.import.import.default_install.aio_native/serial-serial0-virt-tests-vm1-4en2.log
15:51:34 DEBUG| Attempting to log into 'virt-tests-vm1' via serial console (timeout 10s)
15:51:55 WARNI| Error occur when update VM address cache: Login timeout expired (output: 'exceeded 10 s timeout')
15:51:56 DEBUG| Attempting to log into 'virt-tests-vm1' via serial console (timeout 10s)
15:52:15 DEBUG| Updated HWADDR (52:54:00:6d:6e:6f)<->(192.168.122.186) IP pair into address cache
15:52:17 WARNI| Error occur when update VM address cache: Login timeout expired (output: 'exceeded 10 s timeout')
15:52:19 DEBUG| cleaning up threads and mounts that may be active
15:52:19 INFO | Guest reported successful installation after 44 s (0 min)
15:52:19 INFO | Wait for guest to shutdown cleanly
15:52:20 DEBUG| Waiting for guest to shutdown 59
15:52:21 DEBUG| Waiting for guest to shutdown 58
15:52:22 DEBUG| Waiting for guest to shutdown 57
15:52:23 DEBUG| Waiting for guest to shutdown 56
15:52:24 DEBUG| Waiting for guest to shutdown 55
15:52:25 DEBUG| Waiting for guest to shutdown 54
15:52:26 DEBUG| Waiting for guest to shutdown 53
15:52:27 DEBUG| Waiting for guest to shutdown 52
15:52:28 DEBUG| Waiting for guest to shutdown 51
15:52:29 DEBUG| Waiting for guest to shutdown 50
15:52:29 DEBUG| Shutdown took 10 seconds
15:52:29 DEBUG| VM virt-tests-vm1 shut down
15:52:30 INFO | Guest managed to shutdown cleanly
15:52:30 WARNI| Requested MAC address release from persistent vm virt-tests-vm1. Ignoring.
15:52:30 DEBUG| Checking image file /home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2
15:52:30 DEBUG| Run qemu-img info comamnd on /home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2
15:52:30 INFO | Running '/usr/bin/qemu-img info -U /home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2'
15:52:31 DEBUG| [stdout] image: /home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2
15:52:31 DEBUG| [stdout] file format: qcow2
15:52:31 INFO | Command '/usr/bin/qemu-img info -U /home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/data/avocado-vt/images/hostos-ppc64le.qcow2' finished with 0 after 0.0559370517731s
15:52:31 DEBUG| [stdout] virtual size: 30G (32212254720 bytes)
15:52:31 DEBUG| [stdout] disk size: 30G
15:52:31 DEBUG| [stdout] cluster_size: 65536
15:52:31 DEBUG| [stdout] Format specific information:
15:52:31 DEBUG| [stdout] compat: 1.1
15:52:31 DEBUG| [stdout] lazy refcounts: true
15:52:31 DEBUG| [stdout] refcount bits: 16
15:52:31 DEBUG| [stdout] corrupt: false
15:52:31 INFO | Running 'true'
15:52:31 INFO | Command 'true' finished with 0 after 0.00186491012573s
15:52:31 INFO | Running 'ps -o comm 1'
15:52:31 DEBUG| [stdout] COMMAND
15:52:31 INFO | Command 'ps -o comm 1' finished with 0 after 0.0576908588409s
15:52:31 DEBUG| [stdout] systemd
15:52:31 INFO | Running 'true'
15:52:31 INFO | Command 'true' finished with 0 after 0.00177407264709s
15:52:31 INFO | Running 'ps -o comm 1'
15:52:31 DEBUG| [stdout] COMMAND
15:52:31 INFO | Command 'ps -o comm 1' finished with 0 after 0.0565540790558s
15:52:31 DEBUG| [stdout] systemd
15:52:31 DEBUG| Setting ignore_status to True.
15:52:31 INFO | Running 'systemctl reset-failed libvirtd.service'
15:52:31 INFO | Command 'systemctl reset-failed libvirtd.service' finished with 0 after 0.00678014755249s
15:52:31 DEBUG| Setting ignore_status to True.
15:52:31 INFO | Running 'systemctl restart libvirtd.service'
15:52:32 INFO | Command 'systemctl restart libvirtd.service' finished with 0 after 0.0816829204559s
15:52:32 INFO | Running 'virsh list'
15:52:33 DEBUG| [stdout] Id Name State
15:52:33 INFO | Command 'virsh list' finished with 0 after 0.953235149384s
15:52:33 DEBUG| [stdout] ----------------------------------------------------
15:52:33 DEBUG| [stdout]
15:52:33 INFO | Running 'dmesg -C'
15:52:33 INFO | Command 'dmesg -C' finished with 0 after 0.00171399116516s
15:52:33 ERROR|
15:52:33 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado_plugins_vt-62.0-py2.7.egg/avocado_vt/test.py:454
15:52:33 ERROR| Traceback (most recent call last):
15:52:33 ERROR| File "/usr/lib/python2.7/site-packages/avocado_plugins_vt-62.0-py2.7.egg/virttest/error_context.py", line 135, in new_fn
15:52:33 ERROR| return fn(*args, **kwargs)
15:52:33 ERROR| File "/usr/lib/python2.7/site-packages/avocado_plugins_vt-62.0-py2.7.egg/virttest/env_process.py", line 1406, in postprocess
15:52:33 ERROR| raise RuntimeError("Failures occurred while postprocess:\n%s" % err)
15:52:33 ERROR| RuntimeError: Failures occurred while postprocess:
15:52:33 ERROR|
15:52:33 ERROR| Host dmesg verification failed: Found failures in host dmesg log Please check host dmesg log /home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/results/job-2018-07-01T15.50-ca32c56/test-results/001-guest_sanity.cpu.import.qemu.qcow2.virtio_scsi.smp2.virtio_net.Guest.HostOS.ppc64le.powerkvm-qemu.unattended_install.import.import.default_install.aio_native/host_dmesg.log.
15:52:33 ERROR|
15:52:33 INFO | cleaning libvirtd logs...
15:52:33 ERROR|
15:52:33 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado_framework-62.0-py2.7.egg/avocado/core/test.py:832
15:52:33 ERROR| Traceback (most recent call last):
15:52:33 ERROR| File "/usr/lib/python2.7/site-packages/avocado_plugins_vt-62.0-py2.7.egg/avocado_vt/test.py", line 297, in runTest
15:52:33 ERROR| raise self.__status # pylint: disable=E0702
15:52:33 ERROR| RuntimeError: Failures occurred while postprocess:
15:52:33 ERROR|
15:52:33 ERROR| Host dmesg verification failed: Found failures in host dmesg log Please check host dmesg log /home/workspace/runAvocadoFVTTest/avocado-fvt-wrapper/results/job-2018-07-01T15.50-ca32c56/test-results/001-guest_sanity.cpu.import.qemu.qcow2.virtio_scsi.smp2.virtio_net.Guest.HostOS.ppc64le.powerkvm-qemu.unattended_install.import.import.default_install.aio_native/host_dmesg.log.
15:52:33 ERROR|
```
```
Host dmesg log:
[Sat Jun 30 02:08:16 2018] watchdog: CPU 144 detected hard LOCKUP on other CPUs 102,135-136
[Sat Jun 30 02:08:16 2018] watchdog: CPU 102 Hard LOCKUP
[Sat Jun 30 02:08:16 2018] watchdog: CPU 135 Hard LOCKUP
[Sat Jun 30 02:08:16 2018] watchdog: CPU 136 Hard LOCKUP
[Sat Jun 30 02:08:16 2018] watchdog: CPU 102 became unstuck
[Sat Jun 30 02:08:16 2018] watchdog: CPU 135 became unstuck
[Sat Jun 30 03:19:52 2018] watchdog: CPU 135 detected hard LOCKUP on other CPUs 69
[Sat Jun 30 03:19:52 2018] watchdog: CPU 69 Hard LOCKUP
[Sat Jun 30 03:19:52 2018] watchdog: CPU 69 became unstuck
```
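For triage, the watchdog messages in the host dmesg log can be tallied per CPU to see which lockup reports have a matching "became unstuck" line. The following is a minimal sketch, not part of the test framework: the function names `summarize_lockups` and `still_stuck` are made up for illustration, and the regular expressions only assume the message wording quoted in this report.

```python
import re
from collections import defaultdict

# Patterns follow the watchdog message wording quoted in this report.
LOCKUP = re.compile(r"watchdog: CPU (\d+) Hard LOCKUP")
UNSTUCK = re.compile(r"watchdog: CPU (\d+) became unstuck")

# Excerpt of the host dmesg log from this issue.
SAMPLE = """\
[Sat Jun 30 02:08:16 2018] watchdog: CPU 144 detected hard LOCKUP on other CPUs 102,135-136
[Sat Jun 30 02:08:16 2018] watchdog: CPU 102 Hard LOCKUP
[Sat Jun 30 02:08:16 2018] watchdog: CPU 135 Hard LOCKUP
[Sat Jun 30 02:08:16 2018] watchdog: CPU 136 Hard LOCKUP
[Sat Jun 30 02:08:16 2018] watchdog: CPU 102 became unstuck
[Sat Jun 30 02:08:16 2018] watchdog: CPU 135 became unstuck
[Sat Jun 30 03:19:52 2018] watchdog: CPU 69 Hard LOCKUP
[Sat Jun 30 03:19:52 2018] watchdog: CPU 69 became unstuck
"""


def summarize_lockups(dmesg_text):
    """Return {cpu: {"lockups": n, "unstuck": n}} for every CPU reported."""
    summary = defaultdict(lambda: {"lockups": 0, "unstuck": 0})
    for line in dmesg_text.splitlines():
        match = LOCKUP.search(line)
        if match:
            summary[int(match.group(1))]["lockups"] += 1
            continue
        match = UNSTUCK.search(line)
        if match:
            summary[int(match.group(1))]["unstuck"] += 1
    return dict(summary)


def still_stuck(summary):
    """CPUs with more lockup reports than 'became unstuck' reports."""
    return sorted(cpu for cpu, c in summary.items() if c["lockups"] > c["unstuck"])


if __name__ == "__main__":
    print(still_stuck(summarize_lockups(SAMPLE)))
```

A CPU listed by `still_stuck` only means that no "became unstuck" line for it appears in the text that was parsed; it does not prove the CPU never recovered.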
cdeadmin commented 6 years ago

------- Comment From viparash@in.ibm.com 2018-07-04 05:07:23 EDT-------
Hi Satheesh,

Please provide the kernel logs captured after the hard lockups were observed.

sathnaga commented 6 years ago
Jun 30 02:08:04 ltc-boston114 kernel: watchdog: CPU 136 Hard LOCKUP
Jun 30 02:08:04 ltc-boston114 kernel: Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iscsi_target_mod target_core_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd auth_rpcgss nfs_acl lockd grace vhost_net vhost tap binfmt_misc xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables i2c_dev sunrpc ses enclosure at24 regmap_i2c ipmi_powernv ipmi_devintf ofpart powernv_flash ipmi_msghandler
Jun 30 02:08:04 ltc-boston114 kernel: opal_prd i2c_opal mtd kvm_hv kvm joydev ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm nvme nvme_core mpt3sas i40e drm_panel_orientation_quirks i2c_core aacraid raid_class scsi_transport_sas
Jun 30 02:08:04 ltc-boston114 kernel: CPU: 136 PID: 0 Comm: swapper/136 Not tainted 4.17.0-1.dev.git5ce3eac.el7.ppc64le #1
Jun 30 02:08:04 ltc-boston114 kernel: NIP:  c00000000009f2d4 LR: c00000000009f2d4 CTR: c000000000008000
Jun 30 02:08:04 ltc-boston114 kernel: REGS: c00020397c463c00 TRAP: 0100   Not tainted  (4.17.0-1.dev.git5ce3eac.el7.ppc64le)
Jun 30 02:08:04 ltc-boston114 kernel: MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 22004222  XER: 20040000
Jun 30 02:08:04 ltc-boston114 kernel: CFAR: c00020397c463d50 SOFTE: 410809990000000 #012GPR00: c00000000009f2d4 c00020397c463d60 c000000001475300 c00020397c463c00 #012GPR04: b000000000001033 c00000000009ecfc 0000000022004224 0000000000000002 #012GPR08: 0000000000000000 00000000000000ff 0000000000000010 0000000000000001 #012GPR12: 9000000000121033 c000203fff687800 c00020397c463f90 0000000000000000 #012GPR16: 0000000000000000 c0000000000478e0 c0000000000478e0 c000000000f95380 #012GPR20: 0000000000000006 c00000000137ba08 c00020397c460000 c00020397c460080 #012GPR24: 0000000000000008 0000000000000000 000175a30e23d444 c00000000137ba08 #012GPR28: c00000000137bc60 c0000000014b2348 0000000000000006 0000000000000006
Jun 30 02:08:04 ltc-boston114 kernel: NIP [c00000000009f2d4] power9_idle_type+0x24/0x40
Jun 30 02:08:04 ltc-boston114 kernel: LR [c00000000009f2d4] power9_idle_type+0x24/0x40
Jun 30 02:08:04 ltc-boston114 kernel: Call Trace:
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463d60] [c00000000009f2d4] power9_idle_type+0x24/0x40 (unreliable)
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463d80] [c000000000903ee0] stop_loop+0x40/0x5c
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463db0] [c0000000009006c0] cpuidle_enter_state+0xc0/0x3c0
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463e10] [c00000000014a46c] call_cpuidle+0x4c/0x80
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463e30] [c00000000014aa38] do_idle+0x308/0x3c0
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463ec0] [c00000000014acd8] cpu_startup_entry+0x38/0x40
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463ef0] [c000000000049c40] start_secondary+0x4e0/0x530
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463f90] [c00000000000b270] start_secondary_prolog+0x10/0x14
Jun 30 02:08:04 ltc-boston114 kernel: Instruction dump:
Jun 30 02:08:04 ltc-boston114 kernel: 7c0803a6 4e800020 60420000 3c4c013d 38426050 7c0802a6 60000000 7c0802a6
Jun 30 02:08:04 ltc-boston114 kernel: f8010010 f821ffe1 4bfff9bd 4bf776c9 <60000000> 38210020 e8010010 7c0803a6
Jun 30 02:08:04 ltc-boston114 kernel: watchdog: CPU 102 became unstuck
Jun 30 02:08:04 ltc-boston114 kernel: watchdog: CPU 135 became unstuck
Jun 30 03:01:01 ltc-boston114 systemd: Started Session 152 of user root.
Jun 30 03:01:01 ltc-boston114 systemd: Starting Session 152 of user root.
Jun 30 03:19:40 ltc-boston114 kernel: watchdog: CPU 135 detected hard LOCKUP on other CPUs 69
Jun 30 03:19:40 ltc-boston114 kernel: watchdog: CPU 69 Hard LOCKUP
Jun 30 03:19:40 ltc-boston114 kernel: Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iscsi_target_mod target_core_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd auth_rpcgss nfs_acl lockd grace vhost_net vhost tap binfmt_misc xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables i2c_dev sunrpc ses enclosure at24 regmap_i2c ipmi_powernv ipmi_devintf ofpart powernv_flash ipmi_msghandler
Jun 30 03:19:40 ltc-boston114 kernel: opal_prd i2c_opal mtd kvm_hv kvm joydev ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm nvme nvme_core mpt3sas i40e drm_panel_orientation_quirks i2c_core aacraid raid_class scsi_transport_sas
Jun 30 03:19:40 ltc-boston114 kernel: CPU: 69 PID: 0 Comm: swapper/69 Not tainted 4.17.0-1.dev.git5ce3eac.el7.ppc64le #1
Jun 30 03:19:40 ltc-boston114 kernel: NIP:  c00000000009f2d4 LR: c00000000009f2d4 CTR: c000000000008000
Jun 30 03:19:40 ltc-boston114 kernel: REGS: c000003fe5843c00 TRAP: 0100   Not tainted  (4.17.0-1.dev.git5ce3eac.el7.ppc64le)
Jun 30 03:19:40 ltc-boston114 kernel: MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 22004222  XER: 20040000
Jun 30 03:19:40 ltc-boston114 kernel: CFAR: c000003fe5843d50 SOFTE: 415105030000000 #012GPR00: c00000000009f2d4 c000003fe5843d60 c000000001475300 c000003fe5843c00 #012GPR04: b000000000001033 c00000000009ecfc 0000000022004224 0000000000000001 #012GPR08: 0000000000000000 00000000000000ff 0000000000000010 0000000000000001 #012GPR12: 9000000000121033 c000003ffffb1600 c000003fe5843f90 0000000000000000 #012GPR16: 0000000000000000 c0000000000478e0 c0000000000478e0 c000000000f95380 #012GPR20: 0000000000000006 c00000000137ba08 c000003fe5840000 c000003fe5840080 #012GPR24: 0000000000000008 0000000000000000 0001798b1595843c c00000000137ba08 #012GPR28: c00000000137bc60 c0000000014b2348 0000000000000006 0000000000000006
Jun 30 03:19:40 ltc-boston114 kernel: NIP [c00000000009f2d4] power9_idle_type+0x24/0x40
Jun 30 03:19:40 ltc-boston114 kernel: LR [c00000000009f2d4] power9_idle_type+0x24/0x40
Jun 30 03:19:40 ltc-boston114 kernel: Call Trace:
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843d60] [c00000000009f2d4] power9_idle_type+0x24/0x40 (unreliable)
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843d80] [c000000000903ee0] stop_loop+0x40/0x5c
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843db0] [c0000000009006c0] cpuidle_enter_state+0xc0/0x3c0
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843e10] [c00000000014a46c] call_cpuidle+0x4c/0x80
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843e30] [c00000000014aa38] do_idle+0x308/0x3c0
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843ec0] [c00000000014acd8] cpu_startup_entry+0x38/0x40
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843ef0] [c000000000049c40] start_secondary+0x4e0/0x530
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843f90] [c00000000000b270] start_secondary_prolog+0x10/0x14
Jun 30 03:19:40 ltc-boston114 kernel: Instruction dump:
Jun 30 03:19:40 ltc-boston114 kernel: 7c0803a6 4e800020 60420000 3c4c013d 38426050 7c0802a6 60000000 7c0802a6
Jun 30 03:19:40 ltc-boston114 kernel: f8010010 f821ffe1 4bfff9bd 4bf776c9 <60000000> 38210020 e8010010 7c0803a6
Jun 30 03:19:40 ltc-boston114 kernel: watchdog: CPU 69 became unstuck
Jun 30 04:01:01 ltc-boston114 systemd: Started Session 153 of user root.
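The `#012` sequences in the register lines above look like syslog's octal escape for an embedded newline (rsyslog escapes control characters as `#` plus three octal digits by default). A minimal sketch, assuming that escaping, for turning such a line back into a register map; `unescape_syslog` and `parse_gprs` are hypothetical helper names:

```python
import re

# Assumption: control characters were escaped as '#' + three octal digits,
# so an embedded newline appears as "#012".
ESCAPE = re.compile(r"#([0-7]{3})")
# One register row: "GPRnn:" followed by up to four 16-digit hex values.
GPR_ROW = re.compile(r"GPR(\d{2}): ((?:[0-9a-f]{16} ?){1,4})")


def unescape_syslog(line):
    """Replace each #ooo escape with the character it encodes."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


def parse_gprs(line):
    """Map register number to value for every GPRnn row found in the line."""
    regs = {}
    for row in GPR_ROW.finditer(unescape_syslog(line)):
        base = int(row.group(1))
        for offset, value in enumerate(row.group(2).split()):
            regs[base + offset] = int(value, 16)
    return regs


if __name__ == "__main__":
    sample = (
        "CFAR: c00020397c463d50 SOFTE: 410809990000000 "
        "#012GPR00: c00000000009f2d4 c00020397c463d60 c000000001475300 c00020397c463c00 "
        "#012GPR04: b000000000001033 c00000000009ecfc 0000000022004224 0000000000000002"
    )
    for number, value in sorted(parse_gprs(sample).items()):
        print("GPR%02d = %016x" % (number, value))
```

In the sample, GPR00 carries the same address as the NIP and LR values reported for `power9_idle_type+0x24/0x40`.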
cdeadmin commented 6 years ago

------- Comment (attachment only) From satheera@in.ibm.com 2018-07-05 01:10:42 EDT-------

cdeadmin commented 6 years ago

------- Comment From seg@us.ibm.com 2018-07-06 09:58:07 EDT-------
Maybe this is a power management issue? Do we have the latest firmware applied? Does turning off stop states help?

cdeadmin commented 6 years ago

------- Comment From viparash@in.ibm.com 2018-07-09 13:32:38 EDT-------
(In reply to comment #6)
> Maybe this is a power management issue? Do we have the latest firmware
> applied? Does turning off stop states help?

Yes, this seems to be a power management issue like the one reported in 166332. Please use the latest firmware and let us know if the issue is still seen.
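To try the earlier suggestion of turning off stop states, one option is the per-state `disable` file that the Linux cpuidle sysfs interface exposes under `/sys/devices/system/cpu/cpuN/cpuidle/stateM/`. The sketch below is an illustration only: it assumes the POWER9 stop states are the idle states whose `name` begins with `stop` (check the names on the actual host first), and `stop_state_files` and `set_stop_states` are hypothetical helpers, not an existing tool.

```python
from pathlib import Path

# Standard Linux cpuidle sysfs location; pass another root when testing.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def stop_state_files(root=SYSFS_CPU, prefix="stop"):
    """Yield (state_name, disable_file) for idle states whose name starts with prefix.

    The "stop" prefix is an assumption about how the host names its stop states.
    """
    for name_file in sorted(root.glob("cpu[0-9]*/cpuidle/state[0-9]*/name")):
        state_name = name_file.read_text().strip()
        if state_name.startswith(prefix):
            yield state_name, name_file.with_name("disable")


def set_stop_states(disabled, root=SYSFS_CPU):
    """Write 1 (disable) or 0 (enable) to each matching state; needs root on a real host."""
    value = "1" if disabled else "0"
    touched = []
    for _state_name, disable_file in stop_state_files(root):
        disable_file.write_text(value)
        touched.append(str(disable_file))
    return touched


if __name__ == "__main__":
    # Dry run: list what would be changed without writing anything.
    for state_name, disable_file in stop_state_files():
        print(state_name, disable_file)
```

Writes to these files do not persist across a reboot, so the guest boot test would need to be rerun in the same boot session to see whether the lockups still occur.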