------- Comment From viparash@in.ibm.com 2018-07-04 05:07:23 EDT------- Hi Satheesh,
Please provide the kernel logs captured after the hard lockups were observed.
Jun 30 02:08:04 ltc-boston114 kernel: watchdog: CPU 136 Hard LOCKUP
Jun 30 02:08:04 ltc-boston114 kernel: Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iscsi_target_mod target_core_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd auth_rpcgss nfs_acl lockd grace vhost_net vhost tap binfmt_misc xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables i2c_dev sunrpc ses enclosure at24 regmap_i2c ipmi_powernv ipmi_devintf ofpart powernv_flash ipmi_msghandler
Jun 30 02:08:04 ltc-boston114 kernel: opal_prd i2c_opal mtd kvm_hv kvm joydev ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm nvme nvme_core mpt3sas i40e drm_panel_orientation_quirks i2c_core aacraid raid_class scsi_transport_sas
Jun 30 02:08:04 ltc-boston114 kernel: CPU: 136 PID: 0 Comm: swapper/136 Not tainted 4.17.0-1.dev.git5ce3eac.el7.ppc64le #1
Jun 30 02:08:04 ltc-boston114 kernel: NIP: c00000000009f2d4 LR: c00000000009f2d4 CTR: c000000000008000
Jun 30 02:08:04 ltc-boston114 kernel: REGS: c00020397c463c00 TRAP: 0100 Not tainted (4.17.0-1.dev.git5ce3eac.el7.ppc64le)
Jun 30 02:08:04 ltc-boston114 kernel: MSR: 9000000000001033 <SF,HV,ME,IR,DR,RI,LE> CR: 22004222 XER: 20040000
Jun 30 02:08:04 ltc-boston114 kernel: CFAR: c00020397c463d50 SOFTE: 410809990000000
GPR00: c00000000009f2d4 c00020397c463d60 c000000001475300 c00020397c463c00
GPR04: b000000000001033 c00000000009ecfc 0000000022004224 0000000000000002
GPR08: 0000000000000000 00000000000000ff 0000000000000010 0000000000000001
GPR12: 9000000000121033 c000203fff687800 c00020397c463f90 0000000000000000
GPR16: 0000000000000000 c0000000000478e0 c0000000000478e0 c000000000f95380
GPR20: 0000000000000006 c00000000137ba08 c00020397c460000 c00020397c460080
GPR24: 0000000000000008 0000000000000000 000175a30e23d444 c00000000137ba08
GPR28: c00000000137bc60 c0000000014b2348 0000000000000006 0000000000000006
Jun 30 02:08:04 ltc-boston114 kernel: NIP [c00000000009f2d4] power9_idle_type+0x24/0x40
Jun 30 02:08:04 ltc-boston114 kernel: LR [c00000000009f2d4] power9_idle_type+0x24/0x40
Jun 30 02:08:04 ltc-boston114 kernel: Call Trace:
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463d60] [c00000000009f2d4] power9_idle_type+0x24/0x40 (unreliable)
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463d80] [c000000000903ee0] stop_loop+0x40/0x5c
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463db0] [c0000000009006c0] cpuidle_enter_state+0xc0/0x3c0
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463e10] [c00000000014a46c] call_cpuidle+0x4c/0x80
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463e30] [c00000000014aa38] do_idle+0x308/0x3c0
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463ec0] [c00000000014acd8] cpu_startup_entry+0x38/0x40
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463ef0] [c000000000049c40] start_secondary+0x4e0/0x530
Jun 30 02:08:04 ltc-boston114 kernel: [c00020397c463f90] [c00000000000b270] start_secondary_prolog+0x10/0x14
Jun 30 02:08:04 ltc-boston114 kernel: Instruction dump:
Jun 30 02:08:04 ltc-boston114 kernel: 7c0803a6 4e800020 60420000 3c4c013d 38426050 7c0802a6 60000000 7c0802a6
Jun 30 02:08:04 ltc-boston114 kernel: f8010010 f821ffe1 4bfff9bd 4bf776c9 <60000000> 38210020 e8010010 7c0803a6
Jun 30 02:08:04 ltc-boston114 kernel: watchdog: CPU 102 became unstuck
Jun 30 02:08:04 ltc-boston114 kernel: watchdog: CPU 135 became unstuck
Jun 30 03:01:01 ltc-boston114 systemd: Started Session 152 of user root.
Jun 30 03:01:01 ltc-boston114 systemd: Starting Session 152 of user root.
Jun 30 03:19:40 ltc-boston114 kernel: watchdog: CPU 135 detected hard LOCKUP on other CPUs 69
Jun 30 03:19:40 ltc-boston114 kernel: watchdog: CPU 69 Hard LOCKUP
Jun 30 03:19:40 ltc-boston114 kernel: Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iscsi_target_mod target_core_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd auth_rpcgss nfs_acl lockd grace vhost_net vhost tap binfmt_misc xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables i2c_dev sunrpc ses enclosure at24 regmap_i2c ipmi_powernv ipmi_devintf ofpart powernv_flash ipmi_msghandler
Jun 30 03:19:40 ltc-boston114 kernel: opal_prd i2c_opal mtd kvm_hv kvm joydev ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm nvme nvme_core mpt3sas i40e drm_panel_orientation_quirks i2c_core aacraid raid_class scsi_transport_sas
Jun 30 03:19:40 ltc-boston114 kernel: CPU: 69 PID: 0 Comm: swapper/69 Not tainted 4.17.0-1.dev.git5ce3eac.el7.ppc64le #1
Jun 30 03:19:40 ltc-boston114 kernel: NIP: c00000000009f2d4 LR: c00000000009f2d4 CTR: c000000000008000
Jun 30 03:19:40 ltc-boston114 kernel: REGS: c000003fe5843c00 TRAP: 0100 Not tainted (4.17.0-1.dev.git5ce3eac.el7.ppc64le)
Jun 30 03:19:40 ltc-boston114 kernel: MSR: 9000000000001033 <SF,HV,ME,IR,DR,RI,LE> CR: 22004222 XER: 20040000
Jun 30 03:19:40 ltc-boston114 kernel: CFAR: c000003fe5843d50 SOFTE: 415105030000000
GPR00: c00000000009f2d4 c000003fe5843d60 c000000001475300 c000003fe5843c00
GPR04: b000000000001033 c00000000009ecfc 0000000022004224 0000000000000001
GPR08: 0000000000000000 00000000000000ff 0000000000000010 0000000000000001
GPR12: 9000000000121033 c000003ffffb1600 c000003fe5843f90 0000000000000000
GPR16: 0000000000000000 c0000000000478e0 c0000000000478e0 c000000000f95380
GPR20: 0000000000000006 c00000000137ba08 c000003fe5840000 c000003fe5840080
GPR24: 0000000000000008 0000000000000000 0001798b1595843c c00000000137ba08
GPR28: c00000000137bc60 c0000000014b2348 0000000000000006 0000000000000006
Jun 30 03:19:40 ltc-boston114 kernel: NIP [c00000000009f2d4] power9_idle_type+0x24/0x40
Jun 30 03:19:40 ltc-boston114 kernel: LR [c00000000009f2d4] power9_idle_type+0x24/0x40
Jun 30 03:19:40 ltc-boston114 kernel: Call Trace:
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843d60] [c00000000009f2d4] power9_idle_type+0x24/0x40 (unreliable)
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843d80] [c000000000903ee0] stop_loop+0x40/0x5c
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843db0] [c0000000009006c0] cpuidle_enter_state+0xc0/0x3c0
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843e10] [c00000000014a46c] call_cpuidle+0x4c/0x80
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843e30] [c00000000014aa38] do_idle+0x308/0x3c0
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843ec0] [c00000000014acd8] cpu_startup_entry+0x38/0x40
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843ef0] [c000000000049c40] start_secondary+0x4e0/0x530
Jun 30 03:19:40 ltc-boston114 kernel: [c000003fe5843f90] [c00000000000b270] start_secondary_prolog+0x10/0x14
Jun 30 03:19:40 ltc-boston114 kernel: Instruction dump:
Jun 30 03:19:40 ltc-boston114 kernel: 7c0803a6 4e800020 60420000 3c4c013d 38426050 7c0802a6 60000000 7c0802a6
Jun 30 03:19:40 ltc-boston114 kernel: f8010010 f821ffe1 4bfff9bd 4bf776c9 <60000000> 38210020 e8010010 7c0803a6
Jun 30 03:19:40 ltc-boston114 kernel: watchdog: CPU 69 became unstuck
Jun 30 04:01:01 ltc-boston114 systemd: Started Session 153 of user root.
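For triage, the watchdog lines in a log like the one above can be tallied per CPU. The following is only a rough sketch: the function name `summarize_lockups` and the two regular expressions are illustrative assumptions, written to match the exact message wording seen in this log ("CPU N Hard LOCKUP" and "CPU N became unstuck"), and may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Patterns copied from the message wording in the log above (assumed stable
# only for this kernel build). The "detected hard LOCKUP on other CPUs" line
# is deliberately not matched, since the victim CPU reports itself separately.
LOCKUP_RE = re.compile(r"watchdog: CPU (\d+) Hard LOCKUP")
UNSTUCK_RE = re.compile(r"watchdog: CPU (\d+) became unstuck")


def summarize_lockups(lines):
    """Return (lockups, unstuck) Counters keyed by CPU number."""
    lockups, unstuck = Counter(), Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            lockups[int(match.group(1))] += 1
            continue
        match = UNSTUCK_RE.search(line)
        if match:
            unstuck[int(match.group(1))] += 1
    return lockups, unstuck
```

It could be fed the lines of a saved log file, for example `summarize_lockups(open("messages"))`, where the file name is a placeholder.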
------- Comment (attachment only) From satheera@in.ibm.com 2018-07-05 01:10:42 EDT-------
------- Comment From seg@us.ibm.com 2018-07-06 09:58:07 EDT------- Maybe this is a power management issue? Do we have the latest firmware applied? Does turning off stop states help?
------- Comment From viparash@in.ibm.com 2018-07-09 13:32:38 EDT------- (In reply to comment #6)
> Maybe this is a power management issue? Do we have the latest firmware
> applied? Does turning off stop states help?
Yes, this seems to be a power management issue like the one reported in 166332. Please use the latest firmware and let us know if the issue is still seen.
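
For the "turning off stop states" experiment suggested in comment #6, one possible approach is the generic Linux cpuidle sysfs interface, where each idle state has a `disable` file. The sketch below is not taken from this bug: the helper name `set_stop_states` is hypothetical, and it assumes the stop states appear under `/sys/devices/system/cpu/cpuN/cpuidle/stateM/` with a `name` beginning with "stop", which should be checked on the affected machine first. It needs root to write to the real sysfs tree.

```python
from pathlib import Path


def set_stop_states(disable, sysfs_root="/sys/devices/system/cpu"):
    """Write 1 (disable) or 0 (enable) to every cpuidle state named stop*.

    Returns the list of state directories that were changed.
    Assumption: stop states are identified by a 'name' starting with 'stop'.
    """
    changed = []
    pattern = "cpu[0-9]*/cpuidle/state[0-9]*"
    for state_dir in sorted(Path(sysfs_root).glob(pattern)):
        name_file = state_dir / "name"
        disable_file = state_dir / "disable"
        if not name_file.is_file() or not disable_file.is_file():
            continue
        # Leave non-stop states (for example snooze) untouched.
        if not name_file.read_text().strip().lower().startswith("stop"):
            continue
        disable_file.write_text("1" if disable else "0")
        changed.append(str(state_dir))
    return changed
```

Calling `set_stop_states(True)` before reproducing the workload and `set_stop_states(False)` afterwards would show whether the lockups depend on the stop states. The setting does not persist across reboots.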