open-power / hostboot

System initialization firmware for Power systems
Apache License 2.0

Small-core CPU reports checkstop error when WOF is enabled #229

Closed by lili-lilili 10 months ago

lili-lilili commented 1 year ago

We are testing a small-core CPU with OPAL. After enabling WOF, the small-core CPU reports checkstop errors, but the big-core CPU does not. There are roughly two types of checkstop errors, and all of them report some IOHS_DLP_FIR_SMP error.
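
One way to check which FIRs each log reports is to tally the "Signature List" entries of a PEL once it has been parsed into a dictionary. The sketch below is only an illustration: the function name `signature_counts` and the `sample_pel` excerpt are hypothetical, and it assumes the key names ("Signature List", "Signature", "Attn Type") match those in the logs posted in the following comments.

```python
from collections import Counter


def signature_counts(pel: dict) -> Counter:
    """Count (FIR name, attention type) pairs over every "Signature List" in one PEL."""
    counts = Counter()
    for section in pel.values():
        if not isinstance(section, dict):
            continue
        for entry in section.get("Signature List", []):
            # "IOHS_DLP_FIR_SMP(7)[6] link0 crc error" -> "IOHS_DLP_FIR_SMP"
            fir_name = entry["Signature"].split("(", 1)[0]
            counts[(fir_name, entry["Attn Type"])] += 1
    return counts


# Hypothetical excerpt: four entries copied from the first log in this thread.
sample_pel = {
    "User Data 6": {
        "Signature List": [
            {"Chip Desc": "node 0 proc 0 (P10 2.0)",
             "Signature": "PB_EXT_FIR(0)[7] pb_x7_fir_err",
             "Attn Type": "checkstop"},
            {"Chip Desc": "node 0 proc 0 (P10 2.0)",
             "Signature": "IOHS_DLP_FIR_SMP(7)[6] link0 crc error",
             "Attn Type": "recoverable"},
            {"Chip Desc": "node 0 proc 0 (P10 2.0)",
             "Signature": "IOHS_DLP_FIR_SMP(7)[7] link1 crc error",
             "Attn Type": "recoverable"},
            {"Chip Desc": "node 0 proc 3 (P10 2.0)",
             "Signature": "TOD_ERROR(0)[21] S_PATH_1_STEP_CHECK_ERROR",
             "Attn Type": "checkstop"},
        ]
    }
}

for (fir_name, attn), n in sorted(signature_counts(sample_pel).items()):
    print(f"{fir_name:18} {attn:12} {n}")
```

Running the same function over both full logs would show whether IOHS_DLP_FIR_SMP entries appear in each of them, and with which attention type.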

lili-lilili commented 1 year ago

1. While a hostboot istep is executing, a "TOD_ERROR(0)[21] S_PATH_1_STEP_CHECK_ERROR" checkstop is reported.

{ "Private Header": { "Section Version": "1", "Sub-section type": "0", "Created by": "0xE500", "Created at": "12/21/2022 20:07:14", "Committed at": "12/21/2022 20:07:14", "Creator Subsystem": "BMC", "CSSVER": "", "Platform Log Id": "0x5000053B", "Entry Id": "0x5000053B", "BMC Event Log Id": "5559" }, "User Header": { "Section Version": "1", "Sub-section type": "0", "Log Committed by": "0x2000", "Subsystem": "Miscellaneous", "Event Scope": "Entire Platform", "Event Severity": "Unrecoverable Error", "Event Type": "Not Applicable", "Action Flags": [ "Service Action Required", "Report Externally", "HMC Call Home" ], "Host Transmission": "Not Sent", "HMC Transmission": "Not Sent" }, "Primary SRC": { "Section Version": "1", "Sub-section type": "1", "Created by": "0xE500", "SRC Version": "0x02", "SRC Format": "0x55", "Virtual Progress SRC": "False", "I5/OS Service Event Bit": "False", "Hypervisor Dump Initiated":"False", "Backplane CCIN": "2E2D", "Terminate FW Error": "False", "Deconfigured": "False", "Guarded": "False", "Error Details": { "Message": "Error Signature: 0x20DA0020 0x00030001 0x7D9B0015" }, "Valid Word Count": "0x09", "Reference Code": "BD70E510", "Hex Word 2": "00080055", "Hex Word 3": "2E2D0010", "Hex Word 4": "CC009542", "Hex Word 5": "00000000", "Hex Word 6": "20DA0020", "Hex Word 7": "00030001", "Hex Word 8": "7D9B0015", "Hex Word 9": "00000000", "Callout Section": { "Callout Count": "1", "Callouts": [{ "FRU Type": "Maintenance Procedure Required", "Priority": "Mandatory, replace all with this type as a unit", "Procedure": "BMC0002" }] }, "SRC Details": { "Primary Attention": "system checkstop", "Signature Description": { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "TOD_ERROR(0)[21] S_PATH_1_STEP_CHECK_ERROR", "Attn Type": "checkstop" } } }, "Extended User Header": { "Section Version": "1", "Sub-section type": "0", "Created by": "0x2000",
"Reporting Machine Type": "9105-42A", "Reporting Serial Number": "783C4C1", "FW Released Ver": "", "FW SubSys Version": "0.1.1", "Common Ref Time": "00/00/0000 00:00:00", "Symptom Id Len": "36", "Symptom Id": "BD70E510_20DA0020_00030001_7D9B0015" }, "Failing MTMS": { "Section Version": "1", "Sub-section type": "0", "Created by": "0x2000", "Machine Type Model": "9105-42A", "Serial Number": "783C4C1" }, "User Data 0": { "Section Version": "1", "Sub-section type": "1", "Created by": "0x2000", "BMCLoad": "0.51 0.38 0.54", "BMCState": "Ready", "BMCUptime": "0y 0d 0h 52m 10s", "BootState": "SecondaryProcInit", "ChassisState": "On", "FW Version ID": "0.1.1", "HostState": "Running", "Process Name": "/usr/bin/openpower-hw-diags", "System IM": "60001000" }, "User Data 1": { "Section Version": "1", "Sub-section type": "1", "Created by": "0x2000", "PEL_SUBSYSTEM": "0x70", "SRC6": "551157792", "SRC7": "196609", "SRC8": "2107310101", "_PID": "4013" }, "User Data 2": { "Section Version": "1", "Sub-section type": "1", "Created by": "0x2000", "Data": [ { "Priority": "H", "Procedure": "next_level_support" } ] }, "User Data 3": { "Section Version": "1", "Sub-section type": "4", "Created by": "0xE500", "Hostboot Scratch Registers": { "0x0000283c": "0xaa801502", "0x000000004602f489": "0x686f7374626f6f74" } }, "User Data 4": { "Section Version": "1", "Sub-section type": "5", "Created by": "0xE500", "Scratch Register Error Signature": { "Chip ID": "0x004b0006", "Signature ID": "0x5993000a" } }, "User Data 5": { "Section Version": "1", "Sub-section type": "3", "Created by": "0xE500", "Callout List FFDC": [ { "Callout Type": "Procedure Callout", "Priority": "high", "Procedure": "next_level_support" } ] }, "User Data 6": { "Section Version": "1", "Sub-section type": "1", "Created by": "0xE500", "Signature List": [ { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "PB_EXT_FIR(0)[7] pb_x7_fir_err", "Attn Type": "checkstop" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": 
"IOHS_DLP_FIR_SMP(7)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "PB_EXT_FIR(0)[6] pb_x6_fir_err", "Attn Type": "checkstop" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "PB_EXT_FIR(0)[2] pb_x2_fir_err", "Attn Type": "checkstop" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[7] link1 crc error", "Attn Type": "recoverable" 
}, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "TOD_ERROR(0)[21] S_PATH_1_STEP_CHECK_ERROR", "Attn Type": "checkstop" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "TOD_ERROR(0)[33] I_PATH_SYNC_CHECK_ERROR", "Attn Type": "checkstop" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 
2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": 
"IOHS_DLP_FIR_SMP(7)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[17] link1 sl ecc ue", "Attn Type": "recoverable" } ] }, "User Data 7": { "Section Version": "1", "Sub-section type": "2", "Created by": "0xE500", "Register Dump": [ "node 0 proc 0 (P10 2.0) ****", " GFIR_CS (0x570F001C) 1000 0000 0000 0000", " CFIR_N1_CS (0x03040000) 8000 0000 4000 0000", " CFIR_N1_CS_MASK (0x03040040) 2000 0000 0000 0000", " PB_EXT_FIR (0x030113AE) 0100 0000 0000 0000", " PB_EXT_FIR_MASK (0x030113B1) D400 0000 0000 0000", " GFIR_RE (0x570F001B) 0000 0001 0000 0000", " CFIR_IOHS_RE (0x1F040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1F01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1F01100A) 880F 00C4 E200 F64B", " IOHS_DLP_CONTROL (0x1F01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1F01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1F01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1F011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1F011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1F011014) 0000 0000 0000 00BF", " IOHS_DLP_LINK1_INFO (0x1F011015) 0000 0000 0000 006B", " IOHS_DLP_REPLAY_THRESHOLD (0x1F011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1F011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1F011022) 5EDF 0351 BC94 0C01", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1F011023) 2612 E07F 7000 0001", " 
IOHS_DLP_LINK0_QUALITY (0x1F011026) 02F0 3000 0000 4400", " IOHS_DLP_LINK1_QUALITY (0x1F011027) 02F0 3000 0000 0E00", " IOHS_DLP_DLL_STATUS (0x1F011028) 7F05 4A7F 054A C000", " IOHS_DLP_MISC_ERROR_STATU (0x1F011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1F011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1F011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1F011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1F011007) FFFF FFFF FFFF FFFC", "node 0 ocmb 6 (Explorer 2.0) ***", " CHIPLET_OCMB_FIR_MASK (0x08040002) 6627 FFE0 0000 0000", "node 0 proc 1 (P10 2.0) ****", " GFIR_CS (0x570F001C) 1000 0000 0000 0000", " CFIR_N1_CS (0x03040000) 8000 0000 4000 0000", " CFIR_N1_CS_MASK (0x03040040) 2000 0000 0000 0000", " PB_EXT_FIR (0x030113AE) 0200 0000 0000 0000", " PB_EXT_FIR_MASK (0x030113B1) B400 0000 0000 0000", " GFIR_RE (0x570F001B) 0000 0002 0000 0000", " CFIR_IOHS_RE (0x1E040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1E01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1E01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1E01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1E01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1E01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1E011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1E011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1E011014) 0000 0000 0000 00EE", " IOHS_DLP_LINK1_INFO (0x1E011015) 0000 0000 0000 0027", " IOHS_DLP_REPLAY_THRESHOLD (0x1E011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1E011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1E011022) AFBB 4EEB 9000 0C01", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1E011023) 0000 0000 0380 0161", " IOHS_DLP_LINK0_QUALITY (0x1E011026) 02F0 2F00 0000 8000", " IOHS_DLP_LINK1_QUALITY (0x1E011027) 02F0 3000 0000 0000", " IOHS_DLP_DLL_STATUS (0x1E011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1E011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1E011000) F303 CF00 0000 0000", " 
IOHS_DLP_FIR_MASK (0x1E011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1E011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1E011007) FFFF FFFF FFFF FFFC", "node 0 proc 2 (P10 2.0) ****", " GFIR_CS (0x570F001C) 1000 0000 0000 0000", " CFIR_N1_CS (0x03040000) 8000 0000 4000 0000", " CFIR_N1_CS_MASK (0x03040040) 2000 0000 0000 0000", " PB_EXT_FIR (0x030113AE) 2000 0000 0000 0000", " PB_EXT_FIR_MASK (0x030113B1) D400 0000 0000 0000", " GFIR_RE (0x570F001B) 0000 0022 0000 0000", " CFIR_IOHS_RE (0x1A040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1A01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1A01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1A01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1A01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1A01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1A011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1A011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1A011014) 0203 0000 0000 00C4", " IOHS_DLP_LINK1_INFO (0x1A011015) 0204 0000 0000 0010", " IOHS_DLP_REPLAY_THRESHOLD (0x1A011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1A011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1A011022) 0000 0000 0860 0641", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1A011023) 6804 F8AE 9E80 0311", " IOHS_DLP_LINK0_QUALITY (0x1A011026) 0300 3000 0000 0000", " IOHS_DLP_LINK1_QUALITY (0x1A011027) 0300 3000 0000 0000", " IOHS_DLP_DLL_STATUS (0x1A011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1A011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1A011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1A011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1A011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1A011007) FFFF FFFF FFFF FFFC", " CFIR_IOHS_RE (0x1E040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1E01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1E01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1E01100B) C000 0000 C000 0000", " 
IOHS_DLP_SEC_CONFIG (0x1E01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1E01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1E011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1E011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1E011014) 0000 0000 0000 0052", " IOHS_DLP_LINK1_INFO (0x1E011015) 0000 0000 0000 0051", " IOHS_DLP_REPLAY_THRESHOLD (0x1E011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1E011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1E011022) 3C19 EE03 1780 9521", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1E011023) 24A5 8FE8 9000 0101", " IOHS_DLP_LINK0_QUALITY (0x1E011026) 02F0 3000 0000 5400", " IOHS_DLP_LINK1_QUALITY (0x1E011027) 0300 2F00 0000 3500", " IOHS_DLP_DLL_STATUS (0x1E011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1E011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1E011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1E011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1E011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1E011007) FFFF FFFF FFFF FFFC", "node 0 proc 3 (P10 2.0) ****", " GFIR_CS (0x570F001C) 4000 0000 0000 0000", " CFIR_TP_CS (0x01040000) 8800 0000 0000 0000", " TOD_M_PATH_CTRL (0x00040000) 0E00 0000 0000 0000", " TOD_PRI_PORT_0_CTRL (0x00040001) C000 0000 0000 0000", " TOD_PRI_PORT_1_CTRL (0x00040002) C000 0000 0000 0000", " TOD_SEC_PORT_0_CTRL (0x00040003) C000 0000 0000 0000", " TOD_SEC_PORT_1_CTRL (0x00040004) C000 0000 0000 0000", " TOD_S_PATH_CTRL (0x00040005) 08C3 C30C 0200 0000", " TOD_I_PATH_CTRL (0x00040006) 03F3 0000 4600 0000", " TOD_PSS_MSS_CTRL (0x00040007) 1300 0000 0000 0000", " TOD_PSS_MSS_STATUS (0x00040008) 03E0 1C4C 0000 0000", " TOD_M_PATH_STATUS (0x00040009) 0047 0047 0000 0000", " TOD_S_PATH_STATUS (0x0004000A) 0000 4646 0E0F 0000", " TOD_CHIP_CTRL (0x00040010) 603F 0000 0000 0000", " TOD_FSM (0x00040024) 2800 0000 0000 0000", " TOD_RX_TTYPE_CTRL (0x00040029) 0000 0100 0000 4023", " TOD_ERROR (0x00040030) 0000 0400 4000 0040", " 
TOD_ERROR_MASK (0x00040032) FFFC 1BFF B7FF FFFF", " TP_LOCAL_FIR (0x01040100) 0000 0040 0100 0000", " TP_LOCAL_FIR_MASK (0x01040103) 00EF F527 FD3C 7E0F", " TP_LOCAL_FIR_ACT1 (0x01040107) FFFF FFAF FFFF FFFF", " GFIR_RE (0x570F001B) 0000 004B 0000 0000", " CFIR_IOHS_RE (0x19040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1901100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1901100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1901100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1901100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1901100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x19011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x19011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x19011014) 0000 0000 0000 0002", " IOHS_DLP_LINK1_INFO (0x19011015) 0000 0000 0000 009C", " IOHS_DLP_REPLAY_THRESHOLD (0x19011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x19011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x19011022) ACD2 7C04 0001 5001", " IOHS_DLP_LINK1_SYN_CAPTUR (0x19011023) 7933 F04C 8005 4901", " IOHS_DLP_LINK0_QUALITY (0x19011026) 02F0 2F00 0000 8200", " IOHS_DLP_LINK1_QUALITY (0x19011027) 02F0 3000 0000 4100", " IOHS_DLP_DLL_STATUS (0x19011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x19011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x19011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x19011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x19011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x19011007) FFFF FFFF FFFF FFFC", " CFIR_IOHS_RE (0x1C040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1C01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1C01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1C01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1C01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1C01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1C011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1C011013) 0000 0000 0FF8 0000", " 
IOHS_DLP_LINK0_INFO (0x1C011014) 2C82 0000 0000 0004", " IOHS_DLP_LINK1_INFO (0x1C011015) 2C82 0000 0000 00CB", " IOHS_DLP_REPLAY_THRESHOLD (0x1C011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1C011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1C011022) 0080 0000 0000 0001", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1C011023) AF75 D4BC 7D73 C731", " IOHS_DLP_LINK0_QUALITY (0x1C011026) 02F0 3000 0000 4700", " IOHS_DLP_LINK1_QUALITY (0x1C011027) 0300 3000 0000 8300", " IOHS_DLP_DLL_STATUS (0x1C011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1C011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1C011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1C011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1C011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1C011007) FFFF FFFF FFFF FFFC", " CFIR_IOHS_RE (0x1E040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1E01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1E01100A) 880F 0804 E200 F64B", " IOHS_DLP_CONTROL (0x1E01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1E01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1E01100F) 3878 053F 00D4 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1E011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1E011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1E011014) 271A 0000 0000 00AA", " IOHS_DLP_LINK1_INFO (0x1E011015) 271A 0000 0000 0003", " IOHS_DLP_REPLAY_THRESHOLD (0x1E011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1E011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1E011022) 0000 0000 0000 0381", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1E011023) 0000 0000 0980 0001", " IOHS_DLP_LINK0_QUALITY (0x1E011026) 0300 2F00 0000 B000", " IOHS_DLP_LINK1_QUALITY (0x1E011027) 02F0 2F00 0000 1A00", " IOHS_DLP_DLL_STATUS (0x1E011028) 7F05 4A7F 0549 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1E011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1E011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1E011003) FCFC 3FFF FCC0 C000", " 
IOHS_DLP_FIR_ACT0 (0x1E011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1E011007) FFFF FFFF FFFF FFFC", " CFIR_IOHS_RE (0x1F040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1F01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1F01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1F01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1F01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1F01100F) 3878 053F 00E4 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1F011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1F011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1F011014) 316E 0000 0000 00EB", " IOHS_DLP_LINK1_INFO (0x1F011015) 316E 0000 0000 000A", " IOHS_DLP_REPLAY_THRESHOLD (0x1F011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1F011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1F011022) 252E E4B2 2020 0471", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1F011023) 2DA3 2DC6 A004 4001", " IOHS_DLP_LINK0_QUALITY (0x1F011026) 02F0 2F00 0000 0000", " IOHS_DLP_LINK1_QUALITY (0x1F011027) 0300 3000 0000 0000", " IOHS_DLP_DLL_STATUS (0x1F011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1F011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1F011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1F011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1F011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1F011007) FFFF FFFF FFFF FFFC" ] } }
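
The register dump above can be cross-checked against the signature list by decoding which FIR bits are set and not masked. The following is a minimal sketch, not part of any diagnostic tool: the helper names (`parse_reg`, `set_bits`, `active_bits`) are hypothetical, and it assumes bit 0 is the most significant bit of the 64-bit register and that a 1 in the mask register suppresses the corresponding FIR bit. Both assumptions are consistent with the values in this log, where the decoded bits line up with the reported signatures.

```python
from typing import List

WIDTH = 64  # the dumped registers are printed as four 16-bit hex words


def parse_reg(words: str) -> int:
    """Turn a dump value such as 'F303 CF00 0000 0000' into an integer."""
    return int(words.replace(" ", ""), 16)


def set_bits(value: int) -> List[int]:
    """Positions of set bits, counting the most significant bit as bit 0 (assumption)."""
    return [i for i in range(WIDTH) if (value >> (WIDTH - 1 - i)) & 1]


def active_bits(fir: str, mask: str) -> List[int]:
    """FIR bits that are set while the matching mask bit is 0 (assumed: 1 = masked)."""
    unmasked = ~parse_reg(mask) & ((1 << WIDTH) - 1)
    return set_bits(parse_reg(fir) & unmasked)


# Values copied from the register dump above.
print(active_bits("F303 CF00 0000 0000", "FCFC 3FFF FCC0 C000"))  # IOHS_DLP_FIR  -> [6, 7, 14, 15, 16, 17]
print(active_bits("0000 0400 4000 0040", "FFFC 1BFF B7FF FFFF"))  # TOD_ERROR     -> [21, 33]
```

Under these assumptions, the unmasked IOHS_DLP_FIR bits are exactly the six IOHS_DLP_FIR_SMP signatures listed per link (bits 6, 7, 14, 15, 16, 17), and the unmasked TOD_ERROR bits on proc 3 are the two reported checkstop signatures (bits 21 and 33).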

lili-lilili commented 1 year ago

2. When the machine has just booted to the OS, or when we have just started the stream test, a "MC_DSTL_FIR(4)[22] Subchannel A Channel timeout" checkstop error is reported.

{ "Private Header": { "Section Version": "1", "Sub-section type": "0", "Created by": "0xE500", "Created at": "12/21/2022 16:26:43", "Committed at": "12/21/2022 16:26:43", "Creator Subsystem": "BMC", "CSSVER": "", "Platform Log Id": "0x500003E9", "Entry Id": "0x500003E9", "BMC Event Log Id": "1414" }, "User Header": { "Section Version": "1", "Sub-section type": "0", "Log Committed by": "0x2000", "Subsystem": "Memory Card/FRU", "Event Scope": "Entire Platform", "Event Severity": "Unrecoverable Error", "Event Type": "Not Applicable", "Action Flags": [ "Service Action Required", "Report Externally", "HMC Call Home" ], "Host Transmission": "Not Sent", "HMC Transmission": "Not Sent" }, "Primary SRC": { "Section Version": "1", "Sub-section type": "1", "Created by": "0xE500", "SRC Version": "0x02", "SRC Format": "0x55", "Virtual Progress SRC": "False", "I5/OS Service Event Bit": "False", "Hypervisor Dump Initiated":"False", "Backplane CCIN": "2E2D", "Terminate FW Error": "False", "Deconfigured": "False", "Guarded": "True", "Error Details": { "Message": "Error Signature: 0x20DA0020 0x00030002 0xBCE50416" }, "Valid Word Count": "0x09", "Reference Code": "BD24E510", "Hex Word 2": "00080055", "Hex Word 3": "2E2D0010", "Hex Word 4": "CC009544", "Hex Word 5": "01000000", "Hex Word 6": "20DA0020", "Hex Word 7": "00030002", "Hex Word 8": "BCE50416", "Hex Word 9": "00000000", "Callout Section": { "Callout Count": "3", "Callouts": [{ "FRU Type": "Normal Hardware FRU", "Priority": "Mandatory, replace all with this type as a unit", "Location Code": "U78DA.ND0.WZS01SL-P0-C25", "Part Number": "", "CCIN": "", "Serial Number": "" }, { "FRU Type": "Normal Hardware FRU", "Priority": "Mandatory, replace all with this type as a unit", "Location Code": "U78DA.ND0.WZS01SL-P0-C24", "Part Number": "", "CCIN": "", "Serial Number": "" }, { "FRU Type": "Normal Hardware FRU", "Priority": "Lowest priority replacement", "Location Code": "U78DA.ND0.WZS01SL-P0", "Part Number": "03KP470", "CCIN": "2E2D", 
"Serial Number": "YF13UF2CM01T" }] }, "SRC Details": { "Primary Attention": "system checkstop", "Signature Description": { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "MC_DSTL_FIR(4)[22] Subchannel A Channel timeout", "Attn Type": "unit checkstop" } } }, "Extended User Header": { "Section Version": "1", "Sub-section type": "0", "Created by": "0x2000", "Reporting Machine Type": "9105-42A", "Reporting Serial Number": "783C4C1", "FW Released Ver": "", "FW SubSys Version": "0.1.1", "Common Ref Time": "00/00/0000 00:00:00", "Symptom Id Len": "36", "Symptom Id": "BD24E510_20DA0020_00030002_BCE50416" }, "Failing MTMS": { "Section Version": "1", "Sub-section type": "0", "Created by": "0x2000", "Machine Type Model": "9105-42A", "Serial Number": "783C4C1" }, "User Data 0": { "Section Version": "1", "Sub-section type": "1", "Created by": "0x2000", "BMCLoad": "2.01 1.63 1.36", "BMCState": "Ready", "BMCUptime": "0y 0d 0h 35m 39s", "BootState": "SecondaryProcInit", "ChassisState": "On", "FW Version ID": "0.1.1", "HostState": "Running", "Process Name": "/usr/bin/openpower-hw-diags", "System IM": "60001000" }, "User Data 1": { "Section Version": "1", "Sub-section type": "1", "Created by": "0x2000", "PEL_SUBSYSTEM": "0x24", "SRC6": "551157792", "SRC7": "196610", "SRC8": "3169125398", "_PID": "1391" }, "User Data 2": { "Section Version": "1", "Sub-section type": "1", "Created by": "0x2000", "Data": [ { "Deconfigured": false, "EntityPath": [ 35, 1, 0, 2, 0, 75, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], "GuardType": "GARD_Unrecoverable", "Guarded": true, "LocationCode": "Ufcs-P0-C25", "Priority": "H" }, { "Deconfigured": false, "Guarded": false, "LocationCode": "Ufcs-P0-C24", "Priority": "L" }, { "Deconfigured": false, "Guarded": false, "LocationCode": "P0", "Priority": "L" } ] }, "User Data 3": { "Section Version": "1", "Sub-section type": "4", "Created by": "0xE500", "Hostboot Scratch Registers": { "0x0000283c": "0xaa811504", "0x000000004602f489": "0x0000000000000000" 
} }, "User Data 4": { "Section Version": "1", "Sub-section type": "5", "Created by": "0xE500", "Scratch Register Error Signature": { "Chip ID": "0x004b0006", "Signature ID": "0x5993000a" } }, "User Data 5": { "Section Version": "1", "Sub-section type": "3", "Created by": "0xE500", "Callout List FFDC": [ { "Bus Type": "OMI_BUS", "Callout Type": "Connected Callout", "Guard": true, "Priority": "high", "RX Target": "physical:sys-0/node-0/proc-3/mc-2/mi-0/mcc-0/omi-0", "TX Target": "physical:sys-0/node-0/ocmb_chip-24" }, { "Bus Type": "OMI_BUS", "Callout Type": "Bus Callout", "Guard": false, "Priority": "low", "RX Target": "physical:sys-0/node-0/proc-3/mc-2/mi-0/mcc-0/omi-0", "TX Target": "physical:sys-0/node-0/ocmb_chip-24" } ] }, "User Data 6": { "Section Version": "1", "Sub-section type": "1", "Created by": "0xE500", "Signature List": [ { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "PB_EXT_FIR(0)[2] pb_x2_fir_err", "Attn Type": "checkstop" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "OCC_FIR(0)[10] GPE1 asserted an error condition that caused it to halt.", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "PB_STATION_FIR_ES3(0)[9] hang_recovery_gte_level1", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 0 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[17] 
link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "EQ_CORE_FIR(0)[14] MCHK received while ME=0 - non recoverable", "Attn Type": "checkstop" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "OCC_FIR(0)[10] GPE1 asserted an error condition that caused it to halt.", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "PB_STATION_FIR_ES3(0)[9] hang_recovery_gte_level1", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 1 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "PB_EXT_FIR(0)[4] pb_x4_fir_err", "Attn Type": "checkstop" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "OCC_FIR(0)[10] GPE1 asserted an error condition that caused it to halt.", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "PB_STATION_FIR_ES3(0)[9] hang_recovery_gte_level1", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", 
"Signature": "IOHS_DLP_FIR_SMP(2)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(2)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 2 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "PB_EXT_FIR(0)[7] pb_x7_fir_err", "Attn Type": "checkstop" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "MC_DSTL_FIR(4)[22] Subchannel A Channel timeout", "Attn Type": "unit checkstop" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "OCC_FIR(0)[10] GPE1 asserted an error condition that caused it to halt.", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "PB_STATION_FIR_ES3(0)[9] hang_recovery_gte_level1", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "MC_DSTL_FIR(4)[1] Subchannel A AFU initiated Recoverable Attention", "Attn Type": "recoverable" }, { "Chip Desc": 
"node 0 proc 3 (P10 2.0)", "Signature": "MC_DSTL_FIR(4)[14] Subchannel A valid cmd timeout error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "MC_OMI_DL(8)[3] OMI-DL detected a CRC error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "MC_OMI_DL(8)[7] OMI-DL retrained due to no forward progress", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "MC_OMI_DL(8)[3] OMI-DL detected a CRC error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "MC_OMI_DL(8)[7] OMI-DL retrained due to no forward progress", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(1)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[15] link1 sl ecc correctable", "Attn Type": "recoverable" 
}, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(4)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(6)[17] link1 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[6] link0 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[7] link1 crc error", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[14] link0 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[15] link1 sl ecc correctable", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[16] link0 sl ecc ue", "Attn Type": "recoverable" }, { "Chip Desc": "node 0 proc 3 (P10 2.0)", "Signature": "IOHS_DLP_FIR_SMP(7)[17] link1 sl ecc ue", "Attn Type": "recoverable" } ] }, "User Data 7": { "Section Version": "1", "Sub-section type": "2", "Created by": "0xE500", "Register Dump": [ "node 0 proc 0 (P10 2.0) ****", " GFIR_CS (0x570F001C) 1000 
0000 0000 0000", " CFIR_N1_CS (0x03040000) 8000 0000 4000 0000", " CFIR_N1_CS_MASK (0x03040040) 2000 0000 0000 0000", " PB_EXT_FIR (0x030113AE) 2000 0000 0000 0000", " PB_EXT_FIR_MASK (0x030113B1) D400 0000 0000 0000", " GFIR_RE (0x570F001B) 5000 0001 0000 0000", " CFIR_TP_RE (0x01040001) 8400 0000 0000 0000", " OCC_FIR (0x01010800) 0820 F002 0000 0000", " OCC_FIR_MASK (0x01010803) F798 F0FF 07C0 61BC", " OCC_FIR_ACT0 (0x01010806) 0806 0000 1008 0000", " OCC_FIR_ACT1 (0x01010807) 0061 0F00 E837 9E40", " CFIR_N1_RE (0x03040001) 8000 0001 0000 0000", " PB_STATION_MODE_ES3 (0x0301138A) 1522 7D02 A191 A362", " PB_STATION_FIR_ES3 (0x03011380) 0040 0000 0000 0000", " PB_STATION_FIR_ES3_MASK (0x03011383) 0501 FC00 0000 0000", " PB_STATION_FIR_ES3_ACT1 (0x03011387) 0040 0000 0000 0000", " CFIR_IOHS_RE (0x1F040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1F01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1F01100A) 880F 00C4 E200 F64B", " IOHS_DLP_CONTROL (0x1F01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1F01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1F01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1F011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1F011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1F011014) 0000 0000 0000 0090", " IOHS_DLP_LINK1_INFO (0x1F011015) 0000 0000 0000 0003", " IOHS_DLP_REPLAY_THRESHOLD (0x1F011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1F011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1F011022) D01E FC85 C009 2C91", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1F011023) 0000 0000 0910 0001", " IOHS_DLP_LINK0_QUALITY (0x1F011026) 02F0 3000 0000 C800", " IOHS_DLP_LINK1_QUALITY (0x1F011027) 02F0 3000 0000 9200", " IOHS_DLP_DLL_STATUS (0x1F011028) 7F05 4A7F 054A C000", " IOHS_DLP_MISC_ERROR_STATU (0x1F011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1F011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1F011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1F011006) FCFC 
3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1F011007) FFFF FFFF FFFF FFFC", "node 0 ocmb 6 (Explorer 2.0) ***", " CHIPLET_OCMB_FIR_MASK (0x08040002) 6627 FFE0 0000 0000", "node 0 proc 1 (P10 2.0) ****", " GFIR_CS (0x570F001C) 0000 0000 8000 0000", " CFIR_EQ_CS (0x20040000) 8400 0000 0000 0000", " CFIR_EQ_CS_MASK (0x20040040) 2198 1800 0000 0000", " HV_PR_STATE (0x2002840D) 0000 0000 0000 00F0", " PC_FIR_HOLD_OUT (0x20028451) 2100 0800 0000 0000", " TFAC_HOLD_OUT (0x200284B7) 4000 0000 0000 0000", " EQ_CORE_FIR (0x20028440) 0003 8000 0000 0000", " EQ_CORE_FIR_MASK (0x20028443) 0221 D81A 71A9 F6FA", " EQ_CORE_FIR_ACT1 (0x20028447) A914 2485 7410 0084", " EQ_CORE_FIR_WOF (0x20028448) 0002 0000 0000 0000", " GFIR_RE (0x570F001B) 5000 0002 0000 0000", " CFIR_TP_RE (0x01040001) 8400 0000 0000 0000", " OCC_FIR (0x01010800) 0828 F002 0000 0000", " OCC_FIR_MASK (0x01010803) F798 F0FF 07C0 61BC", " OCC_FIR_ACT0 (0x01010806) 0806 0000 1008 0000", " OCC_FIR_ACT1 (0x01010807) 0061 0F00 E837 9E40", " CFIR_N1_RE (0x03040001) 8000 0001 0000 0000", " PB_STATION_MODE_ES3 (0x0301138A) 0522 7D02 A201 A362", " PB_STATION_FIR_ES3 (0x03011380) 0040 0000 0000 0000", " PB_STATION_FIR_ES3_MASK (0x03011383) 0501 FC00 0000 0000", " PB_STATION_FIR_ES3_ACT1 (0x03011387) 0040 0000 0000 0000", " CFIR_IOHS_RE (0x1E040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1E01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1E01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1E01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1E01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1E01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1E011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1E011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1E011014) 0000 0000 0000 005C", " IOHS_DLP_LINK1_INFO (0x1E011015) 0000 0000 0000 0016", " IOHS_DLP_REPLAY_THRESHOLD (0x1E011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1E011019) 7FC0 0000 0000 0000", " 
IOHS_DLP_LINK0_SYN_CAPTUR (0x1E011022) 3276 8F47 D293 1951", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1E011023) 31A2 4F77 9089 1381", " IOHS_DLP_LINK0_QUALITY (0x1E011026) 0300 3000 0000 0000", " IOHS_DLP_LINK1_QUALITY (0x1E011027) 02F0 3000 0000 AE00", " IOHS_DLP_DLL_STATUS (0x1E011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1E011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1E011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1E011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1E011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1E011007) FFFF FFFF FFFF FFFC", "node 0 proc 2 (P10 2.0) ****", " GFIR_CS (0x570F001C) 1000 0000 0000 0000", " CFIR_N1_CS (0x03040000) 8000 0000 4000 0000", " CFIR_N1_CS_MASK (0x03040040) 2000 0000 0000 0000", " PB_EXT_FIR (0x030113AE) 0800 0000 0000 0000", " PB_EXT_FIR_MASK (0x030113B1) D400 0000 0000 0000", " GFIR_RE (0x570F001B) 5000 0022 0000 0000", " CFIR_TP_RE (0x01040001) 8400 0000 0000 0000", " OCC_FIR (0x01010800) 0828 F002 0000 0000", " OCC_FIR_MASK (0x01010803) F798 F0FF 07C0 61BC", " OCC_FIR_ACT0 (0x01010806) 0806 0000 1008 0000", " OCC_FIR_ACT1 (0x01010807) 0061 0F00 E837 9E40", " CFIR_N1_RE (0x03040001) 8000 0001 0000 0000", " PB_STATION_MODE_ES3 (0x0301138A) 0522 7D02 A201 A362", " PB_STATION_FIR_ES3 (0x03011380) 0040 0000 0000 0000", " PB_STATION_FIR_ES3_MASK (0x03011383) 0501 FC00 0000 0000", " PB_STATION_FIR_ES3_ACT1 (0x03011387) 0040 0000 0000 0000", " CFIR_IOHS_RE (0x1A040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1A01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1A01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1A01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1A01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1A01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1A011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1A011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1A011014) 0000 0000 0000 0034", " IOHS_DLP_LINK1_INFO (0x1A011015) 0000 0000 0000 
004E", " IOHS_DLP_REPLAY_THRESHOLD (0x1A011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1A011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1A011022) 4002 0000 0A40 0081", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1A011023) 8AA1 152B 500A 4E81", " IOHS_DLP_LINK0_QUALITY (0x1A011026) 0300 3000 0000 B200", " IOHS_DLP_LINK1_QUALITY (0x1A011027) 02F0 3000 0000 2C00", " IOHS_DLP_DLL_STATUS (0x1A011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1A011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1A011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1A011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1A011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1A011007) FFFF FFFF FFFF FFFC", " CFIR_IOHS_RE (0x1E040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1E01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1E01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1E01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1E01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1E01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1E011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1E011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1E011014) 0000 0000 0000 00D8", " IOHS_DLP_LINK1_INFO (0x1E011015) 0000 0000 0000 00C6", " IOHS_DLP_REPLAY_THRESHOLD (0x1E011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1E011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1E011022) 455C D3B1 BE00 0E51", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1E011023) C4D0 ED4A A002 51E1", " IOHS_DLP_LINK0_QUALITY (0x1E011026) 0300 2F00 0000 F300", " IOHS_DLP_LINK1_QUALITY (0x1E011027) 02F0 3000 0000 1F00", " IOHS_DLP_DLL_STATUS (0x1E011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1E011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1E011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1E011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1E011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1E011007) FFFF FFFF FFFF FFFC", "node 
0 proc 3 (P10 2.0) ****", " GFIR_CS (0x570F001C) 1000 0000 0000 0000", " CFIR_N1_CS (0x03040000) 8000 0000 4000 0000", " CFIR_N1_CS_MASK (0x03040040) 2000 0000 0000 0000", " PB_EXT_FIR (0x030113AE) 0100 0000 0000 0000", " PB_EXT_FIR_MASK (0x030113B1) B400 0000 0000 0000", " GFIR_UCS (0x570F002A) 0002 0000 0000 0000", " CFIR_MC_UCS (0x0E040003) 8400 0000 0000 0000", " MC_DSTL_ERR_RPT (0x0E010D0C) 0000 0000 0000 0002", " MC_DSTL_CFG2 (0x0E010D0E) 0000 0000 0600 0000", " MC_DSTL_FIR (0x0E010D00) 4002 0A00 0100 0000", " MC_DSTL_FIR_MASK (0x0E010D03) 1F01 3C00 6780 0000", " MC_DSTL_FIR_ACT0 (0x0E010D06) 880C C300 1800 0000", " MC_DSTL_FIR_ACT1 (0x0E010D07) CC2F C363 9800 0000", " MC_DSTL_FIR_ACT2 (0x0E010D09) 2200 0000 0000 0000", " GFIR_RE (0x570F001B) 5002 004B 0000 0000", " CFIR_TP_RE (0x01040001) 8400 0000 0000 0000", " OCC_FIR (0x01010800) 0828 F002 0000 0000", " OCC_FIR_MASK (0x01010803) F798 F0FF 07C0 61BC", " OCC_FIR_ACT0 (0x01010806) 0806 0000 1008 0000", " OCC_FIR_ACT1 (0x01010807) 0061 0F00 E837 9E40", " CFIR_N1_RE (0x03040001) 8000 0001 0000 0000", " PB_STATION_MODE_ES3 (0x0301138A) 0522 7D02 A1E1 A362", " PB_STATION_FIR_ES3 (0x03011380) 0040 0000 0000 0000", " PB_STATION_FIR_ES3_MASK (0x03011383) 0501 FC00 0000 0000", " PB_STATION_FIR_ES3_ACT1 (0x03011387) 0040 0000 0000 0000", " CFIR_MC_RE (0x0E040001) A404 0000 0000 0000", " CMN_CONFIG (0x0E01140E) 9215 6400 8874 630F", " PMU_CNTR (0x0E01140F) 0000 FFFF 0000 0000", " MC_OMI_DL_FIR (0x0E011400) 1990 0000 0000 0000", " MC_OMI_DL_FIR_MASK (0x0E011403) 089F FFFF FFFF FFFC", " MC_OMI_DL_FIR_ACT1 (0x0E011407) FFFF FFFF FFFF FFFC", " MC_OMI_DL_CONFIG0 (0x0E011410) 8120 0400 02F1 3824", " MC_OMI_DL_CONFIG1 (0x0E011411) 0500 0500 0000 006F", " MC_OMI_DL_ERR_MASK (0x0E011412) 0000 FF51 0004 0020", " MC_OMI_DL_ERR_RPT (0x0E011413) 0000 0080 F003 0000", " MC_OMI_DL_STATUS (0x0E011416) 220C A5FF 2081 5100", " MC_OMI_DL_TRAINING_STATUS (0x0E011417) 00FF FFFF FFFF 0000", " MC_OMI_DL_ERR_ACTION (0x0E01141D) 0000 0000 
0000 0001", " MC_OMI_DL_DEBUG_AID (0x0E01141E) 0000 0000 0000 00FF", " MC_OMI_DL_CYA_BITS (0x0E01141F) 0000 0000 0000 0200", " CFIR_IOHS_RE (0x19040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1901100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1901100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1901100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1901100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1901100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x19011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x19011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x19011014) 1119 0000 0000 00F7", " IOHS_DLP_LINK1_INFO (0x19011015) 1119 0000 0000 0078", " IOHS_DLP_REPLAY_THRESHOLD (0x19011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x19011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x19011022) B1D4 047E 4000 0001", " IOHS_DLP_LINK1_SYN_CAPTUR (0x19011023) 1CE0 4089 5000 0901", " IOHS_DLP_LINK0_QUALITY (0x19011026) 0300 3000 0000 1E00", " IOHS_DLP_LINK1_QUALITY (0x19011027) 0300 3000 0000 BB00", " IOHS_DLP_DLL_STATUS (0x19011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x19011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x19011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x19011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x19011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x19011007) FFFF FFFF FFFF FFFC", " CFIR_IOHS_RE (0x1C040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1C01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1C01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1C01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1C01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1C01100F) 3878 053F 0004 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1C011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1C011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1C011014) 2F75 0000 0000 00AE", " IOHS_DLP_LINK1_INFO (0x1C011015) 2F75 0000 0000 0087", " 
IOHS_DLP_REPLAY_THRESHOLD (0x1C011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1C011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1C011022) 0C51 C4B9 1000 08C1", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1C011023) C3A6 AF3D 5160 0C91", " IOHS_DLP_LINK0_QUALITY (0x1C011026) 02F0 2F00 0000 0000", " IOHS_DLP_LINK1_QUALITY (0x1C011027) 0300 2F00 0000 0000", " IOHS_DLP_DLL_STATUS (0x1C011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1C011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1C011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1C011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1C011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1C011007) FFFF FFFF FFFF FFFC", " CFIR_IOHS_RE (0x1E040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1E01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1E01100A) 880F 0804 E200 F64B", " IOHS_DLP_CONTROL (0x1E01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1E01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1E01100F) 3878 053F 00D4 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1E011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1E011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1E011014) 04B2 0000 0000 005C", " IOHS_DLP_LINK1_INFO (0x1E011015) 04B2 0000 0000 0006", " IOHS_DLP_REPLAY_THRESHOLD (0x1E011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1E011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1E011022) 1B92 590F 9003 5001", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1E011023) 0000 0000 0002 5001", " IOHS_DLP_LINK0_QUALITY (0x1E011026) 02F0 3000 0000 0000", " IOHS_DLP_LINK1_QUALITY (0x1E011027) 02F0 3000 0000 3D00", " IOHS_DLP_DLL_STATUS (0x1E011028) 7F05 4A7F 0549 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1E011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1E011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1E011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1E011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1E011007) FFFF FFFF FFFF FFFC", " CFIR_IOHS_RE 
(0x1F040001) 8400 0000 0000 0000", " IOHS_DLP_PHY_CONFIG (0x1F01100C) AF00 0000 0000 00C1", " IOHS_DLP_CONFIG (0x1F01100A) 880F 0004 E200 F64B", " IOHS_DLP_CONTROL (0x1F01100B) C000 0000 C000 0000", " IOHS_DLP_SEC_CONFIG (0x1F01100D) 0000 0000 0000 001A", " IOHS_DLP_OPTICAL_CONFIG (0x1F01100F) 3878 053F 00E4 8C10", " IOHS_DLP_LINK0_RX_LANE_CO (0x1F011012) 0000 0000 0FF8 0000", " IOHS_DLP_LINK1_RX_LANE_CO (0x1F011013) 0000 0000 0FF8 0000", " IOHS_DLP_LINK0_INFO (0x1F011014) 1B1B 0000 0000 007D", " IOHS_DLP_LINK1_INFO (0x1F011015) 1B1B 0000 0000 005C", " IOHS_DLP_REPLAY_THRESHOLD (0x1F011018) 6FE0 0000 0000 0000", " IOHS_DLP_SL_ECC_THRESHOLD (0x1F011019) 7FC0 0000 0000 0000", " IOHS_DLP_LINK0_SYN_CAPTUR (0x1F011022) 8C39 67C2 F2A2 1901", " IOHS_DLP_LINK1_SYN_CAPTUR (0x1F011023) 78EE 9F6C A009 0001", " IOHS_DLP_LINK0_QUALITY (0x1F011026) 02F0 3000 0000 8900", " IOHS_DLP_LINK1_QUALITY (0x1F011027) 0300 3000 0000 0000", " IOHS_DLP_DLL_STATUS (0x1F011028) 7F05 4A7F 0548 C000", " IOHS_DLP_MISC_ERROR_STATU (0x1F011029) 0000 00CF 0000 0000", " IOHS_DLP_FIR (0x1F011000) F303 CF00 0000 0000", " IOHS_DLP_FIR_MASK (0x1F011003) FCFC 3FFF FCC0 C000", " IOHS_DLP_FIR_ACT0 (0x1F011006) FCFC 3FFF FCC0 0000", " IOHS_DLP_FIR_ACT1 (0x1F011007) FFFF FFFF FFFF FFFC" ] }, "User Data 8": { "Section Version": "1", "Sub-section type": "1", "Created by": "0x2000", "PEL Internal Debug Data": { "SRC": [ "No VPD found for /xyz/openbmc_project/inventory/system/chassis/motherboard/dimm24: sd_bus_call: xyz.openbmc_project.Common.Error.ResourceNotFound: The resource is not found.", "No VPD found for /xyz/openbmc_project/inventory/system/chassis/motherboard/dcm1/cpu1: sd_bus_call: xyz.openbmc_project.Common.Error.ResourceNotFound: The resource is not found." ] } } }
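
The FIR and mask values in the register dump above can be cross-checked against the signature list. A minimal sketch follows. It assumes IBM bit numbering (bit 0 is the most significant bit of the 64-bit register) and that a 1 in the mask register means the bit is masked. The helper names `parse_reg` and `active_fir_bits` are hypothetical and not part of any Hostboot tool.

```python
def parse_reg(text: str) -> int:
    """Convert a dump value such as 'F303 CF00 0000 0000' to an integer."""
    return int(text.replace(" ", ""), 16)


def active_fir_bits(fir: int, mask: int, width: int = 64) -> list[int]:
    """Return the bit positions that are set in fir and not masked.

    Assumptions (not confirmed by the log itself): bit 0 is the MSB,
    and a mask bit of 1 suppresses the corresponding FIR bit.
    """
    unmasked = fir & ~mask & ((1 << width) - 1)
    return [bit for bit in range(width) if unmasked & (1 << (width - 1 - bit))]


# Values copied from the "node 0 proc 0" section of the register dump.
iohs_dlp_fir = parse_reg("F303 CF00 0000 0000")
iohs_dlp_fir_mask = parse_reg("FCFC 3FFF FCC0 C000")
print(active_fir_bits(iohs_dlp_fir, iohs_dlp_fir_mask))

pb_ext_fir = parse_reg("2000 0000 0000 0000")
pb_ext_fir_mask = parse_reg("D400 0000 0000 0000")
print(active_fir_bits(pb_ext_fir, pb_ext_fir_mask))
```

Under those assumptions, the IOHS_DLP_FIR result is bits 6, 7, 14, 15, 16 and 17, which are the same bit numbers the signature list reports for IOHS_DLP_FIR_SMP, and the PB_EXT_FIR result is bit 2, matching `PB_EXT_FIR(0)[2]` on proc 0.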

dcrowell77 commented 1 year ago

Please post the boot console as well; it provides more context on when in the boot the failure is happening. I'm not familiar with #1 (TOD error), so I have to do some research. Problem #2 (subchannel checkstop) seems unrelated to small core or WOF; you may just have a suspect DDIMM and/or DDIMM connector. I would suggest you try (carefully) moving some parts around to see if the failure follows the slot or the DDIMM.

It is odd that WOF enablement affects anything during Hostboot because we don't really have the PM complex logic enabled while we run.

dcrowell77 commented 1 year ago

All of those SMP errors are likely the root cause of your issues (at least for #1, maybe #2). I can tell from the output that the checkstop happens in istep 21.2 ("0x0000283c": "0xaa801502"). That is the point where we start the PM complex up. Our guess is that this is related to the voltage issues that have already been discussed elsewhere. We recommend taking a close look at all the voltages before, during, and after the checkstop to see if things look okay.
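
The istep can be read off the value quoted above. This is a minimal sketch, assuming the low two bytes of the word hold the major and minor istep numbers. That layout is inferred only from 0xaa801502 lining up with istep 21.2, and the meaning of the upper two bytes is not decoded here. `decode_istep` is a hypothetical helper name.

```python
def decode_istep(progress_word: int) -> tuple[int, int]:
    """Split a 32-bit progress word into (major, minor) istep numbers.

    Assumed layout: byte 2 is the major istep, byte 3 (lowest byte)
    is the minor istep. The upper two bytes are ignored.
    """
    major = (progress_word >> 8) & 0xFF
    minor = progress_word & 0xFF
    return major, minor


major, minor = decode_istep(0xAA801502)
print(f"istep {major}.{minor}")  # prints "istep 21.2"
```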

lili-lilili commented 1 year ago

Based on more test cases, let me update the issue.

  1. The checkstop with SMP errors may occur in istep 9.6, istep 11.3, istep 21.2, or in the OS, so it seems that this issue is not directly related to WOF. However, if the machine boots to the OS (with WOF enabled) and we run a stream test, the checkstop is likely to occur.

  2. I tried reducing the IOHS link frequency to 25G (FREQ_IOHS_LINK_MHZ modified from 32500M to 25781M). The checkstop with SMP errors still happens. However, I'm not sure whether this is the correct way to modify the IOHS link frequency.

  3. In addition, test data shows that once a CPU has experienced an SMP error, the probability of that CPU repeating the problem in subsequent tests greatly increases. I am not sure whether this is due to CPU hardware damage or whether the CPU records certain errors internally. (On P9, we have encountered situations where xbus errors were recorded inside the CPU.)

  4. Also, we encountered this issue when using a single CPU configuration.

  5. I am now checking the IO_IOHS_CHANNEL_LOSS attribute of the IOHS. Its setting differs between the IOHS units on Rainier: if the SMP interconnect crosses CPUs, the attribute is set to HIGH_LOSS; otherwise it is set to LOW_LOSS. This attribute seems to be related to SI, so I'm wondering if I should try changing it.

dcrowell77 commented 1 year ago

2 - How/where are you setting that attribute value?

3 - There is no persistent data for xbus/abus lane repair in P10. It seems like the observation may just be a correlation, i.e. a part that fails once is likely to fail again as it is marginal.