ibm-openbmc / openbmc


1030.ips: Host failed to power on #263

Closed lxwinspur closed 1 year ago

lxwinspur commented 1 year ago

We encountered a host power-on failure on a 2U Rainier machine. The related log and dump files are attached:

sbedump.tar.gz FailedToStart_2-0208.log os-release

We consulted @ojayanth about this problem, and after analysis we think the problem lies in the SBE SEEPROM image. We tried switching to the alternate SBE image as a test, and the host powered on successfully.

Could someone from IBM's SBE team please help analyze the cause? Thanks!

lxwinspur commented 1 year ago

@mzipse FYI

skumar8j commented 1 year ago

Can I get info on which build the system was using?

skumar8j commented 1 year ago

I do not see the build level in the info.yaml.

generation: p10
driver: none

I am able to parse the first pibmem dump with M seeprom 2.14 and the other two dumps with M seeprom 2.13. Maybe a side switch happened after the failure.

00000000.202775054| 0|SBE_TRACE | 1|I> tpmExtendPCR TPM2_Extend PCR response code is 0x00000000
00000000.202776355| 0|SBE_TRACE | 1|I>Writing extendSecurityStatePCR1 details [0x01140100 0x00020014] into Register [0x00010012]
00000000.202798054| 0|SBE_TRACE | 1|I> sbemthreadroutine Verification Image found in the Boot Seeprom Image
00000000.202840355| 0|SBE_TRACE | 1|I>Verification Image Source addr in Boot Seeprom is [0x4783FEA9]
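For anyone who wants to pull values such as the suspicious address out of a parsed trace, here is a minimal Python sketch. The regular expression is inferred only from the four trace lines quoted above, and the column names (`col2`, `col4`, `level`) are guesses rather than documented fsp-trace fields.

```python
import re

# Layout inferred from the trace excerpt in this thread; column names are guesses.
TRACE_RE = re.compile(
    r"^(?P<timestamp>\d+\.\d+)\|\s*(?P<col2>\d+)\|(?P<component>\w+)\s*\|"
    r"\s*(?P<col4>\d+)\|(?P<level>\w)>\s*(?P<message>.*)$"
)
HEX_RE = re.compile(r"0x[0-9A-Fa-f]+")


def parse_trace_line(line):
    """Return a dict of fields for one trace line, or None if it does not match."""
    match = TRACE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Collect every 0x... literal so odd-looking addresses are easy to spot.
    entry["hex_values"] = HEX_RE.findall(entry["message"])
    return entry
```

Feeding it the last line of the excerpt gives a `hex_values` list that contains only `0x4783FEA9`.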

After this, there is Boot seeprom access from M Seeprom. I need the B Seeprom version to parse and debug the PPE State.

skumar8j commented 1 year ago

Can we run the screener code to figure out if there is a CHIP SELECT issue?

lxwinspur commented 1 year ago

Can I get info on which build the system was using?

https://github.com/ibm-openbmc/openbmc/tree/1030.00.ips

lxwinspur commented 1 year ago

Can I get info on which build the system was using?

https://github.com/open-power/sbe/commits/master-p10 CommitId: c74232131c91c41b418e711f6fc181ff3b881d7a

lxwinspur commented 1 year ago

@mzipse @skumar8j Can the dump analysis tool be shared with IPS?

Grubby0624 commented 1 year ago

After this, there is Boot seeprom access from M Seeprom. I need the B Seeprom version to parse and debug the PPE State.

Do you mean the image version in the standby SBE seeprom? I think this version should be 2.14. At the last successful boot, both seeproms were upgraded to 2.14.

skumar8j commented 1 year ago

It can be shared with IPS. The tools are already on GitHub, but they need the string files and symbol files, in addition to some other scripts.

lxwinspur commented 1 year ago

It can be shared with IPS. The tools are already on GitHub, but they need the string files and symbol files, in addition to some other scripts.

Thanks very much. Could you kindly share the GitHub link and explain how to use it?

skumar8j commented 1 year ago

https://github.com/open-power/sbe/blob/master/src/tools/debug/sbe-debug.py

./sbe-debug.py -e
./sbe-debug.py -l trace -t FILE -f
./sbe-debug.py -l ppestate -t FILE -f

lxwinspur commented 1 year ago

https://github.com/open-power/sbe/blob/master/src/tools/debug/sbe-debug.py

./sbe-debug.py -e
./sbe-debug.py -l trace -t FILE -f
./sbe-debug.py -l ppestate -t FILE -f

./sbe-debug.py -l trace -t FILE -f SYSDUMP.783C4C1.30000001.19700101004254
ERROR: file sbe_DD2.syms not found

Do you know why this error occurs? Maybe I missed something?

skumar8j commented 1 year ago

The script needs the symbol file, the string file, the ppe tool and the fsp-trace tool.

The symbol and string files will be present in the images directory after the SBE build.
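A quick pre-flight check along these lines can show which of those files are missing before running the script. This is only a sketch: the file names are the ones that appear in the logs in this thread (`sbe_DD1.syms` or `sbe_DD2.syms`, `sbeStringFile_DD1`, `ppe2fsp`, `fsp-trace`), the helper name is hypothetical, and the assumption that everything sits next to `sbe-debug.py` comes from the directory listing posted here, not from the tool's documentation.

```python
import os


def missing_prerequisites(tool_dir, dd_level="DD1"):
    """List the files sbe-debug.py appears to need that are absent from tool_dir.

    The names are taken from the logs in this thread; dd_level selects
    between e.g. sbe_DD1.syms and sbe_DD2.syms.
    """
    required = [
        "sbe_%s.syms" % dd_level,        # symbol file from the SBE build
        "sbeStringFile_%s" % dd_level,   # string file from the SBE build
        "ppe2fsp",                       # converts the PPE trace to fsp-trace format
        "fsp-trace",                     # formats the converted trace
    ]
    return [name for name in required
            if not os.path.isfile(os.path.join(tool_dir, name))]
```

Calling `missing_prerequisites(".", "DD2")` in the directory that produced `ERROR: file sbe_DD2.syms not found` would be expected to include `sbe_DD2.syms` in its result.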

lxwinspur commented 1 year ago

@sampmisr FYI

sampmisr commented 1 year ago

As per IPS the SBE debug tool is working fine now. But some clue is needed about the corrupted SBE image.

skumar8j commented 1 year ago

The Measurement SBE is doing consecutive SPI reads to read the verification image and its offset. The second read, which fetches the verification image offset, returns an incorrect value (0x4783FEA9).

Can someone run the manufacturing SPI screener code to detect whether the chip has a CHIP SELECT issue?

lxwinspur commented 1 year ago

The Measurement SBE is doing consecutive SPI reads to read the verification image and its offset. The second read, which fetches the verification image offset, returns an incorrect value (0x4783FEA9).

Where is the code for this logic?

Can someone run the manufacturing SPI screener code to detect whether the chip has a CHIP SELECT issue?

Is the manufacturing SPI screener code a tool? How do we get it?

Sorry, I am not familiar with SBE.

lili-lilili commented 1 year ago

https://github.com/open-power/sbe/blob/master/src/tools/debug/sbe-debug.py

./sbe-debug.py -e
./sbe-debug.py -l trace -t FILE -f
./sbe-debug.py -l ppestate -t FILE -f

I tried to parse the SBE trace but ran into some problems. One error is reported here: https://github.com/open-power/sbe/blob/c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbe-debug.py#L178 Is the tool version on GitHub consistent with the one you use, @skumar8j?

Here is the log:

li@Ubuntu-li:~/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug$ ./sbe-debug.py -l trace -t FILE -r System_Dump_Entry_SBE_30000004 -f plat_dump/30000004.0_0_SbeData_p10_p10_pibmem_dump

Symbol File: [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbe_DD1.syms]
Parsing the Dump header
Missing section indicator is 0x0
BMC System: 1
Running command [hexdump -v -e '1/8 "%016x"' -e '"\n"' plat_dump/30000004.0_0_SbeData_p10_p10_pibmem_dump| xxd -r -p > output_file]
hexdump: x: bad byte count
Running command [cp output_file plat_dump/30000004.0_0_SbeData_p10_p10_pibmem_dump]

Trace buffer symbol addr: [fffd2e80]
Trace Buffer Length: [00000838]

String File: [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbeStringFile_DD1]
Running command [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/ppe2fsp /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/DumpPIBMEM /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbetrace.bin ]
Failed converting ppe trace to fsp trace. rc = 6
PPE trace buffer must be version 2.
ERROR running command: 1536

fsp-trace: [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/fsp-trace]
Running command [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/fsp-trace -s /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbeStringFile_DD1 /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbetrace.bin > /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbe_0_0_tracMERG]
fsp-trace.c is_smartDump [503]: read 40 bytes of /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbetrace.bin = 0, 19: No such device
fsp-trace.c parse_opt: file /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbetrace.bin: not an fsp-trace file (Incorrect Version?)
adal_parse.c trace_adal_read_stringfile: stringfile magic cookie not found or corrupted.
fsp-trace.c read_stringfiles: cannot read stringfile '/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbeStringFile_DD1'
ERROR running command: 512
Running command [mv /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/DumpPIBMEM /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/dumpPibMem_trace]
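A note on the two "ERROR running command" numbers in this log: they look like raw wait statuses of the kind Python's `os.system` returns on Linux, where the child's exit code sits in the high byte. That reading is an assumption about how sbe-debug.py reports errors, but it fits the log, since 1536 is 6 shifted left by 8 bits and ppe2fsp printed `rc = 6`. A small sketch for decoding such values (the helper name is made up for illustration):

```python
def decode_wait_status(status):
    """Split a 16-bit POSIX wait status into (kind, value).

    Returns ("signal", n) if the low 7 bits name a terminating signal,
    otherwise ("exit", code) with the exit code taken from the high byte.
    Stopped/continued states are not handled in this sketch.
    """
    signal_number = status & 0x7F
    if signal_number:
        return ("signal", signal_number)
    return ("exit", (status >> 8) & 0xFF)


print(decode_wait_status(1536))  # ('exit', 6)
print(decode_wait_status(512))   # ('exit', 2)
```

Under that assumption, ppe2fsp exited with code 6 and fsp-trace exited with code 2.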

edwin-wang commented 1 year ago

@skumar8j Could you help check the question above?

skumar8j commented 1 year ago

Failed converting ppe trace to fsp trace. rc = 6 PPE trace buffer must be version 2. ERROR running command: 1536

This issue occurs only if there is a mismatch between the SBE image that is loaded and the symbol files used to extract the pibmem dump. Use the symbol files for the image that is loaded in PIBMEM.

Alternatively, you can use the forced-trace option (-l forced-trace).

lili-lilili commented 1 year ago

@skumar8j I got the sbe-debug tool, symbol file and dump file from the same op-build project, so there should be no version mismatch.

I think the problem may be caused by the error shown below. The failed hexdump leaves output_file with a size of zero, and that empty file then overwrites 30000001.0_0_SbeData_p10_p10_pibmem_dump.

BMC System: 1
Running command [hexdump -v -e '1/8 "%016x"' -e '"\n"' plat_dump/30000001.0_0_SbeData_p10_p10_pibmem_dump| xxd -r -p > output_file]
hexdump: x: bad byte count
Running command [cp output_file plat_dump/30000004.0_0_SbeData_p10_p10_pibmem_dump]
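As a possible way around the hexdump failure, the same conversion can be sketched in pure Python. This assumes the intent of the `hexdump ... '1/8 "%016x"' | xxd -r -p` pipeline on a little-endian host is to reverse the bytes inside every 8-byte word, and that "bad byte count" means the installed hexdump does not accept an 8-byte unit for `%x`. Both points are inferences from the log, not confirmed behaviour of sbe-debug.py, and the function names are hypothetical.

```python
import os


def swap_8_byte_words(data):
    """Reverse the byte order inside each 8-byte word of data."""
    if len(data) % 8 != 0:
        # Design choice for this sketch: reject odd lengths instead of padding.
        raise ValueError("dump length is not a multiple of 8 bytes")
    return b"".join(data[i:i + 8][::-1] for i in range(0, len(data), 8))


def convert_dump(src_path, dst_path):
    """Write the byte-swapped copy of src_path to dst_path.

    The source is never modified, and nothing is written if the result
    would be empty, so a failed conversion cannot wipe out the dump.
    """
    with open(src_path, "rb") as src:
        converted = swap_8_byte_words(src.read())
    if not converted:
        raise ValueError("refusing to write an empty dump")
    tmp_path = dst_path + ".tmp"
    with open(tmp_path, "wb") as dst:
        dst.write(converted)
    os.replace(tmp_path, dst_path)  # atomic rename once the data is complete
```

Writing to a temporary file and renaming it only after success avoids the situation seen here, where a zero-length output_file was copied over the original dump.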

Here is the log with the forced-trace option. It also reports an error.

li@Ubuntu-li:~/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug$ ./sbe-debug.py -l forced-trace -t FILE -r System_Dump_Entry_SBE_30000001 -f plat_dump/30000001.0_0_SbeData_p10_p10_pibmem_dump

Symbol File: [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbe_DD1.syms]
Parsing the Dump header
Missing section indicator is 0x0
BMC System: 1
Running command [hexdump -v -e '1/8 "%016x"' -e '"\n"' plat_dump/30000001.0_0_SbeData_p10_p10_pibmem_dump| xxd -r -p > output_file]
hexdump: x: bad byte count
Running command [cp output_file plat_dump/30000001.0_0_SbeData_p10_p10_pibmem_dump]

String File: [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbeStringFile_DD1]

Forced Trace, Pibmem Offset: [0x00] Pibmem Length: [0x7D400]
Traceback (most recent call last):
  File "./sbe-debug.py", line 874, in <module>
    main( sys.argv )
  File "./sbe-debug.py", line 841, in main
    forcedCollectTrace(sbe_string_file,sbe_tracMERG_file)
  File "./sbe-debug.py", line 225, in forcedCollectTrace
    data_read = data_read[1:]+[ord(byte)]
TypeError: ord() expected a character, but string of length 0 found
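The TypeError is what `ord()` raises when a one-byte read returns an empty string, which is what happens at end of file and therefore immediately on a zero-length input. The sketch below shows a byte-by-byte sliding-window scan that stops cleanly at EOF. It is not the actual `forcedCollectTrace` code; the function name and the idea of searching for a marker are assumptions made for illustration.

```python
def find_marker(path, marker):
    """Return the offset of the first occurrence of marker in the file, or None.

    Reads one byte at a time, keeping a sliding window of len(marker) bytes,
    and treats an empty read as end of file instead of passing it to ord().
    """
    window = bytearray()
    offset = 0
    with open(path, "rb") as f:
        while True:
            byte = f.read(1)
            if not byte:
                # EOF (or an empty file): nothing more to scan.
                return None
            window += byte
            offset += 1
            if len(window) > len(marker):
                del window[0]
            if bytes(window) == marker:
                return offset - len(marker)
```

On an empty file this returns `None` straight away, which a caller can turn into a clear "dump file is empty" message.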

li@Ubuntu-li:~/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug$
li@Ubuntu-li:~/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug$ ll
total 1784
drwxr-xr-x 4 li li 4096 Mar 21 15:13 ./
drwxr-xr-x 12 li li 4096 Nov 24 21:02 ../
-rw-rw-r-- 1 li li 0 Mar 21 15:13 DumpPIBMEM
-rw-r--r-- 1 li li 4192 Nov 24 21:02 extractMpiplDump.py
-rwxr-xr-x 1 li li 338480 Mar 21 11:31 fsp-trace
-rwxr-xr-x 1 li li 24576 Nov 24 21:02 hbbl.bin
-rw-r--r-- 1 li li 202 Jan 1 1970 info.yaml
-rw-rw-r-- 1 li li 20857 Mar 21 15:13 output.binary
-rw-rw-r-- 1 li li 0 Mar 21 15:13 output_file
drwxrwxr-x 2 li li 4096 Mar 21 15:13 plat_dump/
-rwxr-xr-x 1 li li 28312 Mar 21 11:03 ppe2fsp
-rwxr-xr-x 1 li li 638944 Mar 21 11:04 ppetracepp
-rw-r--r-- 1 li li 141556 Mar 21 11:01 sbe_DD1.syms
-rwxr-xr-x 1 li li 34926 Mar 21 14:26 sbe-debug-hexdump.py
-rwxr-xr-x 1 li li 34928 Mar 21 11:27 sbe-debug.py
-rwxr-xr-x 1 li li 34960 Mar 21 11:29 sbe-debug.py-f
-rw-r--r-- 1 li li 33368 Mar 21 13:59 sbe_measurement_seeprom.bin
-rw-r--r-- 1 li li 21376 Mar 21 13:53 sbe_measurement_seeprom.syms
-rw-r--r-- 1 li li 165937 Mar 21 13:54 sbeMeasurementStringFile
-rwxr-xr-x 1 li li 7574 Nov 24 21:02 sbeModifyPGvalue.py
-rw-rw-r-- 1 li li 157377 Mar 21 11:06 sbeStringFile_DD1
-rwxr-xr-x 1 li li 11850 Mar 21 12:02 signSbeImage
drwxr-xr-x 2 li li 4096 Nov 24 21:02 simics/
-rwxr-xr-x 1 li li 14182 Nov 24 21:02 simics-debug-framework.py
-rwxr-xr-x 1 li li 10540 Nov 24 21:02 simics-debug-framework_rainier.py*
-rw-r--r-- 1 li li 22089 Mar 21 11:00 System_Dump_Entry_SBE_30000001
-rw-r--r-- 1 li li 22089 Mar 21 10:58 System_Dump_Entry_SBE_30000001.bk
li@Ubuntu-li:~/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug$

edwin-wang commented 1 year ago

This issue occurs only if there is a mismatch between the SBE image that is loaded and the symbol files used to extract the pibmem dump.

It seems the SBE image and the symbol files came from the same project. Could you help check whether anything in sbe-debug.py or the op-build project could explain this? @skumar8j

lili-lilili commented 1 year ago

Because the SBE dump parsing problem is essentially unrelated to this issue, let's open a new issue to discuss it: https://github.com/ibm-openbmc/openbmc/issues/279