open-power / pdbg

PowerPC FSI Debugger
Apache License 2.0

[witherspoon VERSION="v1.99.4-47"] Failed to inject L2FIR checkstop on BMC with pdbg tool #18

Closed SrideviRamesh closed 7 years ago

SrideviRamesh commented 7 years ago

Issue: Failed to inject an L2FIR checkstop on the Witherspoon BMC with the pdbg tool

System info:

cat /etc/os-release
ID="openbmc-phosphor"
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
VERSION="v1.99.4-47"
VERSION_ID="v1.99.4-47-g113a5c8"
PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) v1.99.4-47"
BUILD_ID="v1.99.4"
root@witherspoon:~/bin#

Procedure:

Read the FIR, mask, action-0 and action-1 register values on the BMC with the pdbg tool:

root@witherspoon:~/bin# ./pdbg -b fsi -d p9w -p0 getscom 0x1010800 -p1
p0:0x1010800: 0x000003e000000100
p1:0x1010800: 0x000003e000000100
root@witherspoon:~/bin# ./pdbg -b fsi -d p9w -p0 getscom 0x1010803 -p1
p0:0x1010803: 0xffffffffffffffff
p1:0x1010803: 0xffffffffffffffff
root@witherspoon:~/bin# ./pdbg -b fsi -d p9w -p0 getscom 0x1010806 -p1
p0:0x1010806: 0x0000000000000000
p1:0x1010806: 0x0000000000000000
root@witherspoon:~/bin# ./pdbg -b fsi -d p9w -p0 getscom 0x1010807 -p1
p0:0x1010807: 0x0000000000000000
p1:0x1010807: 0x0000000000000000
root@witherspoon:~/bin#
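The mask and action registers read above decide what happens when a FIR bit fires. Below is a minimal Python sketch of that decoding. It assumes the usual POWER FIR convention: a mask bit of 1 blocks the bit, which matches the all-ones mask being read later in this thread as "all bits masked", and an unmasked bit's (action0, action1) pair selects 00 = checkstop, 01 = recoverable, 10 = attention, 11 = local checkstop. That action encoding and the helper name `classify_fir` are assumptions for illustration, so check them against the P9 register documentation. Bits use IBM (MSB-0) numbering, so bit 0 is 0x8000000000000000.

```python
from typing import Dict, List

# Hypothetical helper, not part of pdbg.
# ASSUMPTION: mask bit 1 = masked; (act0, act1) 00 = checkstop,
# 01 = recoverable, 10 = attention, 11 = local checkstop.
ACTIONS = {
    (0, 0): "checkstop",
    (0, 1): "recoverable",
    (1, 0): "attention",
    (1, 1): "local_checkstop",
}


def bit(value: int, ibm_bit: int) -> int:
    """Return IBM (MSB-0) bit `ibm_bit` of a 64-bit register value."""
    return (value >> (63 - ibm_bit)) & 1


def classify_fir(mask: int, act0: int, act1: int) -> Dict[str, List[int]]:
    """Group unmasked FIR bit numbers (MSB-0) by their configured action."""
    result: Dict[str, List[int]] = {name: [] for name in ACTIONS.values()}
    for n in range(64):
        if bit(mask, n):
            continue  # masked bits are ignored entirely
        result[ACTIONS[(bit(act0, n), bit(act1, n))]].append(n)
    return result


if __name__ == "__main__":
    # Witherspoon values read above: mask is all ones, actions are zero.
    wsp = classify_fir(0xFFFFFFFFFFFFFFFF, 0x0, 0x0)
    print({k: len(v) for k, v in wsp.items()})
```

Passing in the ZZ reference values (mask 0x3C200557F6C00000, action-1 0xC29058A809000000) shows which bits are left unmasked there, in contrast to the fully masked register on the Witherspoon.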

Perform a putscom to the L2FIR and read back the action-0 and action-1 values:

root@witherspoon:~/bin#  ./pdbg -b fsi -d p9w -p0 putscom 0x1010800 0x8000000000000000
p0:0x1010800: 0x8000000000000000
root@witherspoon:~/bin#
root@witherspoon:~/bin# ./pdbg -b fsi -d p9w -p0 getscom 0x1010806 -p1
p0:0x1010806: 0x0000000000000000
p1:0x1010806: 0x0000000000000000
root@witherspoon:~/bin# ./pdbg -b fsi -d p9w -p0 getscom 0x1010807 -p1
p0:0x1010807: 0xc29058a809000000
p1:0x1010807: 0x0000000000000000
root@witherspoon:~/bin#
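The putscom above writes 0x8000000000000000, which in IBM (MSB-0) numbering sets FIR bit 0 only. Here is a small sketch for converting between bit numbers and the 64-bit data word passed to putscom. The helper names are hypothetical and are not part of pdbg.

```python
from typing import List

# Hypothetical helpers, not part of pdbg: convert between IBM MSB-0 bit
# numbers and the 64-bit data word given to putscom.


def bits_to_value(bits: List[int]) -> int:
    """Build a 64-bit value with the given MSB-0 bit numbers set."""
    value = 0
    for b in bits:
        if not 0 <= b <= 63:
            raise ValueError(f"bit {b} out of range for a 64-bit register")
        value |= 1 << (63 - b)
    return value


def value_to_bits(value: int) -> List[int]:
    """List the MSB-0 bit numbers set in a 64-bit value."""
    return [b for b in range(64) if (value >> (63 - b)) & 1]


if __name__ == "__main__":
    data = bits_to_value([0])
    print(f"putscom data: 0x{data:016x}")  # the value used in the command above
```

By the same conversion, the value 0x0002000000000000 written later in this thread sets only bit 14.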

For reference, SCOM values taken from a P9 ZZ system:

$ getscom pu.ex 000000010010800 -all
p9n.ex  k0:n0:s0:p00:c3    0x0000000000000000
p9n.ex  k0:n0:s0:p00:c5    0x0000000000000000
ecmd_ppc getscom pu.ex 000000010010800 -all
$ getscom pu.ex 000000010010803 -all
p9n.ex  k0:n0:s0:p00:c3    0x3C200557F6C00000
p9n.ex  k0:n0:s0:p00:c5    0x3C200557F6C00000
ecmd_ppc getscom pu.ex 000000010010803 -all
$ getscom pu.ex 000000010010806 -all
p9n.ex  k0:n0:s0:p00:c3    0x0000000000000000
p9n.ex  k0:n0:s0:p00:c5    0x0000000000000000
ecmd_ppc getscom pu.ex 000000010010806 -all
$ getscom pu.ex 000000010010807 -all
p9n.ex  k0:n0:s0:p00:c3    0xC29058A809000000
p9n.ex  k0:n0:s0:p00:c5    0xC29058A809000000
ecmd_ppc getscom pu.ex 000000010010807 -all
$

Logs attached: dmesg.txt, journal_logs.txt, 20170421063812298062_MyFFDCLogs.zip

williamspatrick commented 7 years ago

pdbg does not know about core-register address translation or core IDs, to the best of my knowledge. It looks like you are effectively doing a putscom to core 0. Are you sure this is a valid core on the processors you are touching? There is no -call-like option in pdbg.
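Here is a rough illustration of the core-ID translation that pdbg does not do. It assumes a commonly described P9 layout in which the chiplet ID sits in bits 2:7 (MSB-0) of the 32-bit SCOM address and cores 0 to 23 are chiplets 0x20 to 0x37. Both that layout and the helper names are assumptions. EX/L2-level addresses such as the L2FIR use a different, more involved translation that this sketch does not cover.

```python
from typing import Optional

# Hypothetical sketch of core-ID address translation.
# ASSUMPTION: chiplet ID = (addr >> 24) & 0x3F, and P9 core (EC) chiplets
# are 0x20..0x37 for cores 0..23. EX/L2 addresses are NOT handled here.
CHIPLET_SHIFT = 24
CHIPLET_MASK = 0x3F
EC_BASE = 0x20
NUM_CORES = 24


def chiplet_id(addr: int) -> int:
    """Extract the 6-bit chiplet ID from a 32-bit SCOM address."""
    return (addr >> CHIPLET_SHIFT) & CHIPLET_MASK


def core_of(addr: int) -> Optional[int]:
    """Return the core an EC-chiplet address targets, or None otherwise."""
    cid = chiplet_id(addr)
    if EC_BASE <= cid < EC_BASE + NUM_CORES:
        return cid - EC_BASE
    return None


def translate_to_core(addr: int, core: int) -> int:
    """Rewrite a core-relative EC address so it targets `core` instead."""
    if core_of(addr) is None:
        raise ValueError(f"0x{addr:08x} is not a core (EC) chiplet address")
    if not 0 <= core < NUM_CORES:
        raise ValueError(f"core {core} out of range")
    cleared = addr & ~(CHIPLET_MASK << CHIPLET_SHIFT)  # keep all other fields
    return cleared | ((EC_BASE + core) << CHIPLET_SHIFT)


if __name__ == "__main__":
    print(core_of(0x20010A40))                      # core targeted by this address
    print(hex(translate_to_core(0x20010A40, 5)))    # same register on core 5
```

Under the same assumed layout, the address typed in the pdbg commands above (0x1010800) has chiplet ID 0x01, while the ZZ reference address 000000010010800 (0x10010800) has chiplet ID 0x10. The two commands therefore may not be touching the same register.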

lkammath commented 7 years ago

Hi Patrick, I just tried setting the act0 and act1 registers to the same values as on the ZZ system, to see whether setting any bit triggers a checkstop. It did not, and the values in the mask register indicate that all the bits are masked.

williamspatrick commented 7 years ago

So is there any issue? It doesn't seem like it to me.

lkammath commented 7 years ago

Hi Patrick,

I am trying the -b fsi option. It seems to return values on getscom, but after I do a putscom the write doesn't seem to be reflected, and after that even a basic getcfam fails. So -b fsi appears to leave the BMC in some weird state, and I have to reboot to get back to a normal state.

Sequence of operations I did:

root@witherspoon:~/pdgb/bin# ./pdbg  getcfam 0x2809 -p0
p0:0x2809: 0x0000000085405080

Without using -b fsi:

root@witherspoon:~/pdgb/bin# ./pdbg getscom 0x20010A40 -p0
pdbg: Failed to read from 0x0000000020010a40: Invalid argument
getaddr: Error reading register

(When -b fsi is not used, I get this error.)

With the -b fsi option it returns a value:

root@witherspoon:~/pdgb/bin# ./pdbg getscom -b fsi 0x20010A40 -p0
p0:0x20010a40: 0x0000000000000000

Followed by a putscom:

./pdbg -b fsi putscom 0x20010A40 0x0002000000000000 -p0 -c0
p0:0x20010a40: 0x0002000000000000

Followed by a getscom:

root@witherspoon:~/pdgb/bin# ./pdbg -b fsi getscom 0x20010A40 -p0

p0:0x20010a40: 0x0000000000000000 
p0:0x20010a40: 0x0000000000000000 

After this, when I run getcfam or probe, nothing is returned. The system gets into a bad state, and even Cronus shows the same.

root@witherspoon:~/pdgb/bin# ./pdbg getcfam 0x2809 -p0
root@witherspoon:~/pdgb/bin#

geissonator commented 7 years ago

On systems that support OpenFSI (e.g. witherspoon, zaius, romulus), don't use the "-b fsi" option. It uses the old FSI bit-banging and can mess up the OpenFSI driver. Just use the defaults. I usually do "pdbg -p0 getcfam 0x2809" and it all works well for me on Witherspoon systems.
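Below is a minimal sketch of scripting pdbg the way geissonator recommends, leaving the backend at its default on OpenFSI systems. It uses only the command forms that appear in this thread (`-p0`, `getcfam`, `getscom`, `-b fsi`). The function names, the system list check, and the refusal logic are hypothetical additions for illustration.

```python
import subprocess
from typing import List, Optional

# Systems named in this thread as supporting OpenFSI.
OPENFSI_SYSTEMS = {"witherspoon", "zaius", "romulus"}


def pdbg_argv(command: str, *args: str, proc: int = 0,
              backend: Optional[str] = None, system: str = "witherspoon",
              pdbg: str = "pdbg") -> List[str]:
    """Build a pdbg command line (hypothetical helper).

    Refuses "-b fsi" on OpenFSI systems, because that backend bypasses
    the in-kernel driver and can leave the BMC in a bad state.
    """
    if backend == "fsi" and system in OPENFSI_SYSTEMS:
        raise ValueError("-b fsi bypasses the OpenFSI driver; use the default backend")
    argv = [pdbg]
    if backend is not None:
        argv += ["-b", backend]
    argv += [f"-p{proc}", command, *args]
    return argv


def run_pdbg(command: str, *args: str, **kwargs) -> str:
    """Run pdbg and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(pdbg_argv(command, *args, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `pdbg_argv("getcfam", "0x2809")` yields the same command geissonator uses, `pdbg -p0 getcfam 0x2809`.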

lkammath commented 7 years ago

getcfam and putcfam are OK. The problem here is using getscom with -b fsi: it just throws a weird error. Maybe the tool is really half-baked. I will need to debug that.

apopple commented 7 years ago

@lkammath @williamspatrick and @geissonator are correct: you probably shouldn't be using the "-b fsi" option, as it bypasses the in-kernel driver. However, the failure above doesn't seem to be related to any of the tools. You are writing a bunch of SCOM registers and eventually breaking the system, including for Cronus.

So you probably need to check that the sequence of register writes is valid. I would note that 0x20010a40 is not a valid address unless core 0 is present, so perhaps you're getting the system into a bad state by accessing a non-existent SCOM address. I'm going to close this for now.
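apopple's advice can be turned into a pre-flight check: before replaying a sequence of SCOM writes, flag any write that targets a core which is not present. This sketch assumes the same commonly described P9 layout (chiplet ID in bits 2:7 of the address, cores 0 to 23 at chiplets 0x20 to 0x37). The `present_cores` input and the function names are hypothetical, and how the set of present cores is obtained is left open.

```python
from typing import Iterable, List, Optional, Set, Tuple

# ASSUMPTION: chiplet ID = (addr >> 24) & 0x3F; P9 cores 0..23 are
# chiplets 0x20..0x37. Non-core addresses are not checked here.


def core_of(addr: int) -> Optional[int]:
    """Return the core an address targets, or None for non-core chiplets."""
    cid = (addr >> 24) & 0x3F
    return cid - 0x20 if 0x20 <= cid <= 0x37 else None


def invalid_writes(writes: Iterable[Tuple[int, int]],
                   present_cores: Set[int]) -> List[Tuple[int, int]]:
    """Return the (address, data) writes that target an absent core."""
    bad = []
    for addr, data in writes:
        core = core_of(addr)
        if core is not None and core not in present_cores:
            bad.append((addr, data))
    return bad


if __name__ == "__main__":
    # The write from this thread, checked against a hypothetical core set.
    seq = [(0x20010A40, 0x0002000000000000)]
    for addr, data in invalid_writes(seq, present_cores={4, 5}):
        print(f"skip putscom 0x{addr:08x} 0x{data:016x}: core not present")
```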