ibm-openbmc / dev

Product Development Project Mgmt and Tracking

1050:CT:Rainier:IPMI: SDR elist command is not showing sensor data for many components #3625

Closed yadlapati closed 1 year ago

yadlapati commented 1 year ago

Internal Defect: https://jazz07.rchland.ibm.com:13443/jazz/web/projects/CSSD#action=com.ibm.team.workitem.viewWorkItem&id=481266

SDR elist command is not showing components presence data correctly.

Problem Description

FTC1050:CT:Rainier:IPMI: SDR elist command is not showing components presence data correctly.

Steps to re-create:

  1. Power ON the Rainier system to booted state.
  2. Check IPMI's sdr elist command output - ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain40bmc sdr elist

[rahulmah@gfwa180:~]$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain40bmc sdr elist
cpu0_core22 | 54h | ns | 0.0 | Disabled
occ0 | 60h | ok | 0.0 | SM BIOS Uncorrectable CPU-complex Error
dimm1 | 61h | ok | 0.0 |
RebootAttempts | 62h | ns | 0.0 | Disabled
AttemptsLeft | 63h | ok | 0.0 |
gv100card0 | C5h | ns | 0.0 | Disabled
fleeting0 | D0h | ns | 0.0 | Disabled


Driver details:

root@p10bmc:~# cat /etc/os-release
ID=openbmc-openpower
NAME="IBM eBMC (OpenBMC for IBM Enterprise Systems)"
VERSION="fw1050.00-2.14"
VERSION_ID=fw1050.00-2.14-1050.2313.20230329a (NL1050_008)
VERSION_CODENAME="${DISTRO_CODENAME}"
PRETTY_NAME="IBM eBMC (OpenBMC for IBM Enterprise Systems) fw1050.00-2.14"
BUILD_ID="20231001"
OPENBMC_TARGET_MACHINE="p10bmc"
EXTENDED_VERSION=NL1050_008
BMC_SIGNATURE_TYPE=Development
HOST_SIGNATURE_TYPE=Development

Journal log


root@p10bmc:~# journalctl -f

Mar 30 12:03:18 p10bmc ipmid[545]: failed to convert major string to uint8_t: Invalid argument
Mar 30 12:03:18 p10bmc systemd-journald[255]: Forwarding to syslog missed 1 messages.
Mar 30 12:03:18 p10bmc ipmid[545]: failed to convert major string to uint8_t: Invalid argument
Mar 30 12:03:19 p10bmc netipmid[1593]: Removing idle IPMI LAN session, id: 874376998, handler: 1

This issue is specific to the 1050 build. With the 1040 build, all of the components below appear in the sdr elist output.

$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain57bmc sdr elist
Password:
dimm0 | 01h | ok | 32.1 | Presence Detected
dimm1 | 02h | ok | 32.2 | Presence Detected
dimm2 | 03h | ok | 32.3 | Presence Detected
dimm3 | 04h | ok | 32.4 | Presence Detected
dimm4 | 05h | ok | 32.5 | Memory Device Disabled, Presence Detected
dimm5 | 06h | ok | 32.6 | Memory Device Disabled, Presence Detected
dimm6 | 07h | ok | 32.7 | Memory Device Disabled, Presence Detected
dimm7 | 08h | ok | 32.8 | Memory Device Disabled, Presence Detected
dimm8 | 09h | ok | 32.9 | Presence Detected
dimm9 | 0Ah | ok | 32.10 | Presence Detected
dimm10 | 0Bh | ok | 32.11 | Presence Detected
dimm11 | 0Ch | ok | 32.12 | Presence Detected
dimm12 | 0Dh | ok | 32.13 | Memory Device Disabled, Presence Detected
dimm13 | 0Eh | ok | 32.14 | Memory Device Disabled, Presence Detected
dimm14 | 0Fh | ok | 32.15 | Memory Device Disabled, Presence Detected
dimm15 | 10h | ok | 32.16 | Memory Device Disabled, Presence Detected
dimm16 | 11h | ok | 32.17 | Presence Detected
dimm17 | 12h | ok | 32.18 | Presence Detected
dimm18 | 13h | ok | 32.19 | Presence Detected
dimm19 | 14h | ok | 32.20 | Presence Detected
dimm20 | 15h | ok | 32.21 | Memory Device Disabled, Presence Detected
dimm21 | 16h | ok | 32.22 | Memory Device Disabled, Presence Detected
dimm22 | 17h | ok | 32.23 | Memory Device Disabled, Presence Detected
dimm23 | 18h | ok | 32.24 | Memory Device Disabled, Presence Detected
dimm24 | 19h | ok | 32.25 | Presence Detected
dimm25 | 1Ah | ok | 32.26 | Presence Detected
dimm26 | 1Bh | ok | 32.27 | Presence Detected
dimm27 | 1Ch | ok | 32.28 | Presence Detected
dimm28 | 1Dh | ok | 32.29 | Memory Device Disabled, Presence Detected
dimm29 | 1Eh | ok | 32.30 | Memory Device Disabled, Presence Detected
dimm30 | 1Fh | ok | 32.31 | Memory Device Disabled, Presence Detected
dimm31 | 20h | ok | 32.32 | Memory Device Disabled, Presence Detected
dimm32 | 21h | ns | 32.33 | Disabled
dimm33 | 22h | ns | 32.34 | Disabled
dimm34 | 23h | ns | 32.35 | Disabled
dimm35 | 24h | ns | 32.36 | Disabled
dimm36 | 25h | ns | 32.37 | Disabled
dimm37 | 26h | ns | 32.38 | Disabled
dimm38 | 27h | ns | 32.39 | Disabled
dimm39 | 28h | ns | 32.40 | Disabled
dimm40 | 29h | ns | 32.41 | Disabled
dimm41 | 2Ah | ns | 32.42 | Disabled
dimm42 | 2Bh | ns | 32.43 | Disabled
dimm43 | 2Ch | ns | 32.44 | Disabled
dimm44 | 2Dh | ns | 32.45 | Disabled
dimm45 | 2Eh | ns | 32.46 | Disabled
dimm46 | 2Fh | ns | 32.47 | Disabled
dimm47 | 30h | ns | 32.48 | Disabled
dimm48 | 31h | ns | 32.49 | Disabled
dimm49 | 32h | ns | 32.50 | Disabled
dimm50 | 33h | ns | 32.51 | Disabled
dimm51 | 34h | ns | 32.52 | Disabled
dimm52 | 35h | ns | 32.53 | Disabled
dimm53 | 36h | ns | 32.54 | Disabled
dimm54 | 37h | ns | 32.55 | Disabled
dimm55 | 38h | ns | 32.56 | Disabled
dimm56 | 39h | ns | 32.57 | Disabled
dimm57 | 3Ah | ns | 32.58 | Disabled
dimm58 | 3Bh | ns | 32.59 | Disabled
dimm59 | 3Ch | ns | 32.60 | Disabled
dimm60 | 3Dh | ns | 32.61 | Disabled
dimm61 | 3Eh | ns | 32.62 | Disabled
dimm62 | 3Fh | ns | 32.63 | Disabled
dimm63 | 40h | ns | 32.64 | Disabled
dcm0_cpu0 | 41h | ok | 3.1 | Presence detected
dcm0_cpu1 | 42h | ok | 3.2 | Presence detected
dcm1_cpu0 | 43h | ok | 3.3 | Presence detected
dcm1_cpu1 | 44h | ok | 3.4 | Presence detected
dcm2_cpu0 | 45h | ns | 3.5 | Disabled
dcm2_cpu1 | 46h | ns | 3.6 | Disabled
dcm3_cpu0 | 47h | ns | 3.7 | Disabled
dcm3_cpu1 | 48h | ns | 3.8 | Disabled
BootProgress | 4Bh | ok | 34.1 |
auto_reboot | 4Ch | ok | 35.1 |
fan0_0 | 4Dh | ok | 39.1 | 7300 RPM
fan0_1 | 4Eh | ok | 39.2 | 9700 RPM
fan1_0 | 4Fh | ok | 39.3 | 7300 RPM
fan1_1 | 50h | ok | 39.4 | 9700 RPM
fan2_0 | 51h | ok | 39.5 | 7400 RPM
fan2_1 | 52h | ok | 39.6 | 9900 RPM
fan3_0 | 53h | ok | 39.7 | 7400 RPM
fan3_1 | 54h | ok | 39.8 | 10000 RPM
fan4_0 | 55h | ok | 39.9 | 7400 RPM
fan4_1 | 56h | ok | 39.10 | 10000 RPM
fan5_0 | 57h | ok | 39.11 | 7300 RPM
fan5_1 | 58h | ok | 39.12 | 9700 RPM
Ambient_0_Temp | 59h | ok | 40.1 | 20 degrees C
Ambient_1_Temp | 5Ah | ok | 40.2 | 21 degrees C
Ambient_2_Temp | 5Bh | ok | 40.3 | 20 degrees C
PCIE_0_Temp | 5Ch | ok | 41.1 | 26 degrees C
PCIE_1_Temp | 5Dh | ok | 41.2 | 27 degrees C
proc0_core0_0_te | 5Eh | ok | 42.1 | 37 degrees C
proc0_core0_1_te | 5Fh | ok | 42.2 | 36 degrees C
proc1_core0_0_te | 60h | ns | 42.3 | Disabled
proc1_core0_1_te | 61h | ns | 42.4 | Disabled
proc2_core0_0_te | 62h | ns | 42.5 | Disabled
proc2_core0_1_te | 63h | ns | 42.6 | Disabled
proc3_core0_0_te | 64h | ns | 42.7 | Disabled
proc3_core0_1_te | 65h | ns | 42.8 | Disabled
proc4_core0_0_te | 66h | ns | 42.9 | Disabled
proc4_core0_1_te | 67h | ns | 42.10 | Disabled
proc5_core0_0_te | 68h | ns | 42.11 | Disabled
proc5_core0_1_te | 69h | ns | 42.12 | Disabled
proc6_core0_0_te | 6Ah | ns | 42.13 | Disabled
proc6_core0_1_te | 6Bh | ns | 42.14 | Disabled
proc7_core0_0_te | 6Ch | ns | 42.15 | Disabled
proc7_core0_1_te | 6Dh | ns | 42.16 | Disabled
dimm0_dram_temp | 6Eh | ok | 43.1 | 31 degrees C
dimm1_dram_temp | 6Fh | ok | 43.2 | 30 degrees C
dimm2_dram_temp | 70h | ok | 43.3 | 28 degrees C
dimm3_dram_temp | 71h | ok | 43.4 | 27 degrees C
dimm4_dram_temp | 72h | ns | 43.5 | Disabled
dimm5_dram_temp | 73h | ns | 43.6 | Disabled
dimm6_dram_temp | 74h | ns | 43.7 | Disabled
dimm7_dram_temp | 75h | ns | 43.8 | Disabled
dimm8_dram_temp | 76h | ok | 43.9 | 30 degrees C
dimm9_dram_temp | 77h | ok | 43.10 | 32 degrees C
dimm10_dram_temp | 78h | ok | 43.11 | 30 degrees C
dimm11_dram_temp | 79h | ok | 43.12 | 26 degrees C
dimm12_dram_temp | 7Ah | ns | 43.13 | Disabled
dimm13_dram_temp | 7Bh | ns | 43.14 | Disabled
dimm14_dram_temp | 7Ch | ns | 43.15 | Disabled
dimm15_dram_temp | 7Dh | ns | 43.16 | Disabled
dimm16_dram_temp | 7Eh | ok | 43.17 | 31 degrees C
dimm17_dram_temp | 7Fh | ok | 43.18 | 31 degrees C
dimm18_dram_temp | 80h | ok | 43.19 | 32 degrees C
dimm19_dram_temp | 81h | ok | 43.20 | 25 degrees C
dimm20_dram_temp | 82h | ns | 43.21 | Disabled
dimm21_dram_temp | 83h | ns | 43.22 | Disabled
dimm22_dram_temp | 84h | ns | 43.23 | Disabled
dimm23_dram_temp | 85h | ns | 43.24 | Disabled
dimm24_dram_temp | 86h | ok | 43.25 | 30 degrees C
dimm25_dram_temp | 87h | ok | 43.26 | 29 degrees C
dimm26_dram_temp | 88h | ok | 43.27 | 27 degrees C
dimm27_dram_temp | 89h | ok | 43.28 | 27 degrees C
dimm28_dram_temp | 8Ah | ns | 43.29 | Disabled
dimm29_dram_temp | 8Bh | ns | 43.30 | Disabled
dimm30_dram_temp | 8Ch | ns | 43.31 | Disabled
dimm31_dram_temp | 8Dh | ns | 43.32 | Disabled
dimm32_dram_temp | 8Eh | ns | 43.33 | Disabled
dimm33_dram_temp | 8Fh | ns | 43.34 | Disabled
dimm34_dram_temp | 90h | ns | 43.35 | Disabled
dimm35_dram_temp | 91h | ns | 43.36 | Disabled
dimm36_dram_temp | 92h | ns | 43.37 | Disabled
dimm37_dram_temp | 93h | ns | 43.38 | Disabled
dimm38_dram_temp | 94h | ns | 43.39 | Disabled
dimm39_dram_temp | 95h | ns | 43.40 | Disabled
dimm40_dram_temp | 96h | ns | 43.41 | Disabled
dimm41_dram_temp | 97h | ns | 43.42 | Disabled
dimm42_dram_temp | 98h | ns | 43.43 | Disabled
dimm43_dram_temp | 99h | ns | 43.44 | Disabled
dimm44_dram_temp | 9Ah | ns | 43.45 | Disabled
dimm45_dram_temp | 9Bh | ns | 43.46 | Disabled
dimm46_dram_temp | 9Ch | ns | 43.47 | Disabled
dimm47_dram_temp | 9Dh | ns | 43.48 | Disabled
dimm48_dram_temp | 9Eh | ns | 43.49 | Disabled
dimm49_dram_temp | 9Fh | ns | 43.50 | Disabled
dimm50_dram_temp | A0h | ns | 43.51 | Disabled
dimm51_dram_temp | A1h | ns | 43.52 | Disabled
dimm52_dram_temp | A2h | ns | 43.53 | Disabled
dimm53_dram_temp | A3h | ns | 43.54 | Disabled
dimm54_dram_temp | A4h | ns | 43.55 | Disabled
dimm55_dram_temp | A5h | ns | 43.56 | Disabled
dimm56_dram_temp | A6h | ns | 43.57 | Disabled
dimm57_dram_temp | A7h | ns | 43.58 | Disabled
dimm58_dram_temp | A8h | ns | 43.59 | Disabled
dimm59_dram_temp | A9h | ns | 43.60 | Disabled
dimm60_dram_temp | AAh | ns | 43.61 | Disabled
dimm61_dram_temp | ABh | ns | 43.62 | Disabled
dimm62_dram_temp | ACh | ns | 43.63 | Disabled
dimm63_dram_temp | ADh | ns | 43.64 | Disabled
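
A quick way to see which sensors went missing between the two builds is to diff the sensor names in two captured outputs. The following is a minimal sketch, assuming each capture has one `name | id | status | entity | reading` row per line; the function names `parse_sdr_elist` and `missing_sensors` are hypothetical helpers, and the sample strings are short excerpts from the outputs in this issue.

```python
def parse_sdr_elist(text):
    """Parse `ipmitool sdr elist` rows into a dict keyed by sensor name."""
    sensors = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue  # skip shell prompts, "Password:" and blank lines
        name, sensor_id, status, entity, reading = fields
        sensors[name] = {
            "id": sensor_id,
            "status": status,
            "entity": entity,
            "reading": reading,
        }
    return sensors


def missing_sensors(good_text, bad_text):
    """Return sensor names present in the good capture but not in the bad one."""
    good = parse_sdr_elist(good_text)
    bad = parse_sdr_elist(bad_text)
    return sorted(set(good) - set(bad))


# Excerpts from the 1040 and 1050 outputs in this issue (not the full lists).
GOOD_1040 = """\
dimm0 | 01h | ok | 32.1 | Presence Detected
dimm1 | 02h | ok | 32.2 | Presence Detected
fan0_0 | 4Dh | ok | 39.1 | 7300 RPM
"""
BAD_1050 = """\
cpu0_core22 | 54h | ns | 0.0 | Disabled
dimm1 | 61h | ok | 0.0 |
"""

print(missing_sensors(GOOD_1040, BAD_1050))  # ['dimm0', 'fan0_0']
```

Running it against the full captures from both systems would list every sensor that the 1050 build no longer reports.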

lxwinspur commented 1 year ago
[rahulmah@gfwa180:~]$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain40bmc sdr elist
cpu0_core22 | 54h | ns | 0.0 | Disabled
occ0 | 60h | ok | 0.0 | SM BIOS Uncorrectable CPU-complex Error
dimm1 | 61h | ok | 0.0 |
RebootAttempts | 62h | ns | 0.0 | Disabled
AttemptsLeft | 63h | ok | 0.0 |
gv100card0 | C5h | ns | 0.0 | Disabled
fleeting0 | D0h | ns | 0.0 | Disabled

@yadlapati Could you share the result of the command below on 1050: busctl introspect xyz.openbmc_project.Inventory.Manager /xyz/openbmc_project/inventory/system/chassis/motherboard/dimm1

lxwinspur commented 1 year ago

@yadlapati @mzipse Also, what is the commit ID of the openbmc repository that you are using for the 1050 branch?

lxwinspur commented 1 year ago

@ChicagoDuan FYI

yadlapati commented 1 year ago

I flashed the latest 1050 code and am still seeing the issue.

cat /etc/os-release

ID=openbmc-openpower
NAME="IBM eBMC (OpenBMC for IBM Enterprise Systems)"
VERSION="fw1050.00-2.45"
VERSION_ID=fw1050.00-2.45-1050.2321.20230516a (NL1050_014)
VERSION_CODENAME="${DISTRO_CODENAME}"
PRETTY_NAME="IBM eBMC (OpenBMC for IBM Enterprise Systems) fw1050.00-2.45"
BUILD_ID="20231001"

$ ipmitool -I lanplus -C 17 -p 623 -U ipmiadmin -H rain104bmc sdr elist
cpu0_core22 | 54h | ns | 0.0 | Disabled
occ0 | 60h | ok | 0.0 |
dimm1 | 61h | ok | 0.0 |
RebootAttempts | 62h | ns | 0.0 | Disabled
AttemptsLeft | 63h | ok | 0.0 |
gv100card0 | C5h | ns | 0.0 | Disabled
fleeting0 | D0h | ns | 0.0 | Disabled

yadlapati commented 1 year ago

@lxwinspur output you requested:

:~# busctl introspect xyz.openbmc_project.Inventory.Manager /xyz/openbmc_project/inventory/system/chassis/motherboard/dimm1
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
com.ibm.ipzvpd.Location interface - - -
.LocationCode property s "U78DA.ND0.WZS003T-P0-C13" emits-change writable
org.freedesktop.DBus.Introspectable interface - - -
.Introspect method - s -
org.freedesktop.DBus.Peer interface - - -
.GetMachineId method - s -
.Ping method - - -
org.freedesktop.DBus.Properties interface - - -
.Get method ss v -
.GetAll method s a{sv} -
.Set method ssv - -
.PropertiesChanged signal sa{sv}as - -
xyz.openbmc_project.Association.Definitions interface - - -
.Associations property a(sss) 2 "fault_identifying" "fault_identifi... emits-change writable
xyz.openbmc_project.Inventory.Decorator.LocationCode interface - - -
.LocationCode property s "U78DA.ND0.WZS003T-P0-C13" emits-change writable
xyz.openbmc_project.Inventory.Item interface - - -
.Present property b false emits-change writable
.PrettyName property s "Memory DIMM" emits-change writable
xyz.openbmc_project.Inventory.Item.Dimm interface - - -
.AllowedSpeedsMT property aq 0 emits-change writable
.CASLatencies property q 0 emits-change writable
.ECC property s "xyz.openbmc_project.Inventory.Item.D... emits-change writable
.FormFactor property s "xyz.openbmc_project.Inventory.Item.D... emits-change writable
.MaxMemorySpeedInMhz property q 0 emits-change writable
.MemoryAttributes property y 0 emits-change writable
.MemoryConfiguredSpeedInMhz property q 0 emits-change writable
.MemoryDataWidth property q 0 emits-change writable
.MemoryDeviceLocator property s "" emits-change writable
.MemoryMedia property s "xyz.openbmc_project.Inventory.Item.D... emits-change writable
.MemorySizeInKB property u 0 emits-change writable
.MemoryTotalWidth property q 0 emits-change writable
.MemoryType property s "xyz.openbmc_project.Inventory.Item.D... emits-change writable
.MemoryTypeDetail property s "" emits-change writable
.RevisionCode property q 0 emits-change writable
xyz.openbmc_project.Object.Enable interface - - -
.Enabled property b true emits-change writable
xyz.openbmc_project.State.Decorator.OperationalStatus interface - - -
.Functional property b true emits-change writable
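
For comparing inventory state against what IPMI reports, the boolean properties are the interesting part of this dump (here `.Present` is false while `.Enabled` and `.Functional` are true). Below is a minimal sketch that pulls them out of captured `busctl introspect` text, assuming one entry per line as busctl prints it on a terminal; `bool_properties` is a hypothetical helper name, and the sample is an excerpt of the output above.

```python
def bool_properties(introspect_text):
    """Collect boolean (signature 'b') properties from `busctl introspect` text.

    Results are keyed by property name only, so a name that appears under
    more than one interface would be overwritten by the later entry.
    """
    props = {}
    for line in introspect_text.splitlines():
        tokens = line.split()
        is_bool_property = (
            len(tokens) >= 4
            and tokens[0].startswith(".")
            and tokens[1] == "property"
            and tokens[2] == "b"
        )
        if is_bool_property:
            props[tokens[0].lstrip(".")] = tokens[3] == "true"
    return props


# Excerpt of the dimm1 introspection output from this issue.
SAMPLE = """\
xyz.openbmc_project.Inventory.Item interface - - -
.Present property b false emits-change writable
.PrettyName property s "Memory DIMM" emits-change writable
xyz.openbmc_project.Object.Enable interface - - -
.Enabled property b true emits-change writable
xyz.openbmc_project.State.Decorator.OperationalStatus interface - - -
.Functional property b true emits-change writable
"""

print(bool_properties(SAMPLE))
```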

lxwinspur commented 1 year ago

I flashed latest 1050 code and still seeing the issue

cat /etc/os-release

ID=openbmc-openpower
NAME="IBM eBMC (OpenBMC for IBM Enterprise Systems)"
VERSION="fw1050.00-2.45"
VERSION_ID=fw1050.00-2.45-1050.2321.20230516a (NL1050_014)
VERSION_CODENAME="${DISTRO_CODENAME}"
PRETTY_NAME="IBM eBMC (OpenBMC for IBM Enterprise Systems) fw1050.00-2.45"
BUILD_ID="20231001"

$ ipmitool -I lanplus -C 17 -p 623 -U ipmiadmin -H rain104bmc sdr elist
cpu0_core22 | 54h | ns | 0.0 | Disabled
occ0 | 60h | ok | 0.0 |
dimm1 | 61h | ok | 0.0 |
RebootAttempts | 62h | ns | 0.0 | Disabled
AttemptsLeft | 63h | ok | 0.0 |
gv100card0 | C5h | ns | 0.0 | Disabled
fleeting0 | D0h | ns | 0.0 | Disabled

It seems that the correct sensor configuration is not being loaded here; only the sensor-example configuration is used. Could you check whether your configuration is correct?

https://github.com/ibm-openbmc/phosphor-host-ipmid/blob/master/scripts/sensor-example.yaml
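
One symptom visible in this thread is that every row of the 1050 output reports entity `0.0`, while the 1040 output carries real entity IDs such as `32.1` or `39.1`. The sketch below turns that observation into a rough check on a captured `sdr elist` output. It is a heuristic drawn only from the two outputs in this issue, not a documented property of phosphor-host-ipmid, and `looks_like_placeholder_config` is a hypothetical helper name.

```python
def entity_ids(text):
    """Return the entity column of every `name | id | status | entity | reading` row."""
    ids = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 5:
            ids.append(fields[3])
    return ids


def looks_like_placeholder_config(text):
    """Heuristic: at least one row, and every row reports entity 0.0."""
    ids = entity_ids(text)
    return bool(ids) and all(entity == "0.0" for entity in ids)


# Excerpts from the outputs in this issue.
OUTPUT_1050 = """\
cpu0_core22 | 54h | ns | 0.0 | Disabled
occ0 | 60h | ok | 0.0 |
dimm1 | 61h | ok | 0.0 |
"""
OUTPUT_1040 = """\
dimm0 | 01h | ok | 32.1 | Presence Detected
fan0_0 | 4Dh | ok | 39.1 | 7300 RPM
"""

print(looks_like_placeholder_config(OUTPUT_1050))  # True
print(looks_like_placeholder_config(OUTPUT_1040))  # False
```

A positive result only suggests that the platform sensor YAML may not have been picked up at build time; confirming it still means checking which YAML the image was built with.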

mzipse commented 1 year ago

@lxwinspur, where does this sensor config information get loaded from in the BMC? Can you tell? @jinuthomas, does the sensor config get set by the VPD code? I may be off base on this. Just trying to keep this moving along.

jinuthomas commented 1 year ago

Nope, the VPD code does not do that.

spinler commented 1 year ago

They're hardcoded in yaml files.

gtmills commented 1 year ago

Looks like this file: https://github.com/ibm-openbmc/openbmc/commit/4c8f4b6e43a6634d520336bcfc244a07d84cf6cf Did something change between 1020 and 1050?

lxwinspur commented 1 year ago

@gtmills @spinler @mzipse @yadlapati @jinuthomas Please review: https://github.com/ibm-openbmc/openbmc/pull/285

Please cherry-pick this patch and then test this issue together with: https://github.com/ibm-openbmc/dev/issues/3630

lxwinspur commented 1 year ago

@ChicagoDuan FYI ^

gtmills commented 1 year ago

Reed merged https://github.com/ibm-openbmc/openbmc/pull/285 in GHE