ibm-openbmc / dev

Product Development Project Mgmt and Tracking

1050: Rainier:netipmid[567]: Failed to get bus name, path: /org/openbmc/control/chassis0, error: Input/output error #3623

Closed: gtmills closed this issue 1 year ago

gtmills commented 1 year ago

Internal defect https://jazz07.rchland.ibm.com:13443/jazz/web/projects/CSSD#action=com.ibm.team.workitem.viewWorkItem&id=430264

On 1050, a tester reported:
root@p10bmc:~# cat /etc/os-release
ID=openbmc-openpower
NAME="IBM eBMC (OpenBMC for IBM Enterprise Systems)"
VERSION="fw1050.00-2.6"
VERSION_ID=fw1050.00-2.6-1050.2311.20230309v (NL1050_004)
VERSION_CODENAME="${DISTRO_CODENAME}"
PRETTY_NAME="IBM eBMC (OpenBMC for IBM Enterprise Systems) fw1050.00-2.6"
BUILD_ID="20230301"
OPENBMC_TARGET_MACHINE="p10bmc"
EXTENDED_VERSION=NL1050_004
BMC_SIGNATURE_TYPE=Development
HOST_SIGNATURE_TYPE=Development
root@p10bmc:~# journalctl --no-pager -b | grep netipmid
Mar 10 15:21:31 p10bmc netipmid[567]: Failed to get bus name, path: /org/openbmc/control/chassis0, error: Input/output error
Mar 10 15:21:31 p10bmc netipmid[567]: Bind to interface: eth1
Mar 10 15:21:31 p10bmc netipmid[561]: Failed to get bus name, path: /org/openbmc/control/chassis0, error: Input/output error
Mar 10 15:21:31 p10bmc netipmid[561]: Bind to interface: eth0
root@p10bmc:~#

Could IPS have a look? Thanks!

lxwinspur commented 1 year ago

@gtmills We double-checked this issue and have a couple of questions:

  1. Are system-GUID and system-UUID the same for rainier system?
  2. org.openbmc.control.Chassis is a very old interface and has been discarded [1].
  3. I found that the UUID can be retrieved through Redfish in several places:
    • /redfish/v1
    • /redfish/v1/Managers/bmc
    • /redfish/v1/Chassis/
    • /redfish/v1/Systems/system
    • /redfish/v1/Systems//Processors/
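One way to check whether the Redfish resources listed above agree on a single UUID is sketched below. This is only an illustration, not code from bmcweb: `collect_uuids`, `all_same`, and the injected `fetch` callable are hypothetical names, and only the fully specified paths from the list are used. It assumes each resource exposes its identifier under the standard Redfish `UUID` property.

```python
from typing import Callable, Dict, Optional

# Paths copied from the list above. Entries with an elided resource ID
# are left out because the ID is not known here.
REDFISH_UUID_PATHS = [
    "/redfish/v1",
    "/redfish/v1/Managers/bmc",
    "/redfish/v1/Systems/system",
]


def collect_uuids(fetch: Callable[[str], dict]) -> Dict[str, Optional[str]]:
    """Return the "UUID" property of each resource (None if it is absent).

    `fetch` is a hypothetical callable that takes a Redfish path and
    returns the decoded JSON body, e.g. a thin wrapper around an
    authenticated HTTPS GET against the BMC.
    """
    return {path: fetch(path).get("UUID") for path in REDFISH_UUID_PATHS}


def all_same(uuids: Dict[str, Optional[str]]) -> bool:
    """True when every resource reported a UUID and all of them match."""
    values = [u.lower() if u else None for u in uuids.values()]
    return None not in values and len(set(values)) == 1
```

Passing `fetch` in as a parameter keeps the comparison logic independent of how the BMC is reached, so it can be exercised with canned responses before pointing it at real hardware.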

[1] https://github.com/openbmc/phosphor-net-ipmid/blob/master/command/guid.cpp#L35

So my question is: if IBM's system-GUID and system-UUID are the same, we can get the GUID by reading the bmc_persistent_data file [2]; otherwise, IBM needs to define a new way to get it.

[2] https://github.com/ibm-openbmc/bmcweb/blob/1050/include/persistent_data.hpp
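If the GUID and UUID do turn out to be the same value, the remaining step is the byte encoding for the IPMI response. The sketch below assumes, based on a reading of the guid.cpp linked in [1], that the RFC 4122 string has its hyphens removed and its 16 bytes stored in reverse (least significant byte first) order. That ordering is an assumption and should be checked against the actual source; `uuid_to_ipmi_guid` is a hypothetical helper name.

```python
import uuid


def uuid_to_ipmi_guid(text: str) -> bytes:
    """Convert an RFC 4122 UUID string into 16 bytes in reversed order.

    Assumption: the IPMI GUID is the UUID's big-endian byte sequence
    reversed, so the last byte of the textual UUID comes first.
    """
    return uuid.UUID(text).bytes[::-1]


# Example input: an arbitrary, made-up UUID used only for illustration.
guid = uuid_to_ipmi_guid("61a39523-78f2-11e5-9862-e6402cfc3223")
```

`uuid.UUID` also validates the input, so a malformed string raises `ValueError` instead of silently producing a short or garbage GUID.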

lxwinspur commented 1 year ago

Also, I suspect that the system-UUID used by IBM came from VPD [1] on P9, but I did not find this key on the Rainier machine.

[1] https://github.com/open-power/vpdtools/blob/master/examples/p9/openbmc/openPower_obmc_opfr_record.xml#L55-L60

Can some IBM experts confirm this problem?

lxwinspur commented 1 year ago

@mzipse FYI

lxwinspur commented 1 year ago

There are some discussions on Discord: https://discord.com/channels/775381525260664832/867820390406422538/1106044329635631215

gtmills commented 1 year ago

How did this work on 1020? Can we match that? The logic at https://github.com/openbmc/phosphor-net-ipmid/blob/master/command/guid.cpp#L35 should be moved to the correct interface, and apps such as bmcweb should use it.

> GUID by reading the bmc_persistent_data file[2]

I find this hacky. Would prefer a D-Bus solution.

I agree with Patrick here:

> And... no, you shouldn't read a bmcweb json file in another process.
> You're creating an undocumented API expectation by doing that.

lxwinspur commented 1 year ago

> How did this work on 1020? Can we match that? https://github.com/openbmc/phosphor-net-ipmid/blob/master/command/guid.cpp#L35 should be moved to the correct interface and apps such as bmcweb should use it
>
> > GUID by reading the bmc_persistent_data file[2]
>
> I find this hacky. Would prefer a D-Bus solution.

Well, that's just my suggestion; the specific implementation should be decided by IBM experts.

> I agree with Patrick here:
>
> > And... no, you shouldn't read a bmcweb json file in another process.
> > You're creating an undocumented API expectation by doing that.

gtmills commented 1 year ago

How did this work on 1020? Can we do the same?

lxwinspur commented 1 year ago

> Can we do the same?

Yes. I think it's a legacy problem; it even occurs on our fp5280g2 system.

mzipse commented 1 year ago

After talking with our own testers, we agreed to close this as a permanent restriction: the problem has been there all along, and the error is only logged in the journal, with no error surfaced to the customer.