Closed gtmills closed 1 year ago
https://github.com/ibm-openbmc/bmcweb/pull/543 has caused the regression. At the very least, run all the tests or have the SME approve the changes before they are merged. The location code indicator was tested for only two slots upstream and, I think, the same downstream, if I am not wrong. The issue is that PCI slots C0 and C1 work fine, C2 is missing, and from C3 onwards the request sets the previous slot's LED: if I trigger C3 then C2 gets enabled, if I trigger C4 then C3 gets enabled, and so on. So the last slot cannot be enabled, which is why the curl command output shows the last one not set. Strictly speaking, the ones that do get set are also wrong.
The ObjPath that is obtained from the GUI and passed to the set location indicator call is the issue here.
Hi @jinuthomas ,
My discord id is "Chicago Duan#3383". Can we communicate through discord? I am unable to reproduce this bug at the moment, but I have some ideas about it and I need your help with some testing.
Can you do the following two tests?
    for (const auto& endpoint : endpoints)
    {
        if (endpoint == validChassisPath)
        {
            index++;
            updatePcieSlotsMaps[index] = path;
        }
    }
Then do the same test again to see if this bug will still appear.
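To make the effect of this change easier to see, here is a minimal standalone sketch that compares the two ways of advancing `index`. It is an illustration under assumptions, not bmcweb code: `SlotEntry` and `assignIndices` are hypothetical names, the boolean stands in for the `endpoint == validChassisPath` check, and the asynchronous D-Bus calls are replaced by a plain loop.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one slot path plus the outcome of the
// "endpoint == validChassisPath" check for that path.
using SlotEntry = std::pair<std::string, bool>;

// Builds the index -> path map. With countOnlyValid == false the index
// advances for every path (as in the existing loop); with true it advances
// only for paths that matched the chassis (the change proposed above).
std::map<unsigned int, std::string>
    assignIndices(const std::vector<SlotEntry>& slots, bool countOnlyValid)
{
    std::map<unsigned int, std::string> result;
    unsigned int index = 0;
    for (const auto& [path, valid] : slots)
    {
        if (!countOnlyValid)
        {
            index++;
        }
        if (valid)
        {
            if (countOnlyValid)
            {
                index++;
            }
            result[index] = path;
        }
    }
    return result;
}
```

With one invalid entry in the middle of the list, the first variant leaves a hole at that index and every later slot keeps its original position, while the second variant produces contiguous indices.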
How was it initially tested when the upstream patch was implemented? @baemyung, since you pulled this from upstream and tested it earlier, could you please help @ChicagoDuan fix the issue?
Isn't this issue related to 1030 PR#363 -- which seems determined as not needed?
a4990c6d66
Santosh Puranik Fix PCIe Slot Count Check (#363)
https://ibm.ent.box.com/file/1108642405190?s=meet9iczbqeuv7cmkrw0bl3ac2w8me1h
a4990c6d66 Alpana Kumari Ravindra Fix PCIe Slot Count Check (#363) This commit is for slot count correction, and 1050 code handled it different way so this correction is not required there.
It looks like the pcie_slots code (for PATCH) has some issues. I'll investigate further and work on it.
There are a few issues in the code.
1) Dealing with the slot map.
The global std::map<unsigned int, std::string> updatePcieSlotsMaps{} may still be changing when it is checked, so updatePcieSlotsMaps.size() may not match ‘total’.
// Global variable
std::map<unsigned int, std::string> updatePcieSlotsMaps{};

static void checkPCIeSlotsCount(...)
{
    dbus::utility::getSubTreePaths(...[](...) {
        …
        unsigned int index = 0;
        unsigned int count = 0;
        auto slotNum = pcieSlotsPaths.size();
        for (const auto& path : pcieSlotsPaths)
        {
            index++;
            dbus::utility::getAssociationEndPoints(....
                [count{++count}, slotNum, index](...) {
                // A)
                for (const auto& endpoint : endpoints)
                {
                    if (endpoint == validChassisPath)
                    {
                        updatePcieSlotsMaps[index] = path; // <==
                    }
                }
                // B)
                if (count == slotNum)
                {
                    // Last time DBus has returned
                    if (updatePcieSlotsMaps.size() == total) // <=== This may not be consistent.
                    {
                        callback(updatePcieSlotsMaps);
                    }
2) The comparison of ‘count’ and ‘slotNum’ seems incorrect.
‘slotNum’ may not be the same as the input slot count, because not all of the pcieslots returned by GetSubTree are valid.
For example, pcieslot12 is not a valid slot for this purpose: it has no “association definitions”. As a result, the logic needs to filter out the invalid ones.
$ busctl get-property \
xyz.openbmc_project.Inventory.Manager \
/xyz/openbmc_project/inventory/system/chassis/motherboard/pcieslot12 \
xyz.openbmc_project.Association.Definitions Associations
Failed to get property Associations on interface xyz.openbmc_project.Association.Definitions: Unknown interface xyz.openbmc_project.Association.Definitions or property Associations.
On the other hand, another pcieslot gives a good result:
$ busctl get-property \
xyz.openbmc_project.Inventory.Manager \
/xyz/openbmc_project/inventory/system/chassis/motherboard/pcieslot10 \
xyz.openbmc_project.Association.Definitions Associations
a(sss) 4 "fault_identifying" "fault_identified_by" "/xyz/openbmc_project/led/groups/pcieslot10_fault" "identifying" "identified_by" "/xyz/openbmc_project/led/groups/pcieslot10_identify" "chassis" "inventory" "/xyz/openbmc_project/inventory/system/chassis" "upstream_processor" "pcie_slot" "/xyz/openbmc_project/inventory/system/chassis/motherboard/dcm0/cpu0"
In fact, pcieslot12 is the 21st entry in the slot list. That’s why we’re seeing the issue at the 21st entry, especially on Rainier systems.
3) In addition, there are a few more weak points:
- The global variable updatePcieSlotsMaps seems inappropriate. It can be fixed by using std::shared_ptr.
- The PCIe slots are not sorted, so the queried list may not match the indexed slot input of the PATCH.
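One way the std::shared_ptr idea could look is sketched below. This is a minimal illustration under assumptions, not the actual fix: `SlotCollector` and `startLookups` are hypothetical names, and the returned closures only simulate D-Bus replies. Each pending lookup holds a copy of the shared pointer, so the completion callback runs exactly once, when the last lookup handler has been released, no matter how many lookups were valid or in which order they returned.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collector that owns the slot map instead of a global.
// The destructor runs when the last pending lookup releases its reference.
struct SlotCollector
{
    using SlotMap = std::map<unsigned int, std::string>;

    explicit SlotCollector(std::function<void(const SlotMap&)> done) :
        onDone(std::move(done))
    {}
    SlotCollector(const SlotCollector&) = delete;
    SlotCollector& operator=(const SlotCollector&) = delete;

    ~SlotCollector()
    {
        onDone(slots);
    }

    SlotMap slots;
    std::function<void(const SlotMap&)> onDone;
};

// Simulates dispatching one asynchronous lookup per path. Each returned
// closure stands in for a reply handler and may be invoked in any order;
// the bool argument says whether the path matched the chassis.
std::vector<std::function<void(bool)>>
    startLookups(const std::vector<std::string>& paths,
                 std::function<void(const SlotCollector::SlotMap&)> onDone)
{
    auto collector = std::make_shared<SlotCollector>(std::move(onDone));
    std::vector<std::function<void(bool)>> replies;
    unsigned int index = 0;
    for (const auto& path : paths)
    {
        index++;
        replies.emplace_back([collector, index, path](bool valid) {
            if (valid)
            {
                collector->slots[index] = path;
            }
        });
    }
    return replies;
}
```

Because completion is tied to object lifetime rather than to a `count == slotNum` comparison, a lookup that fails or reports an invalid slot cannot prevent or prematurely trigger the final callback.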
TODO:
This can be addressed by getting the valid pcieSlots first and then validating them. Once they are validated, we can perform the PATCH (setLocationIndicatorActive) using the obtained list. This approach will make the code cleaner and easier to read.
I'll work on this.
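The TODO above could be sketched roughly as follows. This is only an illustration under stated assumptions: `trailingNumber` and `prepareSlots` are hypothetical helpers, the list passed in is assumed to have already been filtered down to valid slots, and ordering by the numeric suffix of the path (so that pcieslot2 comes before pcieslot10) is an assumption about what order the PATCH input expects.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the number formed by the trailing digits of
// a path, e.g. ".../pcieslot10" -> 10. Paths without trailing digits give 0.
unsigned long trailingNumber(const std::string& path)
{
    std::size_t pos = path.find_last_not_of("0123456789");
    std::string digits =
        (pos == std::string::npos) ? path : path.substr(pos + 1);
    return digits.empty() ? 0UL : std::stoul(digits);
}

// Hypothetical helper: validates the already-filtered slot list against the
// number of slots in the PATCH body and returns it sorted by slot number.
// Returns std::nullopt when the counts disagree, so the caller can reject
// the request before touching any LED.
std::optional<std::vector<std::string>>
    prepareSlots(std::vector<std::string> validPaths, std::size_t requested)
{
    if (validPaths.size() != requested)
    {
        return std::nullopt;
    }
    std::sort(validPaths.begin(), validPaths.end(),
              [](const std::string& a, const std::string& b) {
                  return trailingNumber(a) < trailingNumber(b);
              });
    return validPaths;
}
```

Only after `prepareSlots` returns a list would the code walk it in order and apply each requested LocationIndicatorActive value to the slot at the same position.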
Thanks @baemyung, this was the issue seen. Thanks for working on this. Not sure why we merge things in without testing them.
It was tested (though the test may have been limited) as part of https://github.com/ibm-openbmc/bmcweb/pull/543, but apparently the logic may not give consistent results.
PR https://github.com/ibm-openbmc/bmcweb/pull/712 has been created to resolve this issue.
@gtmills Close this issue?
Fixed by https://github.com/ibm-openbmc/bmcweb/pull/712. Closing.
@jinuthomas FYI Internal defects are:
548223 545871 548225
Below is an Everest PATCH of all LEDs to true. You will notice not all go true. This has also been reported on Rainier. This has also been reported when moving the LocationIndicatorActive to false. This has been reported on the GUI.
21 slots:
Focus on the 21st