When a CEC FRU such as a Fan or Power Supply is concurrently removed, BMC correctly sends pdrRepoChangeEvents to PHYP to remove the FRU Record Set ID and to modify the Entity Association PDRs. BMC also removes the corresponding state sensor/effecter PDRs. However, BMC does not remove the FRU entry from the FRU Record Table.
When BMC adds the FRU back, it sends a new FRU record with a new FRU Record Set Identifier (RSI) and appends that record to the FRU Record Table. The old FRU's entry is never deleted, so the table keeps a stale entry for the removed FRU and grows with every remove/add cycle.
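To make the missing step concrete, here is a minimal sketch of dropping a record set from a serialized FRU Record Table when its FRU is removed. It assumes the PLDM FRU record layout as I understand it from DSP0257 (2-byte little-endian RSI, 1-byte record type, 1-byte field count, 1-byte encoding type, then type/length/value fields) and a buffer holding only concatenated records, with no pad bytes or checksum. `recordSize` and `removeRecordSet` are hypothetical helpers for illustration, not functions from the pldm code base.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Fixed header of one FRU record (assumed layout): RSI (2 bytes, LE) +
// record type (1) + number of fields (1) + encoding type (1).
constexpr std::size_t kRecordHeaderSize = 5;

// Hypothetical helper: total size of the record starting at `offset`,
// found by walking its type-length-value fields.
std::size_t recordSize(const std::vector<uint8_t>& table, std::size_t offset)
{
    if (offset + kRecordHeaderSize > table.size())
        throw std::runtime_error("truncated FRU record header");
    const uint8_t numFields = table[offset + 3];
    std::size_t pos = offset + kRecordHeaderSize;
    for (uint8_t i = 0; i < numFields; ++i)
    {
        if (pos + 2 > table.size())
            throw std::runtime_error("truncated FRU field header");
        const uint8_t length = table[pos + 1]; // byte after the field type
        pos += 2 + length;
        if (pos > table.size())
            throw std::runtime_error("truncated FRU field value");
    }
    return pos - offset;
}

// Hypothetical helper: returns a copy of `table` without any record whose
// RSI equals `rsi`, i.e. what should happen when that FRU is removed.
std::vector<uint8_t> removeRecordSet(const std::vector<uint8_t>& table,
                                     uint16_t rsi)
{
    std::vector<uint8_t> out;
    out.reserve(table.size());
    std::size_t offset = 0;
    while (offset < table.size())
    {
        const std::size_t size = recordSize(table, offset);
        const uint16_t recordRsi =
            static_cast<uint16_t>(table[offset] | (table[offset + 1] << 8));
        if (recordRsi != rsi)
        {
            const auto first =
                table.begin() + static_cast<std::ptrdiff_t>(offset);
            out.insert(out.end(), first,
                       first + static_cast<std::ptrdiff_t>(size));
        }
        offset += size;
    }
    return out;
}
```

Rebuilding the buffer instead of erasing in place keeps the walk simple and leaves the input untouched if a malformed record causes an exception.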
This is harmless for a small number of FRU operations, and the host behaves correctly. At scale, however, it becomes a problem.
I believe PHYP has a finite amount of memory for holding the FRU Record Table, so if stale FRU entries are never deleted, PHYP might eventually run out of memory and crash.
This issue can be reproduced by performing a concurrent maintenance (CM) operation (removing a FRU and adding it back) and comparing the FRU Record Table contents before and after the CM. The CM operation can be performed with busctl commands.
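One way to make the before/after comparison mechanical is sketched below, assuming both FRU Record Table dumps are available as raw byte buffers in the same assumed DSP0257 record layout (2-byte little-endian RSI, record type, field count, encoding type, then type/length/value fields; no pad bytes or checksum). `recordSetIds` and `tableConsistentAfterCm` are hypothetical names, and the check assumes the re-added FRU produces the same number of records under its new RSI as the old one had.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical helper: collects the RSI of every record in a table dump.
// A multiset keeps duplicates, so a reused RSI that appears twice is caught.
std::multiset<uint16_t> recordSetIds(const std::vector<uint8_t>& table)
{
    std::multiset<uint16_t> ids;
    std::size_t pos = 0;
    while (pos < table.size())
    {
        if (pos + 5 > table.size())
            throw std::runtime_error("truncated FRU record header");
        ids.insert(static_cast<uint16_t>(table[pos] | (table[pos + 1] << 8)));
        const uint8_t numFields = table[pos + 3];
        pos += 5; // skip RSI, record type, field count, encoding type
        for (uint8_t i = 0; i < numFields; ++i)
        {
            if (pos + 2 > table.size())
                throw std::runtime_error("truncated FRU field header");
            pos += 2 + table[pos + 1]; // field type + length + value
            if (pos > table.size())
                throw std::runtime_error("truncated FRU field value");
        }
    }
    return ids;
}

// Hypothetical check: after removing the FRU with `oldRsi` and adding it
// back as `newRsi`, the table should hold the original record sets with
// every `oldRsi` record replaced by one `newRsi` record and nothing left over.
bool tableConsistentAfterCm(const std::vector<uint8_t>& before,
                            const std::vector<uint8_t>& after,
                            uint16_t oldRsi, uint16_t newRsi)
{
    std::multiset<uint16_t> expected = recordSetIds(before);
    const std::size_t removed = expected.erase(oldRsi);
    for (std::size_t i = 0; i < removed; ++i)
        expected.insert(newRsi);
    return recordSetIds(after) == expected;
}
```

With the behavior described above, the dump taken after the CM still contains the old RSI alongside the new one, so this check would return false.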