We observed that opal-prd v5.1 on P8 sometimes takes 100% CPU. One possible cause is: “There was a dependency ordering that wasn’t being satisfied for opal-prd, specifically the PNOR driver was not loaded before opal-prd started. This caused the HBRT code to get stuck and then 100% cpu usage.” The FW team said there is an OS-level fix. Could anyone provide details of the OS fix?
The HBRT fix for the possible cause above: “The HBRT commit is 1e784c03824d66dd76ee5effe16b55782c703599 in master.
Handle early life PNOR fails in HBRT instead of hanging
A hang happens when RtPNOR code creates an error log while it still hasn’t initialized completely. The error log code calls PNOR code that hasn’t completed initialization yet. The fix is to assert in HBRT; by the time HBRT gets restarted, PNOR should be present and accessible.”
However, the 100% CPU usage is still observed, even though we confirmed that the Hostboot we used includes this fix.
Opal-prd fix:
Vasant Hegde
Can you add the commit below and rebuild opal-prd?
commit cb16e55a234b91fd42112904cff15094fbae680d
Author: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Date: Tue Apr 3 23:08:41 2018 +0530
opal-prd: Insert powernv_flash module
AFAIK this is the only fix that went into skiboot/OPAL to address the 100% CPU utilization issue.
This fix is already included in opal-prd v6.0 and v6.0.4.
Vasant also suggests:
Or try the steps below:
Stop the opal-prd daemon.
Make sure the prd and mtd kernel modules are loaded.
Start the opal-prd daemon.
-- If you still hit 100% CPU utilization, then it needs to be debugged.
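The module check in the steps above can be scripted before restarting the daemon. This is a minimal Python sketch, assuming the relevant modules are named `opal_prd`, `powernv_flash` and `mtd` (these names are our assumption, based on the powernv_flash commit title, and are not taken from Vasant's note). It reads `/proc/modules`, whose first field on each line is a module name, and reports which of those modules are absent. Drivers built into the kernel never appear in `/proc/modules`, so a "missing" result can be a false alarm on kernels where they are compiled in.

```python
from typing import Iterable, List, Set

# Hypothetical module names opal-prd is assumed to depend on; verify for your kernel.
REQUIRED_MODULES = ("opal_prd", "powernv_flash", "mtd")


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Return the module names (first field of each line) from /proc/modules text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_modules(proc_modules_text: str,
                    required: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the required modules not listed in /proc/modules, in the given order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError as exc:
        print(f"cannot read /proc/modules: {exc}")
    else:
        missing = missing_modules(text)
        if missing:
            # Either not loaded, or built into the kernel (built-ins are not listed).
            print("not listed in /proc/modules:", ", ".join(missing))
        else:
            print("all required modules are loaded")
```

Run it between stopping and starting the daemon; if anything is reported missing and is not built in, load it with `modprobe` before starting opal-prd again.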
The 100% CPU usage appeared intermittently, but it has not shown up since. We will keep this bug open.