liyi-ibm / linux

Linux kernel source tree

opal-prd 100% cpu usage #7

Open liyi-ibm opened 6 years ago

liyi-ibm commented 6 years ago

We observed that opal-prd v5.1 on P8 sometimes consumes 100% CPU. One possible cause is: “There was a dependency ordering that wasn’t being satisfied for opal-prd, specifically the PNOR driver was not loaded before opal-prd started. This caused the HBRT code to get stuck and then 100% cpu usage.” The FW team said there is an OS-level fix. Could anyone help with the details of the OS fix?

The HBRT fix for the possible cause above: “The HBRT commit is 1e784c03824d66dd76ee5effe16b55782c703599 in master.

   Handle early life PNOR fails in HBRT instead of hanging

   A hang happens when RtPNOR code creates an error log while it still hasn’t initialized completely. Error log code calls PNOR code that hasn’t completed initialization yet. The fix is to assert in HBRT and by the time HBRT gets restarted, PNOR should be present and accessible.
”

However, the 100% CPU usage is still observed, even though we confirmed that the Hostboot build we used includes this fix.

Opal-prd fix:

Vasant Hegde
Can you add the commit below and rebuild opal-prd?
commit cb16e55a234b91fd42112904cff15094fbae680d
Author: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Date:   Tue Apr 3 23:08:41 2018 +0530

   opal-prd: Insert powernv_flash module
AFAIK this is the only fix that went into skiboot/opal to sort out the 100% CPU utilization issue.

This fix is already in opal-prd v6.0 and v6.0.4.

Also Vasant suggests:

or try the steps below:
 stop the opal-prd daemon
 make sure the prd and mtd kernel modules are loaded
 start the opal-prd daemon
-- if you still hit 100% CPU utilization then it needs to be debugged
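
The steps above can be sketched as a small Python helper. This is only an illustration under stated assumptions, not the actual fix: the module names `opal_prd` (for "prd") and `powernv_flash` (for "mtd", taken from the commit title above) and the systemd service name `opal-prd` are assumptions, and the function names are hypothetical. The helper reads `/proc/modules`-style text, works out which required modules are missing, and builds the stop / modprobe / start command sequence.

```python
import subprocess

# Assumed module names: powernv_flash comes from the commit title quoted
# above; opal_prd is an assumption for the "prd" module.
REQUIRED_MODULES = ("opal_prd", "powernv_flash")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def missing_modules(proc_modules_text, required=REQUIRED_MODULES):
    """List the required modules that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


def restart_plan(proc_modules_text, service="opal-prd",
                 required=REQUIRED_MODULES):
    """Build the commands: stop the daemon, load missing modules, start it."""
    plan = [["systemctl", "stop", service]]
    for name in missing_modules(proc_modules_text, required):
        plan.append(["modprobe", name])
    plan.append(["systemctl", "start", service])
    return plan


def apply_plan(plan):
    """Run each command in order; needs root and stops at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

To use it, read `/proc/modules`, pass the text to `restart_plan`, review the resulting commands, and then hand them to `apply_plan` as root. Drivers built into the kernel do not appear in `/proc/modules`, so the plan may include a `modprobe` for them; that is harmless.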

The 100% CPU usage appeared only intermittently and no longer shows up, so we are keeping this bug open for now.