open-power / hostboot

System initialization firmware for Power systems
Apache License 2.0

"NCU no response to snooped TLBIE" bugfix? #220

Closed by JeremyRand 1 year ago

JeremyRand commented 1 year ago

According to this IBM document, PNOR firmware v2.18 contains the following bugfix:

HIPER/Pervasive: A problem was fixed for a processor core checkstop with SRC BC70E540 logged with Signature Description " ex(n0p1c4) (NCUFIR[11]) NCU no response to snooped TLBIE". This problem is intermittent and random but occurs with relatively high frequency for certain workloads. The trigger for the failure is one core of a fused core pair going into a stopped state while the other core of the pair continues running.

Was this a Hostboot bug, or was it a different firmware component? If it was in Hostboot, can you point me to the commit(s) that contain the fix?

dcrowell77 commented 1 year ago

The change was not part of Hostboot. It came in through the hcode repo. Specifically, I believe it was this commit: https://github.com/open-power/hcode/commit/9eb379569ffc1ae192aaa82bba43b25a051633b4