open-power / hostboot

System initialization firmware for Power systems
Apache License 2.0

Will Mcbist check all addresses of all DIMMs? #246

Closed. Grubby0624 closed this issue 4 months ago

Grubby0624 commented 4 months ago

https://github.com/open-power/hostboot/blob/c4227d1e3dcf86b00ce0275cfebe2148d2035a24/src/import/generic/memory/lib/utils/mcbist/gen_mss_memdiags.H#L678 Will Mcbist check and overwrite all memory cells? Also, are errors detected by Mcbist during the IPL handled by the PRDF module? Thanks!

sglancy6 commented 4 months ago

memdiags will initialize all addresses with good ECC and will verify all addresses have good ECC.

sglancy6 commented 4 months ago

Yes - PRD will log ECC errors found during memdiags. However, logging of CEs in memdiags requires enabling the mnfg policy flag 'memdiags CE screening'.

Grubby0624 commented 4 months ago

So, if memdiags detects the same type of error (Maintenance IMPE, for example) on different memory cells, will it record every failing address? Will each address also be processed once during PRDF processing? What I want to do now is record all memory errors on the system, whether they are recoverable or unrecoverable.

sglancy6 commented 4 months ago

From our RAS engineer: memdiags, run during the IPL, records errors at srank granularity only, not at single-address granularity. This was partly a trade-off to let memdiags run more efficiently, since it would otherwise have to stop and restart on every error in order to know what the address was. Runtime errors are recorded for individual addresses.
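
To picture the difference between the two granularities, here is a toy sketch in Python. It is not hostboot code and does not reflect how memdiags or PRD store their records: `MemError`, `record_per_srank`, `record_per_address`, the per-srank counting, and the sample addresses are all hypothetical, chosen only to show what information each scheme keeps.

```python
from collections import Counter
from typing import NamedTuple


class MemError(NamedTuple):
    """Hypothetical error event; field names are illustrative only."""
    srank: int    # rank on which the error was seen
    address: int  # failing address (made-up values below)
    kind: str     # e.g. "CE" (correctable) or "UE" (uncorrectable)


def record_per_srank(errors):
    """Coarse recording: keep one count per (srank, kind) and drop the address."""
    return Counter((e.srank, e.kind) for e in errors)


def record_per_address(errors):
    """Fine recording: keep every failing address individually."""
    return [(e.srank, e.address, e.kind) for e in errors]


# Three made-up events: two CEs on srank 1, one UE on srank 2.
errors = [
    MemError(1, 0x1000, "CE"),
    MemError(1, 0x2040, "CE"),
    MemError(2, 0x0080, "UE"),
]

per_srank = record_per_srank(errors)      # two entries, addresses lost
per_address = record_per_address(errors)  # three entries, addresses kept
```

With the coarse scheme, the two CEs on srank 1 collapse into a single entry, so the scan never needs to pause to capture an address. The fine scheme keeps all three addresses at the cost of handling each error individually.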

Grubby0624 commented 4 months ago

Okay, got it, thank you very much!