mchehab / rasdaemon

Rasdaemon is a RAS (Reliability, Availability and Serviceability) logging tool. It records memory errors using the EDAC tracing events. EDAC is a Linux kernel subsystem which handles detection of ECC errors from memory controllers for most chipsets on i386 and x86_64 architectures. EDAC drivers for other architectures, such as arm, also exist.
GNU General Public License v2.0

rasdaemon: labels/intel add vendor and DQ57TM model #142

Closed walterav1984 closed 5 months ago

walterav1984 commented 9 months ago

On the Intel Corporation DQ57TM motherboard, booted in UEFI mode, --guess-labels shows the right locations, but the following labels/intel addition is still needed to show the correct DIMM numbers.

The slot2 row and the channel2 column shown by --layout do not correspond to any DIMM locations on the motherboard.
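
The patch itself is not reproduced on this page. As a rough sketch only, inferred from the --print-labels output further down and from the usual layout of the rasdaemon labels files (a Vendor: line, an indented Model: line, then label: mc.channel.slot; entries), the addition would look something like the following. The exact spacing and ordering in the merged file are assumptions.

```
Vendor: Intel Corporation
  Model: DQ57TM
    CHANNEL_A_DIMM_0: 0.0.0;    CHANNEL_A_DIMM_1: 0.0.1;
    CHANNEL_B_DIMM_0: 0.1.0;    CHANNEL_B_DIMM_1: 0.1.1;
```

The vendor and model strings are the ones reported by ras-mc-ctl --mainboard, and the four locations match the mc0 channel/slot pairs listed by --print-labels.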

$ sudo dmesg | grep DMI | grep DQ57TM
[    0.000000] DMI:  /DQ57TM, BIOS TMIBX10H.86A.0050.2011.1207.1134 12/07/2011

$ cat /proc/cpuinfo | grep Xeon | head -n1
model name  : Intel(R) Xeon(R) CPU           X3470  @ 2.93GHz

$ lsmod | grep edac
i7core_edac            40960  0

$ sudo ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Intel Corporation model DQ57TM

$ sudo ras-mc-ctl --guess-labels
memory stick 'DIMM 3' is located at 'CHANNEL A DIMM 0'
memory stick 'DIMM 1' is located at 'CHANNEL A DIMM 1'
memory stick 'DIMM 4' is located at 'CHANNEL B DIMM 0'
memory stick 'DIMM 2' is located at 'CHANNEL B DIMM 1'

$ sudo ras-mc-ctl --error-count
Label                   CE  UE
CPU#0Channel#0_DIMM#0   0   0
CPU#0Channel#1_DIMM#0   0   0

$ sudo ras-mc-ctl --layout
       +-----------------------------------+
       |                mc0                |
       | channel0  | channel1  | channel2  |
-------+-----------------------------------+
slot2: |     0 MB  |     0 MB  |     0 MB  |
slot1: |     0 MB  |     0 MB  |     0 MB  |
slot0: |  4096 MB  |  4096 MB  |     0 MB  |
-------+-----------------------------------+

#layout edited with dimm labels
       +-----------------------------------+
       |                mc0                |
       | channel0  | channel1  | channel2  |
-------+-----------------------------------+
slot2: |     0 MB  |     0 MB  |     0 MB  |
slot1: |  CHA_D_1  |  CHB_D_1  |     0 MB  |
slot0: |  CHA_D_0  |  CHB_D_0  |     0 MB  |
-------+-----------------------------------+

$ sudo ras-mc-ctl --print-labels #edited labels but not registered yet
Use of uninitialized value in lc at /usr/sbin/ras-mc-ctl line 741.
LOCATION                            CONFIGURED LABEL     SYSFS CONTENTS      
mc0 channel 0 slot 0                CHANNEL_A_DIMM_0     CPU#0Channel#0_DIMM#0
                                    CHANNEL_A_DIMM_1     0:0:1 missing       
mc0 channel 1 slot 0                CHANNEL_B_DIMM_0     CPU#0Channel#1_DIMM#0
                                    CHANNEL_B_DIMM_1     0:1:1 missing

$ sudo ras-mc-ctl --register-labels
$ sudo ras-mc-ctl --print-labels
LOCATION                            CONFIGURED LABEL     SYSFS CONTENTS      
mc0 channel 0 slot 0                CHANNEL_A_DIMM_0     CHANNEL_A_DIMM_0    
                                    CHANNEL_A_DIMM_1     0:0:1 missing       
mc0 channel 1 slot 0                CHANNEL_B_DIMM_0     CHANNEL_B_DIMM_0    
                                    CHANNEL_B_DIMM_1     0:1:1 missing
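
The SYSFS CONTENTS column above is read back from the EDAC sysfs tree, so the registered labels can also be checked without ras-mc-ctl. A minimal sketch, assuming the kernel exposes per-DIMM dimm_label files under /sys/devices/system/edac/mc (older kernels may use a csrow-based layout instead); the function name list_dimm_labels and its optional base-directory argument are illustrative and not part of rasdaemon:

```shell
# Print "<mc> <dimm>: <label>" for every dimm_label file under an EDAC tree.
# $1 is an optional base directory; it defaults to the usual sysfs location.
list_dimm_labels() {
    base="${1:-/sys/devices/system/edac/mc}"
    for f in "$base"/mc*/dimm*/dimm_label; do
        [ -r "$f" ] || continue      # skip an unmatched glob or unreadable file
        dimm="${f%/dimm_label}"      # e.g. .../mc0/dimm0
        mc="${dimm%/*}"              # e.g. .../mc0
        printf '%s %s: %s\n' "${mc##*/}" "${dimm##*/}" "$(cat "$f")"
    done
}
```

Calling list_dimm_labels with no argument reads the live sysfs tree; on this board it should list CHANNEL_A_DIMM_0 and CHANNEL_B_DIMM_0 after --register-labels has run.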

The DIMM labels and slot colors on the motherboard, from left to right, are CHANNEL A DIMM 1 (black), CHANNEL A DIMM 0 (blue), CHANNEL B DIMM 1 (black), CHANNEL B DIMM 0 (blue). This may sound confusing, but that is how the board is laid out.

Signed-off-by: Walter Sonius <walterav1984@gmail.com>

mchehab commented 5 months ago

Merged, thanks!