mchehab / rasdaemon

Rasdaemon is a RAS (Reliability, Availability and Serviceability) logging tool. It records memory errors using the EDAC tracing events. EDAC is a Linux kernel subsystem which handles detection of ECC errors from memory controllers for most chipsets on i386 and x86_64 architectures. EDAC drivers for other architectures, such as ARM, also exist.
GNU General Public License v2.0

How/why is MAJ:MIN calculated in its present state via ras-mc-ctl --summary #71

Open weavingneedle opened 2 years ago

weavingneedle commented 2 years ago

The MAJ:MIN numbers reported by ras-mc-ctl --summary are very different from the MAJ:MIN values in lsblk and /sys/dev/, and I don't see anywhere in the documentation explaining how they are calculated. The conversion from lsblk to ras-mc-ctl described in this answer (https://unix.stackexchange.com/questions/602411/interpret-disk-errors-output-from-ras-mc-ctl-summary) sort of works, but when I use the modulus operation to convert ras-mc-ctl values back to lsblk values, some MAJ:MIN pairs that lsblk lists are missing from the ras-mc-ctl output, and vice versa. This makes it hard to determine which drive or partition each entry refers to. I would have expected the numbers to match those under block or char in /sys/dev, but I am not experienced in this area and could be wrong.

I experienced this with rasdaemon-0.6.7-2.fc35.x86_64.

bluikko commented 1 year ago

Same on rasdaemon-0.6.7-8.el9.x86_64. Looking at the source code, it seems the device ID decoding happens in ras-diskerror-handler.c and uses the standard major() and minor() macros.

ras-mc-ctl just displays the exact device string as read from the database.

major() and minor() would probably not give wrong results if the source data, which comes from pevent_get_field_val(), were correct. Unfortunately, that's all the time I could spend on this; I will just decode the fields manually.
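
For anyone else decoding the fields manually, here is a minimal sketch of one possible explanation. It has not been confirmed against the rasdaemon source. It assumes the raw device value delivered by the trace event uses the kernel-internal dev_t layout (12-bit major, 20-bit minor, as defined in include/linux/kdev_t.h), while the stored MAJ:MIN string was produced with the user-space (glibc) layout used by major() and minor(). The function names below are hypothetical helpers for illustration and are not part of rasdaemon.

```python
# Kernel-internal dev_t layout (include/linux/kdev_t.h): minor is the low 20 bits.
KERNEL_MINOR_BITS = 20
KERNEL_MINOR_MASK = (1 << KERNEL_MINOR_BITS) - 1


def kernel_decode(dev):
    """Split a kernel-internal dev_t into (major, minor)."""
    return dev >> KERNEL_MINOR_BITS, dev & KERNEL_MINOR_MASK


def glibc_decode(dev):
    """Split a dev_t using the user-space layout of glibc's major()/minor()."""
    major = ((dev >> 8) & 0xFFF) | ((dev >> 32) & 0xFFFFF000)
    minor = (dev & 0xFF) | ((dev >> 12) & 0xFFFFFF00)
    return major, minor


def glibc_encode(major, minor):
    """Inverse of glibc_decode, mirroring glibc's makedev()."""
    return (
        ((major & 0xFFF) << 8)
        | ((major & ~0xFFF) << 32)
        | (minor & 0xFF)
        | ((minor & ~0xFF) << 12)
    )


def reported_to_kernel(reported_major, reported_minor):
    """Hypothetical helper: rebuild the raw value from a MAJ:MIN pair that
    was decoded with the glibc layout, then decode it with the kernel layout.
    Assumes the raw value was a kernel-internal dev_t in the first place."""
    return kernel_decode(glibc_encode(reported_major, reported_minor))


if __name__ == "__main__":
    # Under the stated assumption, a reported 0:2049 would map back to 8:1.
    print("%d:%d" % reported_to_kernel(0, 2049))
```

If the assumption holds, the result of reported_to_kernel() should match an entry under /sys/dev/block or in the lsblk output. If it does not, the mismatch has a different cause and this sketch does not apply.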