ralish / DecodeWheaRecord

Decode hex-encoded Windows Hardware Event Architecture (WHEA) records
MIT License

Help understand output of decoded WHEA error record #4

Closed ChromiaCat closed 10 months ago

ChromiaCat commented 10 months ago

I got this as the output (see output.txt), and now I don't know what to do.

Edit: Now in text form so you don't have to download the file.

C:\Users\somethingsomething\Documents\GitHubRepos\DecodeWheaRecord\bin\DecodeWheaRecord\Release>DecodeWheaRecord.exe
[WHEA_ERROR_RECORD] Header indicates error record contains 298 bytes but marshalled 200 bytes.
{
  "Header": {
    "Signature": "CPER",
    "Revision": {
      "MinorRevision": 16,
      "MajorRevision": 2
    },
    "SignatureEnd": 4294967295,
    "SectionCount": 1,
    "Severity": "Fatal",
    "ValidBits": "PlatformId, Timestamp, PartitionId",
    "Length": 298,
    "Timestamp": {
      "Seconds": 17,
      "Minutes": 8,
      "Hours": 21,
      "Flags": "",
      "Day": 25,
      "Month": 1,
      "Year": 24,
      "Century": 20
    },
    "PlatformId": "83c1603c-1552-48a7-87d1-14d9467d7765",
    "PartitionId": "00000000-0000-0000-0000-000000000000",
    "CreatorId": "Device Driver",
    "NotifyType": "DEVICE_DRIVER_NOTIFY_TYPE_GUID",
    "RecordId": 133506556525740266,
    "Flags": "DeviceDriver",
    "Reserved": "AAAAAAAAAAA="
  },
  "SectionDescriptor": [
    {
      "SectionOffset": 200,
      "SectionLength": 98,
      "Revision": {
        "MinorRevision": 0,
        "MajorRevision": 3
      },
      "ValidBits": "FRUText",
      "Reserved": 0,
      "Flags": "Primary",
      "SectionType": "00000000-0000-0000-0000-000000000000",
      "SectionSeverity": "Fatal",
      "FRUText": ""
    }
  ],
  "Section": []
}

ralish commented 10 months ago

This is one of those classes of error that are frustrating to deal with. You can see from the decoded header that this is a WHEA error reported by a device driver (CreatorId and NotifyType fields), but that's about it. The really interesting data would normally be present in the Section field. The source data is there in the hex-encoded event, and the program warns you that there are an extra 98 bytes of data it didn't decode:

[WHEA_ERROR_RECORD] Header indicates error record contains 298 bytes but marshalled 200 bytes.
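The 98-byte figure can be cross-checked against the decoded output above: the header claims 298 bytes in total, and the single section descriptor places its data at offset 200 with a length of 98. A minimal sketch of that arithmetic, with the values copied by hand from the JSON; the helper name `undecoded_bytes` is made up for illustration and is not part of DecodeWheaRecord.

```python
def undecoded_bytes(record_length, marshalled):
    """Bytes the header claims that were never decoded (hypothetical helper)."""
    return record_length - marshalled

# Values copied from the decoded output in this issue.
record_length = 298    # Header.Length
section_offset = 200   # SectionDescriptor[0].SectionOffset
section_length = 98    # SectionDescriptor[0].SectionLength
marshalled = 200       # from the warning message

# The section starts where decoding stopped and runs to the end of the record.
assert section_offset == marshalled
assert section_offset + section_length == record_length

print(undecoded_bytes(record_length, marshalled))
```

So the undecoded remainder is exactly the one section the descriptor points at, rather than stray trailing data.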

The reason it's not decoded is that the structure of that data is completely up to the reporting driver. There's no standard structure, so it's impossible to decode without knowing the details of the structure used by the reporting driver. Worse, these structures are effectively all undocumented. We can glean a little information from the strings present in those 98 bytes:

          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 53 54 4F 52 50 4F 52 54 01 00 62 00 00 00 03 00 STORPORT..b.....
0000000000000010 01 00 05 00 11 00 00 00 46 2E 28 1F 3F 90 EE 11 ........F.(.?.î.
0000000000000020 B5 77 80 6E 6F 6E 69 63 73 00 74 00 6F 00 72 00 µw.nonics.t.o.r.
0000000000000030 6E 00 76 00 6D 00 65 00 00 00 00 00 00 00 00 00 n.v.m.e.........
0000000000000040 00 00 00 00 00 00 00 00 4E 56 4D 65 20 20 20 20 ........NVMe
0000000000000050 00 4B 49 4E 47 53 54 4F 4E 20 53 46 59 52 44 32 .KINGSTON SFYRD2
0000000000000060 30 00                                           0.

So this looks to have been reported by the storport driver. The underlying hardware interface is presumably NVMe (stornvme) and the device the event relates to is a "Kingston SFYRD20" (NVMe KINGSTON SFYRD20). Understanding what all the other data means would require reverse engineering the relevant parts of at least the storport driver. Obviously, it'd be very helpful if Microsoft just published the structures in the Windows SDK, but they don't.
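For anyone wanting to repeat the string hunt on their own record, here is a rough Python sketch of the approach: scan the raw section bytes for runs of printable ASCII and for runs of ASCII characters interleaved with NUL bytes (UTF-16LE). The bytes are transcribed from the hex dump above; `find_strings` and the minimum run length of 4 are my own choices for illustration, not anything taken from DecodeWheaRecord or storport.

```python
import re

# The 98 section bytes, transcribed from the hex dump in this thread.
SECTION_HEX = """
53 54 4F 52 50 4F 52 54 01 00 62 00 00 00 03 00
01 00 05 00 11 00 00 00 46 2E 28 1F 3F 90 EE 11
B5 77 80 6E 6F 6E 69 63 73 00 74 00 6F 00 72 00
6E 00 76 00 6D 00 65 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 4E 56 4D 65 20 20 20 20
00 4B 49 4E 47 53 54 4F 4E 20 53 46 59 52 44 32
30 00
"""

def find_strings(data, min_len=4):
    """Return printable ASCII and UTF-16LE runs of at least min_len characters."""
    found = []
    # Runs of printable ASCII bytes (0x20-0x7E).
    for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        found.append(m.group().decode("ascii").strip())
    # Runs of printable ASCII characters each followed by a NUL byte,
    # which is how ASCII-range text looks when stored as UTF-16LE.
    for m in re.finditer(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, data):
        found.append(m.group().decode("utf-16-le").strip())
    return found

# Drop all whitespace before parsing so the layout above does not matter.
data = bytes.fromhex("".join(SECTION_HEX.split()))
assert len(data) == 98

for s in find_strings(data):
    print(s)
```

Besides STORPORT, NVMe, KINGSTON SFYRD20 and stornvme, this should also report `nonics`. That one is probably not a real field: its trailing `s` is also the first character of the UTF-16 `stornvme`, so the ASCII scan is most likely running across the boundary between two adjacent fields.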

ralish commented 10 months ago

Also, I accidentally hit enter half-way through the above comment, so if you got an email notification it's probably only got the mistakenly submitted comment. I'm closing this issue for now but feel free to reply if the above is unclear.

ChromiaCat commented 10 months ago

Thanks a bunch for the explanation, you rock! Yes, that is the main NVMe drive for the OS. I hope I can put this crash down to a random bug in the driver rather than the device failing; it booted fine afterwards. CrystalDiskInfo still reports 100% health, and I haven't noticed any corruption as of writing this.