libyal / libevtx

Library and tools to access the Windows XML Event Log (EVTX) format
GNU Lesser General Public License v3.0

Should the checksum of chunk be 64-bit? #27

Closed yhojann-cl closed 3 years ago

yhojann-cl commented 3 years ago

In the file documentation/Windows XML Event Log (EVTX).asciidoc, the Chunk header section says:

| Offset | Size | Value | Description
...
|124 | 4 |   | Checksum CRC32 of the first 120 bytes and bytes 128 to 512 of the chunk.

But the value is an Int64 and needs 8 bytes, from 124 to 132. For example, b'\x8c\x95\xaf\xac\x00\x00\x00\x00' is 2897188236, while four bytes only give an int32:

>>> struct.unpack('<i', b'\x8c\x95\xaf\xac')
(-1397779060,)
>>> struct.unpack('<q', b'\x8c\x95\xaf\xac\x00\x00\x00\x00')
(2897188236,)

In my file the CRC32 of bytes 0 to 120 and 128 to 512 is 2897188236, so the checksum is correct when read as an int64.
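
For readers following along, here is a minimal sketch of the check described above, using Python's `zlib.crc32`. It assumes the ranges in the quoted table are half-open slices (`[0:120]` and `[128:512]`) and reads the stored field as the 4 bytes the table lists at offset 124. The helper names and the synthetic buffer are made up for illustration and do not come from libevtx or from a real EVTX file.

```python
import struct
import zlib


def chunk_header_checksum(chunk: bytes) -> int:
    """CRC32 over bytes [0:120] and [128:512] of a chunk (assumed slice bounds)."""
    return zlib.crc32(chunk[0:120] + chunk[128:512]) & 0xFFFFFFFF


def stored_checksum(chunk: bytes) -> int:
    """Read the 4-byte field at offset 124 as a little-endian unsigned integer."""
    return struct.unpack('<I', chunk[124:128])[0]


# Synthetic 512-byte header region, not taken from a real EVTX file.
chunk = bytearray(512)
chunk[0:8] = b'ElfChnk\x00'
# Bytes 120-127 are excluded from the CRC, so writing the result there
# does not change the value that was just computed.
chunk[124:128] = struct.pack('<I', chunk_header_checksum(bytes(chunk)))

print(stored_checksum(bytes(chunk)) == chunk_header_checksum(bytes(chunk)))
```

Because `zlib.crc32` returns a value in the range 0 to 4294967295, it can only be compared directly with the stored field when that field is decoded as unsigned.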

The same problem happens with the Event records checksum: it is an int64 and needs 8 bytes, from 52 to 60. In my file the value is b'8"8\xd7\x00\x00\x00\x00', which is 3610780216; the same value read as an int32 from 4 bytes is -684187080.

joachimmetz commented 3 years ago

> But the value is an Int64 and needs 8 bytes

Why? Shouldn't an unsigned int32 suffice to store a CRC32? How have you determined this is an int64? Or is this just because it fits your struct.unpack?

joachimmetz commented 3 years ago

The need for a 32-bit value to be stored as 64-bit sounds very implausible to me.

Did you try a capital I (uppercase i) as the struct format character? https://docs.python.org/3/library/struct.html#format-characters
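
To illustrate the suggestion, a small sketch using the four checksum bytes quoted earlier in the thread. It only relies on the standard `struct` format characters `i` (signed 32-bit), `I` (unsigned 32-bit), and `q` (signed 64-bit); the variable names are arbitrary.

```python
import struct

raw = b'\x8c\x95\xaf\xac'  # the 4 checksum bytes quoted in this thread

signed = struct.unpack('<i', raw)[0]      # lowercase i: signed 32-bit
unsigned = struct.unpack('<I', raw)[0]    # capital I: unsigned 32-bit
# Signed 64-bit over 8 bytes, padding with the zero bytes seen in the example
padded = struct.unpack('<q', raw + b'\x00' * 4)[0]

print(signed)    # -1397779060
print(unsigned)  # 2897188236
print(padded)    # 2897188236
```

The 8-byte read matches the unsigned 4-byte read only while the four following bytes are zero. In the chunk header those bytes (128 to 131) already belong to the next field, so they are not guaranteed to stay zero.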

yhojann-cl commented 3 years ago

Because unpacking the value correctly requires 8 bytes, not 4. A 4-byte value is an integer, not a long (int64).

yhojann-cl commented 3 years ago

An example of this implementation: https://github.com/williballenthin/python-evtx/blob/master/Evtx/BinaryParser.py But I made my own version to export evtx files to sqlite3.

In my code I wrote:

        self.chunkOpts['header'] = {
            'signature'                     : chunkBytes[0:8],                             # binary      : b'ElfChnk\x00'
            'first-event-record-number'     : struct.unpack('<q', chunkBytes[8  :16 ])[0], # long(int64) : %d
            'last-event-record-number'      : struct.unpack('<q', chunkBytes[16 :24 ])[0], # long(int64) : %d
            'first-event-record-identifier' : chunkBytes[24:32],                           # ???         : %s
            'last-event-record-identifier'  : chunkBytes[32:40],                           # ???         : %s
            'header-size'                   : struct.unpack('<i', chunkBytes[40 :44 ])[0], # int(int32)  : 128
            'last-event-record-data-offset' : struct.unpack('<i', chunkBytes[44 :48 ])[0], # int(int32)  : %s
            'event-records-checksum'        : struct.unpack('<q', chunkBytes[52 :60 ])[0], # int(int32)  : %s
            'checksum'                      : struct.unpack('<q', chunkBytes[124:132])[0], # long(int64) : %s
            'common-string-offset-array'    : chunkBytes[128:384],                         # ???         : %s
            'template-ptr'                  : chunkBytes[384:512]                          # ???         : %s
        }
joachimmetz commented 3 years ago

> Because unpacking the value correctly requires 8 bytes, not 4. A 4-byte value is an integer, not a long (int64).

I don't understand you. I'm able to do this perfectly fine with just the 32-bit value.

yhojann-cl commented 3 years ago

You can unpack it as 32-bit, but the value is really 64-bit and needs the remaining 4 bytes. If you omit those 4 bytes nothing happens and you can still process it, but with larger files you will have problems.

joachimmetz commented 3 years ago

This sounds more like you're using the wrong struct format identifier; maybe you need to use L instead of I. In C code 32-bit works just fine, which makes perfect sense for a CRC-32.

yhojann-cl commented 3 years ago

You are right, I will find out more on this topic.

joachimmetz commented 3 years ago

Ack, I'll close this issue then.

yhojann-cl commented 3 years ago

I looked into it and you are right: it is an unsigned integer. In my case I added the remaining bytes because I thought the value was signed. Sorry for taking your time.
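
Applied to the header-parsing snippet posted earlier in the thread, the conclusion could look like the sketch below. It reads both checksum fields as 4-byte unsigned values (`'<I'`) at offsets 52 and 124. `parse_chunk_checksums` is a hypothetical helper, and the buffer is synthetic: it reuses the byte values quoted in this thread plus made-up non-zero bytes at offset 128.

```python
import struct


def parse_chunk_checksums(chunk_bytes: bytes) -> dict:
    """Hypothetical helper: read both checksum fields as unsigned 32-bit values."""
    return {
        'event-records-checksum': struct.unpack('<I', chunk_bytes[52:56])[0],
        'checksum': struct.unpack('<I', chunk_bytes[124:128])[0],
    }


# Synthetic header region built from the byte values quoted in this thread
header = bytearray(512)
header[52:56] = b'8"8\xd7'
header[124:128] = b'\x8c\x95\xaf\xac'
header[128:132] = b'\x01\x02\x03\x04'  # made-up data belonging to the next field

fields = parse_chunk_checksums(bytes(header))
print(fields['event-records-checksum'])  # 3610780216
print(fields['checksum'])                # 2897188236

# The earlier 8-byte read spills into the next field and no longer matches
spilled = struct.unpack('<q', bytes(header[124:132]))[0]
print(spilled == fields['checksum'])     # False
```

With 4-byte unsigned reads, neither field depends on what follows it in the header.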