tpircher-zz / pycrc

Free, easy to use Cyclic Redundancy Check (CRC) calculator and source code generator
https://pycrc.org
MIT License
169 stars 36 forks source link

Reflection of xor-in wrong for 32-bit CRC? #32

Open schlatterbeck opened 5 years ago

schlatterbeck commented 5 years ago

The "normal" crc32 (in python available in zlib or binascii) can do continuation of a crc, i.e. (python3):

>>> from zlib import crc32
>>> crc32(b'12345678')
2598427311
>>> crc32(b'12345678', 2598427311)
1808553911
>>> crc32(b'1234567812345678')
1808553911

pycrc doesn't seem to get this right (I've installed the pycrc.py as pycrc in my path, the prompt is designed to allow cutting/pasting the whole line):

: %; pycrc --version
pycrc v0.9.2

: %; pycrc --width=32 --poly=0x04c11db7 --xor-in=0 --xor-out=0 \
        --reflect-in=1 --reflect-out=1 --check-hexstring 12345678
0x6b4dd184
: %; pycrc --width=32 --poly=0x04c11db7 --xor-in=0x6b4dd184 --xor-out=0 \
        --reflect-in=1 --reflect-out=1 --check-hexstring 12345678
0x3ab64c61
: %; pycrc --width=32 --poly=0x04c11db7 --xor-in=0 --xor-out=0
        --reflect-in=1 --reflect-out=1 --check-hexstring 1234567812345678
0x1a8a2677

If I reverse the output value 0x6b4dd184 before using it in the second call above it works:

: %; ./rev 0x6b4dd184
0x218bb2d6
: %; pycrc --width=32 --poly=0x04c11db7 --xor-in=0x218bb2d6 --xor-out=0 \
        --reflect-in=1 --reflect-out=1 --check-hexstring 12345678
0x1a8a2677

So it looks like pycrc should reflect the xor-in value before xoring it into the register for reflected algos. Note that I'm not sure why pycrc gets it wrong for 32 bit but seems to get it right for crc16, there is the 16-bit model 'crc-16-ccitt' which has an asymmetric xor-in value of 0x1d0f. This produces a check-value consistent with other crc implementation of the same algo (e.g. reveng which names it CRC-16/SPI-FUJITSU). And continuation works, too (the algo has a zero xor-out, so we don't need to compensate):

: %; pycrc --width=16 --poly=0x1021 --xor-in=0x1d0f --xor-out=0 \
           --reflect-in=0 --reflect-out=0 --check-string=1234567812345678
0x5b57
: %; pycrc --width=16 --poly=0x1021 --xor-in=0x1d0f --xor-out=0 \
           --reflect-in=0 --reflect-out=0 --check-string=12345678
0x712c
: %; pycrc --width=16 --poly=0x1021 --xor-in=0x712c --xor-out=0 \
           --reflect-in=0 --reflect-out=0 --check-string=12345678
0x5b57

Any ideas?

schlatterbeck commented 5 years ago

Probably related to this: The behaviour seems to be different for check-hexstring and check-file. In the following example the file rand1 is a file generated from reading some bytes from /dev/urandom. The r1.d file is the corresponding hex-dump (with all whitespace removed, a single hex string). When specifying an asymmetric xor-in value we get different results:

: %; pycrc --width=32 --poly=0x04c11db7 --xor-in=0x12345678 --xor-out=0xffffffff \
    --reflect-in=1 --reflect-out=1 --check-file ../novo/rand1
0x25371555
: %; pycrc --width=32 --poly=0x04c11db7 --xor-in=0x12345678 --xor-out=0xffffffff \
    --reflect-in=1 --reflect-out=1 --check-hexstring $(cat ../novo/r1.d)
0x6a9d7bdb
schlatterbeck commented 5 years ago

I wrote earlier that it doesn't happen with 16-bit algos. This is not true, here is an example:

: %; pycrc --poly=0x8005 --xor-in=0x4711 --xor-out=0xFFFF --reflect-in=1 --reflect-out=1 --width=16 --check-file 1234567  
0x1499
: %; pycrc --poly=0x8005 --xor-in=0x4711 --xor-out=0xFFFF --reflect-in=1 --reflect-out=1 --width=16 --check-string 1234567
0xe44b

The file 1234567 contains the content (produced with echo -n 1234567 > 1234567)

The difference to the earlier 16-bit example seems to have been the reflect-in=1 vs. 0.

schlatterbeck commented 5 years ago

For fix see pull-request #33

tpircher-zz commented 5 years ago

[Edit the message as I wrote some complete nonsense before...]

With the parameters of the crc-32 model, the CRC calculation goes like this:

  1. Set the register with the initial value
  2. Reflect the register
  3. run the CRC update function
  4. update the register by running Width zeroes through it
  5. reflect the register
  6. apply the XOR out value

The CRC update function can be chained, but the first two steps and the last three steps must be run only once. I haven't looked at the Python crc32 implementation, but I guess they must do some compensation to counteract the finalisation step from the previous loop.

Does this make sense?

schlatterbeck commented 5 years ago

First: Thanks for taking the time to look into this. Have you noticed that the file-based CRC and the check-string/check-hexstring do different things for the same input? So there is clearly a bug and I think the problem is in the check-string.

In particular:

Example, see the second set of calls where I compare the behaviour of check-file and check-string:

: %; echo -n 12345678 > 12345678
: %; cat 12345678 12345678 > 1234567812345678
: %; pycrc --model crc-32 --check-file=12345678            
0x9ae0daaf
: %; pycrc --model crc-32 --check-string=12345678
0x9ae0daaf
: %; tools/xor.py 0x9ae0daaf 0xffffffff
651F2550
: %; pycrc --model crc-32 --xor-in=0x651F2550 --check-file=12345678
0x6bcc57b7
: %; pycrc --model crc-32 --xor-in=0x651F2550 --check-string=12345678
0xf3b32bd5
: %; pycrc --model crc-32 --check-file=1234567812345678          
0x6bcc57b7
: %; pycrc --model crc-32 --check-string=1234567812345678 
0x6bcc57b7
schlatterbeck commented 5 years ago

Concerning your other message: The python/zlib crc32 implementation takes two parameters, the string to check and an optional crc-in parameter. The xor-in is applied to the crc-in before using the result as the initial value of the crc. That's why it chains.

From your 1. -- 6.: It seems you're missing step 2. (Reflect the register) for the check-string and check-hexstring cases, see example in previous comment. The reflection should only happen if reflect-out is True. Not if reflect-in is True. The reflect-in deals solely with the byte-reflection when reading input.

I'll try to dig up some other general crc implementations and back up my claims with examples.

Also note: I wasn't able to reproduce your failing tests. For me with or without patch I'm getting:

: %; python test/test.py
Test OK
schlatterbeck commented 5 years ago

Ok, I've discovered the -a option to test.py (from your comment on my other issue :-) There is just one test failing (or does it stop on the first failed test?). This is due to the reflect-in vs. reflect-out correction I made: Reflection of the xor-in should happen if reflect-out is set not reflect-in. I'll check this against another crc-implementation and will update my pull-request accordingly.

schlatterbeck commented 5 years ago

OK, tested this:

This all means that for an algorithm with reflect-out=True when we want to feed multiple values and have the same effect as a single feed of the whole string we must do the following with the result from the previous feed:

Note that the following example works with my patch applied, otherwise the check-file version in the second pair of pycrc calls will fail. So for crc-32 we do (my previous example adapted for an additional reflect step):

: %; echo -n 12345678 > 12345678
: %; cat 12345678 12345678 > 1234567812345678
: %; pycrc --model crc-32 --check-file=12345678            
0x9ae0daaf
: %; pycrc --model crc-32 --check-string=12345678
0x9ae0daaf
: %; tools/xor.py 0x9ae0daaf 0xffffffff
651F2550
: %; tools/reflect.py 651F2550
0xaa4f8a6
: %; pycrc --model crc-32 --xor-in=0xaa4f8a6 --check-file=12345678
0x6bcc57b7
: %; pycrc --model crc-32 --xor-in=0xaa4f8a6 --check-string=12345678
0x6bcc57b7
: %; pycrc --model crc-32 --check-file=1234567812345678          
0x6bcc57b7
: %; pycrc --model crc-32 --check-string=1234567812345678 
0x6bcc57b7