kubo / snzip

Snzip, a compression/decompression tool based on snappy
Other
217 stars 30 forks source link

Invalid CRC for framing modes #25

Open monoxgas opened 4 years ago

monoxgas commented 4 years ago

https://github.com/kubo/snzip/blob/809c6f2db74bdd2ccfc34f4dc30adf0338161af3/crc32.h#L15

It would appear that while calculating the masked CRC in the function above, you perform a bitwise NOT operator on the result before masking. I'm unsure what formats require this, if any (possibly snappy-in-java), but this creates a mismatched CRC for most other public implementations of framing2. Specifically:

I was hesitant to make a PR until I verified which formats might require it, but it should be an easy fix (I'd be happy to do it). Possibly two different functions, one which inverts and one which does not.

monoxgas commented 4 years ago

Also, for quick reference, the CRC-32 section of the current framing2 spec: https://github.com/google/snappy/blob/e9e11b84e629c3e06fbaa4f0a86de02ceb9d6992/framing_format.txt#L39

monoxgas commented 4 years ago

The plot thickens.

Upon further inspection, I believe the implementation here is indeed correct and it's the Mozilla repo which is at fault. I also erroneously claimed that python-snappy disagrees with the CRC checksums currently being computed, however they do indeed match.

Specifically, the formal spec of CRC-32/ISCSI (CRC-32C) states that the initialization value should be 0xffffffff/~0 and that the final value should be complemented : x ^ 0xffffffff or x = ~x. This is exactly what is happening here:

https://github.com/kubo/snzip/blob/809c6f2db74bdd2ccfc34f4dc30adf0338161af3/crc32.h#L15

References:

However this still creates a small pickle, as there is now technically another "flavour" for this single compression format. Given that the difference is so small, I would hesitate to create an entirely new format such as mozilla-framing, but whatever fits the project best. An alternative solution might be to compare both the correct checksum and it's complement. It's technically inaccurate, but might save some headache and should never cause any real issues.

kubo commented 4 years ago

I made a small program to compare the results of the implementation with CRC examples.

$ git clone https://github.com/kubo/snzip.git
$ cd snzip
$ ./autogen.sh
$ ./configure
$ wget https://gist.githubusercontent.com/kubo/c6bc9f038e3dd66bfb37da4f8596d119/raw/b97e2a7d325fc15edbc149a036f0605f79a2e22d/crc32_test.c
  ...omitted...
$ gcc -o crc32_test crc32_test.c crc32.c crc32_sse4_2.c -msse4.2
$ ./crc32_test 
32 bytes of zeroes: OK
32 bytes of ones: OK
32 bytes of incrementing 00..1f: OK
32 bytes of decrementing 1f..00: OK
An iSCSI - SCSI Read (10) Command PDU: OK

The results are same with the CRC examples.

As for your two ideas, I prefer the former. However I postpone my decision for a while.