Closed dearblue closed 1 year ago
I actually tested it with the tool provided by Prof. Koopman.
On FreeBSD 13.2 amd64, building with gcc causes a crash with SIGSEGV
, so I built with clang.
However, a patch was needed.
Note that the representation of the generator polynomial passed to the hdlen
command requires that the upper 32 bits be taken out of the complete 33 bits.
% ./hdlen $(printf '0x%08x\n' $((0x1f26b8303 >> 1)) ) ## CRC-32 (bzip3)
Poly=0xf935c181 startHD=3 maxHD=17
# 0xf935c181 HD=3 len=9932275 Example: Len=9932276 {0} (0x80000000) (Bits=2)
# 0xf935c181 HD=4 len=9932275 Example: Len=9932276 {0} (0x80000000) (Bits=2)
# 0xf935c181 HD=5 len=3684 Example: Len=3685 {0,1063,1932} (0x80000000) (Bits=4)
# 0xf935c181 HD=6 len=3684 Example: Len=3685 {0,1063,1932} (0x80000000) (Bits=4)
# 0xf935c181 HD=7 len=178 Example: Len=179 {0,50,83,88,105} (0x80000000) (Bits=6)
# 0xf935c181 HD=8 len=178 Example: Len=179 {0,50,83,88,105} (0x80000000) (Bits=6)
# 0xf935c181 HD=9 len=42 Example: Len=43 {0,1,8,15,16} (0x80020800) (Bits=8)
# 0xf935c181 HD=10 len=42 Example: Len=43 {0,1,8,15,16} (0x80020800) (Bits=8)
# 0xf935c181 HD=11 len=17 Example: Len=18 {0,9} (0x90174008) (Bits=10)
# 0xf935c181 HD=12 len=17 Example: Len=18 {0,9} (0x90174008) (Bits=10)
# 0xf935c181 HD=13 len=8 Example: Len=9 {0,4,7,8} (0x80890609) (Bits=12)
# 0xf935c181 HD=14 len=8 Example: Len=9 {0,4,7,8} (0x80890609) (Bits=12)
# 0xf935c181 HD=15 len=1 Example: Len=2 {0} (0x85af2141) (Bits=14)
# 0xf935c181 HD=16 len=1 Example: Len=2 {0} (0x85af2141) (Bits=14)
# 0xf935c181 HD=17 NONE Example: Len=1 {0} (0xf935c181) (Bits=16)
0xf935c181 {9932275,9932275,3684,3684,178,178,42,42,17,17,8,8,1,1}
% ./hdlen $(printf '0x%08x\n' $((0x104c11db7 >> 1)) ) ## CRC-32
Poly=0x82608edb startHD=3 maxHD=16
...
0x82608edb {4294967263,91607,2974,268,171,91,57,34,21,12,10,10,10}
% ./hdlen $(printf '0x%08x\n' $((0x11edc6f41 >> 1)) ) ## CRC-32C
Poly=0x8f6e37a0 startHD=3 maxHD=19
...
0x8f6e37a0 {2147483615,2147483615,5243,5243,177,177,47,47,20,20,8,8,6,6,1,1}
:scream_cat: I'm no specialist, so I couldn't read the results. :see_no_evil:
Ah, I checked the commit log and it appears that the problem is now caused by https://github.com/kspalaiologos/bzip3/commit/7a8d35b161403d2c873abb6cbef74deaa18354cd. The issue leads to #105, so I'm closing this.
I would like to add a slight addition that I noticed later. :smile_cat: Not a bad story by any means.
The CRC used in bzip3 is not strictly speaking a CRC-32C. I'm using the same usage as the first commit 3bf96e23c7827fdb5286407dfcd29181ce40f50e at the moment and that's my conclusion drawn from it.
% git grep -wn crc32sum 3bf96e23c7827fdb5286407dfcd29181ce40f50e
3bf96e23c7827fdb5286407dfcd29181ce40f50e:crc32.h:60:static uint32_t crc32sum(uint32_t crc, uint8_t *buf, size_t size) {
3bf96e23c7827fdb5286407dfcd29181ce40f50e:main.c:83: uint32_t crc32 = crc32sum(1, buffer, bytes_read);
3bf96e23c7827fdb5286407dfcd29181ce40f50e:main.c:151: if (crc32sum(1, output, bytes_read) != crc32) {
However, since the generating polynomial is the same as that of CRC-32C, I believe the detection performance is equivalent. As a amateur's conclusion, there is nothing wrong with the current version.
A comparison with CRC-32C is shown below.
Property | CRC-32C | CRC-32/BZIP3 (temporary name just when here) | Note |
---|---|---|---|
Bit width | 32 | 32 | |
Generator polynomial | 0x1edc6f41 | 0x1edc6f41 | |
Initial value | 0 | 1 | |
Initial internal state | 0xffffffff | 1 | (Initial value) XOR (XOR output) |
Reflect input | true | true | |
Reflect output | true | true | |
XOR output | 0xffffffff | 0 | |
Result of "123456789" | 0xe3069283 | 0xacdd2c68 | |
Residue | 0xb798b438 | 0 | |
Magic number of CRC | 0x48674bc7 | 0 | (Residue) XOR (XOR output) |
To repeat the amateur's conclusion, there is nothing wrong with the current version.
In case anyone is concerned, I leave the following example patch for changing to CRC-32C. I probably won't send a PR, however.
diff --git a/src/libbz3.c b/src/libbz3.c
index 15c327a..d8a521c 100644
--- a/src/libbz3.c
+++ b/src/libbz3.c
@@ -61,8 +61,9 @@ static const u32 crc32Table[256] = {
};
static u32 crc32sum(u32 crc, u8 * RESTRICT buf, size_t size) {
+ crc ^= 0xffffffff;
while (size--) crc = crc32Table[((u8)crc ^ *(buf++)) & 0xff] ^ (crc >> 8);
- return crc;
+ return crc ^ 0xffffffff;
}
/* LZP code. These constants were manually tuned to give the best compression ratio while using relatively
@@ -553,7 +554,7 @@ BZIP3_API s32 bz3_encode_block(struct bz3_state * state, u8 * buffer, s32 data_s
return -1;
}
- u32 crc32 = crc32sum(1, b1, data_size);
+ u32 crc32 = crc32sum(0, b1, data_size);
// Ignore small blocks. They won't benefit from the entropy coding step.
if (data_size < 64) {
@@ -634,7 +635,7 @@ BZIP3_API s32 bz3_decode_block(struct bz3_state * state, u8 * buffer, s32 data_s
memmove(buffer, buffer + 8, data_size - 8);
- if (crc32sum(1, buffer, data_size - 8) != crc32) {
+ if (crc32sum(0, buffer, data_size - 8) != crc32) {
state->last_error = BZ3_ERR_CRC;
return -1;
}
@@ -726,7 +727,7 @@ BZIP3_API s32 bz3_decode_block(struct bz3_state * state, u8 * buffer, s32 data_s
if (b1 != buffer) memcpy(buffer, b1, size_src);
- if (crc32 != crc32sum(1, buffer, size_src)) {
+ if (crc32 != crc32sum(0, buffer, size_src)) {
state->last_error = BZ3_ERR_CRC;
return -1;
}
bzip3 uses a 32-bit CRC, which is different from the CRC-32C generator polynomial
0x1edc6f41
.As far as I can see, the 32-bit CRC generator polynomial used by bzip3 is
0xf26b8303
.The bzip3 CRC calculation is normal in the input bit direction. It can be read from the fact that the input is to the upper 8 bits of the CRC and the lower 24 bits are left shifted.
https://github.com/kspalaiologos/bzip3/blob/646212a8f5cb22e322e155bec3da51685bd03bfc/src/libbz3.c#L64
If the input bit direction is in reverse order, input to the lower 8 bits of the CRC and shift the upper 24 bits to the right.
https://github.com/madler/zlib/blob/04f42ceca40f73e2978b50e93806c2a18c1281fc/crc32.c#L1062
When checking the generator polynomial from a table with 256 elements, if the input bit direction is normal, the element of
table[1]
is the generator polynomial as it is.https://github.com/kspalaiologos/bzip3/blob/646212a8f5cb22e322e155bec3da51685bd03bfc/src/libbz3.c#L32
The generator polynomial of bzip3 is
0xf26b8303
.For reference, to check the generator polynomial from a table whose input bit direction is in reverse order, reverse the bit direction of the value of
table[128]
. Reversing the bit direction of0x82f63b78
, the current bzip3table[128]
, gives0x1edc6f41
.The detectability and characteristics of CRCs vary depending on the generator polynomial. From a survey by Prof. Koopman: https://users.ece.cmu.edu/~koopman/crc/crc32.html
I don't know how detectable the generator polynomial
0xf26b8303
of the CRC used by bzip3 is.In my opinion, it would be better to use the tables as they are and the calculations in the reverse input bit direction. However, for compatibility reasons, the file format signature may also need to be
BZ3v2
?For information, the bzip2 CRC is normal in the input direction, and the first element of the table is
0x04c11db7
.