google / zopfli

Zopfli Compression Algorithm is a compression library programmed in C to perform very good, but slow, deflate or zlib compression.
Apache License 2.0
3.43k stars 330 forks source link

mingw64 GCC 5.3 and linux GCC 4.8 - different results #95

Closed MrKrzYch00 closed 8 years ago

MrKrzYch00 commented 8 years ago

I found an issue with Zopfli (the same behavior happens in my fork) if anybody is interested.

When I compile it using mingw64 GCC 5.3 in MSYS2 (doesn't matter if it's x86 or x64 version, also O0 doesn't make any difference as well) the Windows' binary produces different block slit points and also the progress of iterations progresses differently (when I pass exact same block split points in my fork) than it does with Linux' version compiled with GCC 4.8.x.

Not sure if it's some kind of bug in GCC 4.8.x or regression in GCC 5.3. Also I'm not sure which one provides VALID compilation. Though I know for sure that with older toolchains I used prior to MSYS2 there were no differences.

Example of GCC 5.3 split points produced: 2882 17106 18913 (hex: b42 42d2 49e1) Example of GCC 4.8 split points produced: 2853 17304 18923 (hex: b25 4398 49eb)

So far I can't figure if there is anything in the code that can be altered since even after enabling all possible warnings and trying code changes to make everything nice according to them, nothing helps.

So if anybody has any more intel, this could be a nice lesson. :)

On a side note. On ARMv7 device (Odroid U3) running my fork I got "ZopfliVerifyLenDist: Assertion `data[pos - dist + i] == data[pos + i]' failed." once. Not sure if it has anything to do with GCC 4.8.x or it's just that some of switches in my fork don't play along with each other on certain data (--brotli --lazy while certain static block split points were passed with --cbs command).

JayXon commented 8 years ago

duplicate: https://github.com/google/zopfli/issues/20

MrKrzYch00 commented 8 years ago

Hmm, interesting. So it should have something to do with double precision... Well changing all floats and doubles to long doubles didn't help either, and GCC 4.8 does provide same results on both ARMv7 and Core i7 (linux/virtualbox), while GCC 5.3 works differently on Core i7 than previous gcc version (windows host).

Seems like the produced assembly just does something unexpected either in 4.8 or 5.3 version...

EDIT: After chaning almost all floats to size_t (dirty hack) I still get different results even with block split points. The only thing that I didn't change was ZopfliCalculateEntropy which as far as I see runs only through iterating a dynamic block (correct me if I'm wrong). So we may suspect that it's something different than double precission...

EDIT2: It's late here now so I will just share what I discovered so far. One of these functions seems to provide different data with GCC 4.8 (linux) and GCC 5.3 (mingw): GetDynamicLengths(lz77, lstart, lend, ll_lengths, d_lengths); result += CalculateTreeSize(ll_lengths, d_lengths); fprintf(stderr,"Debug: %f\n",result); /* Added for debugging */

Windows mingw gcc 5.3: Debug: 880.000000 Debug: 864.000000 Debug: 935.000000 Debug: 860.000000 Debug: 909.000000 Debug: 873.000000

Linux gcc 4.8: Debug: 880.000000 Debug: 862.000000 <- * Debug: 942.000000 Debug: 863.000000 Debug: 909.000000

EDIT 3: gdb debugging. Breakpoint set at return of CalculateTreeSize. The breakpoint was rached only once. (gdb) info local: GCC 5.3: ll_lengths = {6, 6, 7, 7, 8, 8, 9, 9, 7, 9, 9, 9, 8, 9, 9, 6, 7, 9, 10, 10, 8, 8, 10, 9, 9, 10, 9, 10, 9, 10, 10, 9, 8, 10, 11, 11, 6, 10, 0, 11, 8, 8, 10, 10, 10, 10, 9, 10, 9, 7, 10, 10, 10, 10, 10, 10, 9, 7, 10, 9, 9, 9, 9, 9, 8, 6, 9, 9, 7, 7, 9, 10, 5, 6, 9, 10, 6, 7, 11, 10, 9, 9, 10, 10, 9, 10 [repeats 11 times], 8, 10, 10, 10, 10, 10, 8, 11, 9, 10, 11, 13, 9, 10, 10, 9, 9, 11, 8, 8, 7, 8, 8, 9, 9, 10, 10, 11, 9, 10, 9, 10, 9, 8, 9, 6, 7, 7, 8, 9, 9, 5, 12, 6, 9, 6, 10, 10, 8, 11, 12, 12, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 13, 10, 9, 11, 11, 9, 10, 13, 11, 10, 9, 10, 9, 9, 8, 8, 9, 9, 9, 9, 9, 9, 7, 7, 8, 8, 8, 7, 8, 8, 9, 8, 10, 9, 10, 10, 10, 10, 8, 8, 8, 9, 9, 9, 10, 10, 8, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9, 11, 9, 10, 9, 11, 6, 6, 10, 7, 9, 9, 10, 10, 8, 8, 10, 10, 8, 11, 9, 9, 8, 8, 9, 9, 9, 9, 9, 7, 13, 4, 4, 5, 6, 6, 7, 7, 8, 7, 8, 8, 9, 9, 9, 13 [repeats 13 times], 14, 14, 0, 0} d_lengths = {7, 9, 0, 9, 6, 6, 6, 5, 5, 5, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 8, 0, 0, 0, 0} result = 880

GCC 4.8: ll_lengths = {6, 6, 7, 7, 8, 8, 9, 9, 7, 9, 9, 9, 8, 9, 9, 6, 7, 9, 10, 10, 8, 8, 10, 9, 9, 10, 9, 10, 9, 10, 10, 9, 8, 10, 11, 11, 6, 10, 0, 11, 8, 8, 10, 10, 10, 10, 9, 10, 9, 7, 10, 10, 10, 10, 10, 10, 9, 7, 10, 9, 9, 9, 9, 9, 8, 6, 9, 9, 7, 7, 9, 10, 5, 6, 9, 10, 6, 7, 11, 10, 9, 9, 10, 10, 9, 10 [repeats 11 times], 8, 10, 10, 10, 10, 10, 8, 11, 9, 10, 11, 14, 9, 10, 10, 9, 9, 11, 8, 8, 7, 8, 8, 9, 9, 10, 10, 11, 9, 10, 9, 10, 9, 8, 9, 6, 7, 7, 8, 9, 9, 5, 12, 6, 9, 6, 10, 10, 8, 11, 12, 12, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 14, 10, 9, 11, 11, 9, 10, 13, 11, 10, 9, 10, 9, 9, 8, 8, 9, 9, 9, 9, 9, 9, 7, 7, 8, 8, 8, 7, 8, 8, 9, 8, 10, 9, 10, 10, 10, 10, 8, 8, 8, 9, 9, 9, 10, 10, 8, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9, 11, 9, 10, 9, 11, 6, 6, 10, 7, 9, 9, 10, 10, 8, 8, 10, 10, 8, 11, 9, 9, 8, 8, 9, 9, 9, 9, 9, 7, 13, 4, 4, 5, 6, 6, 7, 7, 8, 7, 8, 8, 9, 9, 9, 13 [repeats 15 times], 0, 0} d_lengths = {7, 9, 0, 9, 6, 6, 6, 5, 5, 5, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 8, 0, 0, 0, 0} result = 880

Notice the differences near the end of ll_lengths... GCC 5.3 repeats 13, 13 times then two 14's follow, GCC 4.8 on the other hand repeats 13, 15 times. This if very interesting why ll_lengths has a bit different results on exactly the same file when compiled with different GCC versions.

EDIT 5: And I found the exact function which is being compiled differently on different comilers:

GCC: 5.3

Breakpoint 4, ZopfliLengthLimitedCodeLengths (frequencies=0xbeffe7e0, n=288, maxbits=15, bitlengths=0xbeffed20) at src/zopfli/katajainen.c:191 191 Node* leaves = (Node_)malloc(n * sizeof(_leaves)); (gdb) print (unsigned [288]) *bitlengths $6 = {8 [repeats 144 times], 9 [repeats 112 times], 7 [repeats 24 times], 8, 8, 8, 8, 8, 8, 8, 8} Continuing.

Breakpoint 3, ZopfliLengthLimitedCodeLengths (frequencies=0xbeffe7e0, n=288, maxbits=15, bitlengths=0xbeffed20) at src/zopfli/katajainen.c:246 246 free(lists); (gdb) print (unsigned [288]) *bitlengths $7 = {6, 6, 7, 7, 8, 8, 9, 9, 7, 9, 9, 9, 8, 9, 9, 6, 7, 9, 10, 10, 8, 8, 10, 9, 9, 10, 9, 10, 9, 10, 10, 9, 8, 10, 11, 11, 6, 10, 0, 11, 8, 8, 10, 10, 10, 10, 9, 10, 9, 7, 10, 10, 10, 10, 10, 10, 9, 7, 10, 9, 9, 9, 9, 9, 8, 6, 9, 9, 7, 7, 9, 10, 5, 6, 9, 10, 6, 7, 11, 10, 9, 9, 10, 10, 9, 10 [repeats 11 times], 8, 10, 10, 10, 10, 10, 8, 11, 9, 10, 11, 14, 9, 10, 10, 9, 9, 11, 8, 8, 7, 8, 8, 9, 9, 10, 10, 11, 9, 10, 9, 10, 9, 8, 9, 6, 7, 7, 8, 9, 9, 5, 12, 6, 9, 6, 10, 10, 8, 11, 12, 12, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 14, 10, 9, 11, 11, 9, 10, 13, 11, 10, 9, 10, 9, 9, 8, 8, 9, 9, 9, 9, 9, 9, 7, 7, 8, 8, 8, 7, 8, 8, 9, 8, 10, 9, 10, 10, 10, 10, 8, 8, 8, 9, 9, 9, 10, 10, 8, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9, 11, 9, 10, 9, 11, 6, 6, 10, 7, 9, 9, 10, 10, 8, 8, 10, 10, 8, 11, 9, 9, 8, 8, 9, 9, 9, 9, 9, 7, 13, 4, 4, 5, 6, 6, 7, 7, 8, 7, 8, 8, 9, 9, 9, 13 [repeats 15 times], 0, 0} (gdb)

GCC 4.8:

Breakpoint 4, ZopfliLengthLimitedCodeLengths (frequencies=0x28ef70, n=288, maxbits=15, bitlengths=0x28f4bc) at src/zopfli/katajainen.c:191 191 Node* leaves = (Node_)malloc(n * sizeof(_leaves)); (gdb) print (unsigned [288]) *bitlengths $4 = {8 [repeats 144 times], 9 [repeats 112 times], 7 [repeats 24 times], 8, 8, 8, 8, 8, 8, 8, 8} Continuing.

Breakpoint 3, ZopfliLengthLimitedCodeLengths (frequencies=0x28ef70, n=288, maxbits=15, bitlengths=0x28f4bc) at src/zopfli/katajainen.c:246 246 free(lists); (gdb) print (unsigned [288]) *bitlengths $5 = {6, 6, 7, 7, 8, 8, 9, 9, 7, 9, 9, 9, 8, 9, 9, 6, 7, 9, 10, 10, 8, 8, 10, 9, 9, 10, 9, 10, 9, 10, 10, 9, 8, 10, 11, 11, 6, 10, 0, 11, 8, 8, 10, 10, 10, 10, 9, 10, 9, 7, 10, 10, 10, 10, 10, 10, 9, 7, 10, 9, 9, 9, 9, 9, 8, 6, 9, 9, 7, 7, 9, 10, 5, 6, 9, 10, 6, 7, 11, 10, 9, 9, 10, 10, 9, 10 [repeats 11 times], 8, 10, 10, 10, 10, 10, 8, 11, 9, 10, 11, 13, 9, 10, 10, 9, 9, 11, 8, 8, 7, 8, 8, 9, 9, 10, 10, 11, 9, 10, 9, 10, 9, 8, 9, 6, 7, 7, 8, 9, 9, 5, 12, 6, 9, 6, 10, 10, 8, 11, 12, 12, 9, 9, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 13, 10, 9, 11, 11, 9, 10, 13, 11, 10, 9, 10, 9, 9, 8, 8, 9, 9, 9, 9, 9, 9, 7, 7, 8, 8, 8, 7, 8, 8, 9, 8, 10, 9, 10, 10, 10, 10, 8, 8, 8, 9, 9, 9, 10, 10, 8, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9, 11, 9, 10, 9, 11, 6, 6, 10, 7, 9, 9, 10, 10, 8, 8, 10, 10, 8, 11, 9, 9, 8, 8, 9, 9, 9, 9, 9, 7, 13, 4, 4, 5, 6, 6, 7, 7, 8, 7, 8, 8, 9, 9, 9, 13 [repeats 13 times], 14, 14, 0, 0} (gdb)

Frequencies are the same for in both GCC: (gdb) print (size_t [288]) *frequencies $8 = {97, 95, 52, 43, 28, 32, 17, 17, 36, 16, 11, 12, 21, 17, 13, 135, 36, 10, 6, 8, 20, 24, 8, 11, 13, 6, 10, 5, 13, 6, 5, 12, 33, 7, 3, 3, 75, 8, 0, 4, 21, 23, 6, 6, 6, 6, 12, 8, 17, 50, 6, 6, 6, 6, 6, 6, 16, 44, 5, 12, 10, 10, 10, 10, 29, 72, 17, 12, 55, 47, 14, 8, 182, 80, 13, 7, 81, 37, 3, 7, 15, 9, 6, 8, 14, 7 [repeats 11 times], 21, 8, 8, 8, 8, 8, 22, 3, 9, 5, 4, 1, 17, 7, 6, 13, 13, 3, 22, 19, 58, 34, 22, 16, 15, 5, 5, 4, 15, 5, 13, 6, 13, 18, 13, 75, 55, 51, 18, 14, 14, 155, 2, 86, 9, 76, 6, 7, 30, 4, 2, 2, 12, 9, 5, 5, 5, 5, 5, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 5, 9, 4, 4, 14, 5, 1, 3, 5, 9, 7, 17, 9, 31, 22, 10, 9, 10, 10, 10, 10, 47, 43, 22, 25, 30, 51, 25, 25, 14, 20, 6, 12, 8, 8, 8, 8, 28, 23, 31, 16, 14, 14, 8, 7, 28, 15, 13, 9, 9, 9, 9, 9, 19, 17, 11, 4, 11, 8, 13, 4, 90, 78, 7, 40, 9, 15, 8, 7, 29, 20, 6, 5, 22, 3, 17, 14, 29, 32, 12, 13, 9, 16, 14, 65, 1, 431, 405, 225, 132, 80, 44, 37, 29, 53, 29, 18, 9, 13, 11, 1 [repeats 15 times], 0, 0}

In conclusion procedure ZopfliLengthLimitedCodeLengths (or any of its sub-procedures) fails at being fully portable between compilers.

MrKrzYch00 commented 8 years ago

Accordin to all my EDITs in previous comment, this bug is unrelated to #20, since the bug exist in ZopfliLengthLimitedCodeLengths or in any of its sub-procedures and neither of those do floating point operations. I will try and see if I can pin-point exactly what happens there, using gdb.

EDIT: Linux gcc 4.8 (sorts elements by ordering same value elements from the beginning of stuct): Breakpoint 1, ZopfliLengthLimitedCodeLengths (frequencies=0xbeffe7e0, n=288, maxbits=15, bitlengths=0xbeffed20) at src/zopfli/katajainen.c:223 223 qsort(leaves, numsymbols, sizeof(Node), LeafComparator); (gdb) print leaves[0] $893 = {weight = 97, count = 0, inuse = 197416, tail = 0x30328} (gdb) print leaves[1] $894 = {weight = 95, count = 1, inuse = 0, tail = 0x0} (gdb) c Continuing.

Breakpoint 2, ZopfliLengthLimitedCodeLengths (frequencies=0xbeffe7e0, n=288, maxbits=15, bitlengths=0xbeffed20) at src/zopfli/katajainen.c:226 226 pool.size = 2 * maxbits * (maxbits + 1); (gdb) print leaves[0] $895 = {weight = 1, count = 107, inuse = 2090957984, tail = 0x7ca37ca2} (gdb) print leaves[1] $896 = {weight = 1, count = 170, inuse = 2123988632, tail = 0x7e9b7e9a}

Windows mingw64 gcc 5.3 (sorts elements by ordering same value elements from the end of stuct): Breakpoint 1, ZopfliLengthLimitedCodeLengths (frequencies=0x28ef70, n=288, maxbits=15, bitlengths=0x28f4bc) at src/zopfli/katajainen.c:223 223 qsort(leaves, numsymbols, sizeof(Node), LeafComparator); (gdb) print leaves[0] $875 = {weight = 97, count = 0, inuse = 13 '\r', tail = 0xbaadf00d} (gdb) print leaves[1] $876 = {weight = 95, count = 1, inuse = 13 '\r', tail = 0xbaadf00d} (gdb) c Continuing.

Breakpoint 2, ZopfliLengthLimitedCodeLengths (frequencies=0x28ef70, n=288, maxbits=15, bitlengths=0x28f4bc) at src/zopfli/katajainen.c:226 226 pool.size = 2 * maxbits * (maxbits + 1); (gdb) print leaves[0] $877 = {weight = 1, count = 285, inuse = 13 '\r', tail = 0xbaadf00d} (gdb) print leaves[1] $878 = {weight = 1, count = 284, inuse = 13 '\r', tail = 0xbaadf00d}

For some reason sorting of other struct elements differs between compilators when qsort is used.

MrKrzYch00 commented 8 years ago

This makes zopfli use ascending sorting order when weights are equal instead of reverse which is occuring by default on GCC 5.3 and fixes different results issue. Modify if counts should be reverse ordered.

(check pull #96 for code)

jibsen commented 8 years ago

Nice detective work there, @MrKrzYch00.

MrKrzYch00 commented 8 years ago

Thanks, took a while to figure it out... it's also good to learn that one should really pay attention when relying on qsort. If it differs from compiler to compiler with it's default behavior then it's unstable sorter leading in turn to undefined behavior in rest of the code. :)

Said so I will debug it further with my fork (reverse ordering of counts) to see if I get smaller GZIP files (and it they are valid).

jibsen commented 8 years ago

The standard does not require qsort to be stable (C11 7.22.5.2p4).

MrKrzYch00 commented 8 years ago

However, some sort of a warning during compile time could be issued when comparing bigger structures/arrays and the checker is considered too simple, to safe time on debugging later. :)

MrKrzYch00 commented 8 years ago

Fixed with: https://github.com/google/zopfli/commit/37f6da6ec1f8353a438b5543709207d54476630f