trietptm / lz4

Automatically exported from code.google.com/p/lz4
0 stars 0 forks source link

Invalid use of restrict. #34

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Functions LZ4_uncompress() and LZ4_uncompress_unknownOutputSize() use restrict 
keyword with local variables op and ref:

const BYTE* restrict ref;

BYTE* restrict op = (BYTE*) dest;

As far as I understand how 'restrict' works, this seems like an invalid use 
that can fool the compiler. 

In fact, when using icc version 11.0 (with plenty of optimizations on), this 
results in wrong decompression output (fixed by getting rid of restrict on both 
variables).

Original issue reported on code.google.com by ala.lusz...@gmail.com on 24 Sep 2012 at 2:20

GoogleCodeExporter commented 9 years ago
Good point. Thanks for pointing that out.

I incorrectly understood "restrict" keyword as being a declaration of intention 
regarding "write operation" within a defined memory zone.
In this case, 'op' is the only pointer authorized to write into output buffer, 
so it would have been right.

But reading again the definition, i find that the declaration of intention is 
much broader, and applies to any access to the memory zone. In this case, it's 
wrong since 'ref' is also authorized to "read" the output buffer.

It's unfortunate i can not reproduce the issue in order to witness and study it.

Original comment by yann.col...@gmail.com on 24 Sep 2012 at 4:20

GoogleCodeExporter commented 9 years ago
I'll look into it.

Original comment by yann.col...@gmail.com on 24 Sep 2012 at 4:20

GoogleCodeExporter commented 9 years ago
I use LZ4 embedded and slightly modified, so I cannot give you instructions for 
reproducing this right away. But I'll try to find a case for you.

Anyway, I didn't checked the assembly, but with both op and ref marked as 
restrict this:
*op++ = *ref++;
*op++ = *ref++;
*op++ = *ref++;
*op++ = *ref++;
is probably optimized to something like:
*(int*)op = *(int*)ref;
op += 4;
ref += 4;

Original comment by ala.lusz...@gmail.com on 24 Sep 2012 at 5:00

GoogleCodeExporter commented 9 years ago
I managed to get a reproducible case. However, note that there's an #ifdef 
which incorrectly identifies icc as not knowing the restrict keyword, which 
overshadows the issue.

[ala@intruder lz4]$ cat /proc/cpuinfo  | grep "model name" | head -n 1
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
[ala@intruder lz4]$ icc --version
icc (ICC) 11.0 20081105
Copyright (C) 1985-2008 Intel Corporation.  All rights reserved.
[ala@intruder lz4]$ svn diff
Index: Makefile
===================================================================
--- Makefile    (revision 68)
+++ Makefile    (working copy)
@@ -11,10 +11,10 @@
 all: lz4demo64 lz4demo32 

 lz4demo64: lz4.c lz4.h lz4hc.c lz4hc.h bench.c lz4demo.c
-   gcc      -O3 -I. -std=c99 -Wall -W -Wundef -Wno-implicit-function-declaration 
lz4hc.c lz4.c bench.c lz4demo.c -o $(OUTPUT64)
+   icc -O3 -restrict lz4hc.c lz4.c bench.c lz4demo.c -o $(OUTPUT64)

 lz4demo32: lz4.c lz4.h lz4hc.c lz4hc.h bench.c lz4demo.c
-   gcc -m32 -Os -march=native -I. -std=c99 -Wall -W -Wundef 
-Wno-implicit-function-declaration lz4hc.c lz4.c bench.c lz4demo.c -o 
$(OUTPUT32)
+   icc -O3 -restrict lz4hc.c lz4.c bench.c lz4demo.c -o $(OUTPUT32)

 clean:
    rm -f core *.o $(OUTPUT32) $(OUTPUT64)
Index: lz4.c
===================================================================
--- lz4.c   (revision 68)
+++ lz4.c   (working copy)
@@ -98,12 +98,6 @@
 //**************************************
 // Compiler Options
 //**************************************
-#if __STDC_VERSION__ >= 199901L // C99
-/* "restrict" is a known keyword */
-#else
-#  define restrict // Disable restrict
-#endif
-
 #define GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)

 #ifdef _MSC_VER  // Visual Studio
[ala@intruder lz4]$ cat a
aAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaA
aAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaA
aAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaA
[ala@intruder lz4]$ ./lz4demo64 -c0 a a-compressed
*** Compression CLI using LZ4 algorithm , by Yann Collet (Sep 25 2012) ***
Compressed 203 bytes into 20 bytes ==> 9.85%
Done in 0.00 s ==> inf MB/s
[ala@intruder lz4]$ ./lz4demo64 -d a-compressed a-uncompressed
*** Compression CLI using LZ4 algorithm , by Yann Collet (Sep 25 2012) ***
Successfully decoded 203 bytes 
Done in 0.00 s ==> inf MB/s
[ala@intruder lz4]$ cat a-uncompressed 
aAaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAa
aAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaA

Original comment by ala.lusz...@gmail.com on 25 Sep 2012 at 9:02

GoogleCodeExporter commented 9 years ago
Hi

Many thanks for detailed instructions on how to re-create the issue.
I currently do not have icc available around, and therefore cannot use your 
information yet. That's probably something i will have to improve in the future.

Nonethless, I've been investigating this long overdue issue lately.
I've followed your advise of removing the "restrict" keyword.

Initial benchmarks show that performance difference is very small, for Intel 
CPU.
The resulting source file is in attached file (release candidate 80)

Regards

Original comment by yann.col...@gmail.com on 24 Oct 2012 at 3:52

Attachments:

GoogleCodeExporter commented 9 years ago
Corrected into r80.

Original comment by yann.col...@gmail.com on 26 Oct 2012 at 12:37