Okay, it doesn't look trivial, but I'll look into it.
I hope that clang on Linux can generate the same kind of warning (and for
the same reasons) as clang on OS X, since I don't own an Apple machine to
find out...
Original comment by yann.col...@gmail.com on 18 Sep 2014 at 3:55
I think the LZ4 library is great. The decoding performance is awesome! I
used to go with zlib for the longest time, not caring much about
compression; I treated it as just a deployment mechanism for the use
cases I have in mind.
My test data set is 10 MB of mostly PVRTC and ASTC compressed textures.
zlib compresses to 59% of the original size and lz4hc to 61%, only 2
percentage points behind. The decoding speed, as you are well aware, is
220 MB/s for zlib and 1480 MB/s for LZ4 on my laptop (i7 MacBook).
Impressive.
This single-thread bandwidth satisfies my use case completely. I was
thinking of splitting files into multiple chunks in the container format;
sure, that can get close to the theoretical memory bandwidth limit (wow,
thank you again!), but since this is backed by a ~500 MB/s SSD or, more
typically, a 100-140 MB/s HDD, the decoding bandwidth of a single thread
is more than enough, with headroom to spare, to double or triple the
throughput (e.g. reading off slow media and then decoding increases the
effective read bandwidth by the compression factor). This is a completely
new area for me and I am super glad that I took a look at your library!
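To make that arithmetic concrete, here is a small C sketch (the numbers
are the measurements above; everything else is illustrative):

    #include <stdio.h>

    int main(void)
    {
        const double hdd_mb_s = 140.0;   /* typical HDD read bandwidth */
        const double ratio    = 0.61;    /* lz4hc compressed / original size */
        const double lz4_mb_s = 1480.0;  /* single-thread decode bandwidth */

        /* Reading pre-compressed data and decoding it delivers original
           bytes at disk_speed / ratio. */
        double effective = hdd_mb_s / ratio;
        /* Fraction of one decoding thread needed to keep up with the disk. */
        double cpu_load  = effective / lz4_mb_s;

        printf("effective read bandwidth: %.0f MB/s\n", effective);      /* ~230 */
        printf("decode load: %.0f%% of one thread\n", cpu_load * 100.0); /* ~16%  */
        return 0;
    }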
Awesome job! :)
Original comment by rokkim...@gmail.com on 19 Sep 2014 at 11:39
You're welcome, Rokkimake.
Indeed, in the early days of LZ4, when it was not yet open sourced, the
single-thread decompression performance proved to be an issue for
benchmarks. Back then, the best neutral process known was to measure
decompression timing on a RAM drive. The problem is, the RAM drive was
saturated by barely half a thread. So much for showing off my
multi-threaded library...
Anyway, since then I have understood that multi-threaded decompression is
probably not required for most I/O jobs, and only pure RAM-to-RAM
applications may benefit from this refinement; zRAM, for example, comes
to mind.
But even for I/O, it's not all bad: all those unused CPU cycles can be
spent on other tasks running in parallel, a use case which is relatively
common for games and servers.
Regards
Original comment by yann.col...@gmail.com on 20 Sep 2014 at 11:28
For your information:
I did attempt to compile lz4.c using clang 3.4 on Linux. Unfortunately,
it did not generate the mentioned warning, so the issue is apparently
difficult to reproduce on Linux.
The situation is a bit strange, though: LZ4_64BITS can only be evaluated
at compile time, since:
#define LZ4_64BITS (sizeof(void*)==8)
So the result is supposed to be a constant (1 on 64 bits, 0 on 32 bits),
and the unused branch *should* be removed by dead-code elimination.
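To illustrate the pattern, here is a minimal standalone sketch (not the
actual lz4.c source; the byPtr/byU32 names just mirror the ones discussed
here):

    #include <stdio.h>

    /* sizeof() is not available to the preprocessor, so this macro expands
       to an expression that the compiler (not the preprocessor) folds to
       0 or 1: */
    #define LZ4_64BITS (sizeof(void*)==8)

    typedef enum { byPtr, byU32 } tableType_t;

    int main(void)
    {
        /* Both branches are parsed; the compiler folds the condition to a
           constant and is expected to drop the dead branch. */
        tableType_t t = LZ4_64BITS ? byU32 : byPtr;
        printf("selected: %s\n", t == byU32 ? "byU32" : "byPtr");
        return 0;
    }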
By the way, your proposed workaround just defaults to byU32, since
LZ4_64BITS is indeed defined, so the test is always true. Therefore, for
a simpler modification, I would suggest hand-selecting the result
directly, such as:
#define LZ4_64BITS 1
And if that still does not work, then modify the result directly within
LZ4_compress:
LZ4_64BITS ? byU32 : byPtr
becomes
byU32
Both byU32 and byPtr will work anyway; it's just that byU32 is more
efficient in 64-bit mode and byPtr is more efficient in 32-bit mode.
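Put together, a minimal sketch of the hand-selected variant looks like
this (illustrative only, not the actual lz4.c code):

    #include <stdio.h>

    #define LZ4_64BITS 1   /* hand-selected: 1 on a 64-bit target, 0 on 32-bit */

    typedef enum { byPtr, byU32 } tableType_t;

    int main(void)
    {
        /* The condition is now a literal, so the fold is trivial; if the
           warning persists, replace the whole ternary with plain byU32. */
        tableType_t t = LZ4_64BITS ? byU32 : byPtr;
        printf("selected: %s\n", t == byU32 ? "byU32" : "byPtr");
        return 0;
    }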
Best Regards
Original comment by yann.col...@gmail.com on 22 Sep 2014 at 5:58
For your information:
The LZ4_64BITS detection macro is no longer used in the latest version of
LZ4, accessible at:
https://github.com/Cyan4973/lz4/tree/dev
32/64-bit detection is now the job of a runtime function.
As a positive side effect, this should remove any remaining warning
issues you may experience with clang in relation to the old macro.
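For illustration, such a runtime detection function can be as small as
this (a sketch with a hypothetical name; see the dev branch above for the
actual code):

    #include <stdio.h>

    /* Runtime 32/64-bit detection: an ordinary function instead of a macro,
       so no macro-based branch remains for clang to warn about. The
       compiler still inlines this and folds it to a constant. */
    static unsigned is_64bits(void) { return sizeof(void*) == 8; }

    int main(void)
    {
        printf("64-bit pointers: %s\n", is_64bits() ? "yes" : "no");
        return 0;
    }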
Regards
Original comment by yann.col...@gmail.com on 3 Dec 2014 at 10:24
Updated in r125.
Original comment by yann.col...@gmail.com on 13 Dec 2014 at 1:05
Original issue reported on code.google.com by rokkim...@gmail.com on 18 Sep 2014 at 3:27