Refactor to support streaming

GoogleCodeExporter commented 8 years ago

The LZ4 library should be refactored to support streaming compression and 
decompression, similar to how zlib works.

There would be an lz4Init(), lz4(), and lz4End(), similar to how the inflate 
functions work in zlib. You would pass a pointer to the stream to each 
function. Each call to lz4() would save the state of the compression/decompression 
machine for subsequent calls to lz4(). Here is zlib's stream structure:

typedef struct z_stream_s {
    z_const Bytef *next_in;     /* next input byte */
    uInt     avail_in;  /* number of bytes available at next_in */
    uLong    total_in;  /* total number of input bytes read so far */

    Bytef    *next_out; /* next output byte should be put there */
    uInt     avail_out; /* remaining free space at next_out */
    uLong    total_out; /* total number of bytes output so far */

    z_const char *msg;  /* last error message, NULL if no error */
    struct internal_state FAR *state; /* not visible by applications */

    alloc_func zalloc;  /* used to allocate the internal state */
    free_func  zfree;   /* used to free the internal state */
    voidpf     opaque;  /* private data object passed to zalloc and zfree */

    int     data_type;  /* best guess about the data type: binary or text */
    uLong   adler;      /* adler32 value of the uncompressed data */
    uLong   reserved;   /* reserved for future use */
} z_stream;
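
As a rough illustration of the calling convention being requested here (this is not an existing LZ4 API), the sketch below mirrors the next_in/avail_in/next_out/avail_out bookkeeping of z_stream. Every name in it (demo_stream, demo_init, demo_step, DEMO_OK, DEMO_NEED_OUTPUT) is hypothetical, and the "codec" is a plain byte copy standing in for real compression so that only the state handling is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream descriptor modeled on the z_stream fields quoted above.
   None of these names exist in LZ4. */
typedef struct {
    const unsigned char *next_in;   /* next input byte */
    size_t               avail_in;  /* bytes available at next_in */
    size_t               total_in;  /* bytes consumed so far */
    unsigned char       *next_out;  /* where the next output byte goes */
    size_t               avail_out; /* free space at next_out */
    size_t               total_out; /* bytes produced so far */
} demo_stream;

enum { DEMO_OK = 0, DEMO_NEED_OUTPUT = 1 };

/* Resets all pointers and counters before the first call. */
void demo_init(demo_stream *s)
{
    memset(s, 0, sizeof *s);
}

/* One step of the stream machine. A real codec would compress or decompress
   here; this stand-in only copies bytes. It consumes as much input as fits
   in the output space and updates the counters for the next call. */
int demo_step(demo_stream *s)
{
    size_t n = s->avail_in < s->avail_out ? s->avail_in : s->avail_out;

    if (n > 0) {
        assert(s->next_in != NULL && s->next_out != NULL);
        memcpy(s->next_out, s->next_in, n);
        s->next_in   += n;
        s->avail_in  -= n;
        s->total_in  += n;
        s->next_out  += n;
        s->avail_out -= n;
        s->total_out += n;
    }
    /* Input left over means the caller must supply more output space. */
    return s->avail_in > 0 ? DEMO_NEED_OUTPUT : DEMO_OK;
}
```

The caller would loop on demo_step(), resetting next_out and avail_out each time DEMO_NEED_OUTPUT comes back, which is the same pattern as the inflate() loop in zlib.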

Original issue reported on code.google.com by nathan.m...@gmail.com on 4 Nov 2012 at 9:40

GoogleCodeExporter commented 8 years ago

The BZIP2 library works similarly, with a bz_stream structure and similar 
function prototypes.

Original comment by nathan.m...@gmail.com on 4 Nov 2012 at 9:50

GoogleCodeExporter commented 8 years ago

The LZ4 format is unfortunately incompatible with some corner-case streaming 
scenarios.
It would be necessary to modify it first. That is not a big task, but it carries some 
potential risks regarding compatibility with the existing user base.

I'll keep your suggestion in mind since, well, it's not the first time this has been 
requested, and if there is enough pressure for it, I might as well proceed with 
the changes.

Original comment by yann.col...@gmail.com on 4 Nov 2012 at 11:54

GoogleCodeExporter commented 8 years ago

Not a defect. Enhancement request.

Original comment by yann.col...@gmail.com on 4 Nov 2012 at 11:55

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 8 years ago

Note that we have written a small encapsulating library wrapping lz4/fastlz (or 
any similar block compression library) with a zlib-like API (and a pluggable 
compatibility header fastlzlib-zlib.h)
 https://github.com/exalead/fastlzlib

Feel free to use it and report issues!

Original comment by xroche on 4 Jan 2013 at 3:04

GoogleCodeExporter commented 8 years ago

Thanks Xavier.
It's an excellent reference.

Original comment by yann.col...@gmail.com on 4 Jan 2013 at 3:38

GoogleCodeExporter commented 8 years ago

Streaming support might be nice for web apps, for instance, I suppose :)

Original comment by rogerpack2005 on 14 Aug 2013 at 7:31

GoogleCodeExporter commented 8 years ago

I know. This is a work in progress ;)
Stay tuned for updates...

Original comment by yann.col...@gmail.com on 14 Aug 2013 at 7:32

GoogleCodeExporter commented 8 years ago

Any progress or code we can poke at yet? Thanks for a great library.

Original comment by fullung@gmail.com on 30 Dec 2013 at 5:22

GoogleCodeExporter commented 8 years ago

Some core functions are now present in lz4.h, but they require you to manually 
take care of buffer management and layout.

An interface with a zlib-like abstraction still has to be completed.
It's on my to-do list.

Original comment by yann.col...@gmail.com on 30 Dec 2013 at 5:25

GoogleCodeExporter commented 8 years ago

Streaming support would be great

Original comment by doppelba...@gmail.com on 24 Jan 2014 at 8:54

GoogleCodeExporter commented 8 years ago

Sure.

I'll probably scale down my ambitions in order to get "something out".

My current attempt tries to take too many corner cases into consideration, and 
it is proving too complex for a first release.

Original comment by yann.col...@gmail.com on 25 Jan 2014 at 2:52

GoogleCodeExporter commented 8 years ago

Original comment by yann.col...@gmail.com on 22 Apr 2014 at 11:08

Added labels: Priority-High
Removed labels: Priority-minor

GoogleCodeExporter commented 8 years ago

Original comment by yann.col...@gmail.com on 20 May 2014 at 9:22

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Some progress on this front.

http://fastcompression.blogspot.fr/2014/05/streaming-api-for-lz4.html

Comments & questions welcome.

Original comment by yann.col...@gmail.com on 20 May 2014 at 9:23

GoogleCodeExporter commented 8 years ago

Would it be possible to de-compress a stream?

Original comment by doppelba...@gmail.com on 20 May 2014 at 9:43

GoogleCodeExporter commented 8 years ago

Of course :
LZ4_decompress_safe_usingDict()

Original comment by yann.col...@gmail.com on 20 May 2014 at 9:46

GoogleCodeExporter commented 8 years ago

Wouldn't it be possible to keep the "old" compression streaming API and add a 
decompression API like:

void* LZ4_decompress_init (...);
int LZ4_decompress_continue (...);
int LZ4_decompress_free (...);

Original comment by doppelba...@gmail.com on 31 May 2014 at 10:34

GoogleCodeExporter commented 8 years ago

Yes, it could be.

Original comment by yann.col...@gmail.com on 31 May 2014 at 10:07

GoogleCodeExporter commented 8 years ago

@doppelbauer :

Just to make sure I understand your request properly:
it seems the main function you are asking for, int LZ4_decompress_continue (...);
already exists, but is called int LZ4_decompress_safe_withPrefix64k (...);

There is no associated "init" or "free", because there is no need for them.

Is the problem with the function naming?

Original comment by yann.col...@gmail.com on 1 Jun 2014 at 10:54

GoogleCodeExporter commented 8 years ago

@doppelbauer :
Maybe I misunderstood.
LZ4_decompress_safe_withPrefix64k() works without any need for a tracking 
structure, but requires that previously decoded data sit *just before* 
the memory buffer where new data will be decoded.

Did you mean that LZ4_decompress_safe_continue() should be able to 
decompress new blocks without this "positioning" condition, i.e. with the 
previously decoded block anywhere in memory, while still being able to 
use it to decompress the next block?
It would then become the equivalent of LZ4_decompress_safe_usingDict(), but 
without the need to explicitly say where the previous data block is: its location 
would be determined automatically by the tracking structure.
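
To make the "positioning" condition concrete, here is a minimal sketch of one way a caller could manage a single contiguous buffer so that the most recent 64 KB of decoded data always sits directly before the spot where the next block is decoded. The names prefix_window, prefix_window_reserve and prefix_window_commit are hypothetical helpers, not part of lz4.h, and the actual decoding call is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PREFIX_SIZE (64 * 1024)   /* history a chained block may reference */

/* Hypothetical bookkeeping for one contiguous decode buffer. */
typedef struct {
    unsigned char *base;      /* caller-provided buffer */
    size_t         capacity;  /* total size of base */
    size_t         used;      /* decoded bytes currently held, starting at base */
} prefix_window;

/* Returns the address where the next block (at most max_block bytes) should
   be decoded so that earlier output lies immediately before it.
   Returns NULL if the buffer cannot hold 64 KB of history plus one block. */
unsigned char *prefix_window_reserve(prefix_window *w, size_t max_block)
{
    if (w->capacity < PREFIX_SIZE || w->capacity - PREFIX_SIZE < max_block)
        return NULL;

    if (w->capacity - w->used < max_block) {
        /* Not enough room left: slide the last 64 KB to the front. */
        size_t keep = w->used < PREFIX_SIZE ? w->used : PREFIX_SIZE;
        memmove(w->base, w->base + (w->used - keep), keep);
        w->used = keep;
    }
    return w->base + w->used;
}

/* Records that `decoded` bytes were written at the reserved address. */
void prefix_window_commit(prefix_window *w, size_t decoded)
{
    assert(decoded <= w->capacity - w->used);
    w->used += decoded;
}
```

The intended use would be to call prefix_window_reserve(), decode one block at the returned address, then call prefix_window_commit() with the number of bytes produced. The memmove is the cost of keeping everything contiguous; a tracking structure or an explicit dictionary pointer avoids it.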

Original comment by yann.col...@gmail.com on 9 Jun 2014 at 2:24

GoogleCodeExporter commented 8 years ago

Hi Yann,

Thanks a lot for your answer.

Maybe I didn't understand the API.
It would be great to decompress an LZ4 stream packet by packet.

I have attached a test case, "streaming-test.c". It compresses random data and 
tries to decompress it in 64k chunks:
gcc -o test streaming-test.c lz4.c && ./test

Thanks a lot!
Markus

Original comment by doppelba...@gmail.com on 10 Jun 2014 at 9:20

Attachments:

streaming-test.c

GoogleCodeExporter commented 8 years ago

Hi doppelbauer

I've looked at your example.
There is a small flaw that I'll try to explain.

If you want to decompress 64K chunks, you have to compress 64K chunks.
LZ4_decompress_xxx() functions can only decompress full chunks.
(except LZ4_decompress_safe_partial, but that's a very specific exception, and 
doesn't match your use case anyway).

To compress 64K chunks, you can either:
- compress them independently, one by one, using LZ4_compress()
- compress them in "chain mode", meaning successive blocks reference 
previous ones, which boosts the compression ratio (but also requires decompressing 
them in sequence): this is where you use LZ4_compress_continue()

Once one of the above conditions is met, you can decompress 64KB chunks using 
LZ4_decompress_safe_withPrefix64k(), as you did in your example.
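
Both options start the same way, by walking the input in 64 KB steps. The sketch below shows only that loop; the per-block work is passed in as a callback, which is where a call to LZ4_compress() or LZ4_compress_continue() would go. The names for_each_block, block_fn and remember_size are made up for this illustration, and the callback shown is only a stand-in that records the block size.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE (64 * 1024)

/* Called once per block; returns 0 to continue, non-zero to stop. */
typedef int (*block_fn)(const char *block, size_t block_size, void *ctx);

/* Walks src in BLOCK_SIZE steps (the final block may be shorter).
   Returns the number of blocks handled, or -1 if the callback stopped. */
long for_each_block(const char *src, size_t size, block_fn fn, void *ctx)
{
    long count = 0;
    size_t pos = 0;

    assert(fn != NULL);
    while (pos < size) {
        size_t n = size - pos < BLOCK_SIZE ? size - pos : BLOCK_SIZE;
        if (fn(src + pos, n, ctx) != 0)
            return -1;
        pos += n;
        count++;
    }
    return count;
}

/* Stand-in callback: a real one would compress the block.
   This one only remembers the size of the last block it saw. */
int remember_size(const char *block, size_t block_size, void *ctx)
{
    (void)block;
    *(size_t *)ctx = block_size;
    return 0;
}
```

Whatever block size is chosen here has to be the unit used on the decompression side as well, since each block must be decoded whole.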

If the naming is confusing, I could also propose an identical function named 
LZ4_decompress_continue(), just for the sake of clarity.

Regards

Original comment by yann.col...@gmail.com on 11 Jun 2014 at 10:15

GoogleCodeExporter commented 8 years ago

Streaming branch has been merged into "dev" branch.

https://github.com/Cyan4973/lz4/tree/dev

Getting closer to a release

Original comment by yann.col...@gmail.com on 11 Jun 2014 at 9:10

GoogleCodeExporter commented 8 years ago

Hi Yann,

I'm looking at the LZ4 HC header in the 'dev' tree, and it still states 'Note : 
these streaming functions still follows the older model'. Should these 
functions still be used for new projects, or is there some newer/better API? 
I'd really appreciate a pointer here. A single function name would be enough, 
but if there's some sample code, that'd be even better ;)

Thanks!

Original comment by dchichkov@gmail.com on 13 Jun 2014 at 1:01

GoogleCodeExporter commented 8 years ago

Hi Dmitry

Realistically, the current streaming API of LZ4 HC will remain "as is" for a 
few weeks. Adapting it requires time and caution, and I have to move on 
and spend some time on a long-overdue request for xxHash.
Therefore, I'll soon update LZ4 with the new streaming interface *for the Fast 
variant only*.

Should you need to start development using the LZ4 HC streaming interface, I 
recommend using the currently available interface.

Hopefully, it shouldn't be much of a problem.
The benefit of the new interface is that it will be more flexible. But if you 
can solve your problem within the current interface's limitations, you'll have no 
trouble adapting it to the new interface when it becomes available.

Moreover, I intend to continue supporting the "current" streaming interface for 
quite some time, even after publication of the new one, by moving the relevant 
functions into an "obsolete" category. Since they will stay there for some time, 
users will have time to adapt.

Regards

Original comment by yann.col...@gmail.com on 14 Jun 2014 at 10:12

GoogleCodeExporter commented 8 years ago

Hi Yann,

Is there a way to decompress, in chunks, a stream that was created 
via "LZ4_compress()"?

Thanks a lot!
Markus

Original comment by doppelba...@gmail.com on 14 Jun 2014 at 10:46

GoogleCodeExporter commented 8 years ago

@doppelbauer

Basically, no.
If you compress a chunk as a single block, using LZ4_compress(), you have to 
decompress it as a single block too.

The only exception is LZ4_decompress_safe_partial(), which can decompress a 
part of the block, from the beginning of the block, to targetOutputSize.
But if what you need is a chunk at the end of the block, you will have to 
decompress the entire block too.

It's still unclear to me if your problem is related to memory management (you 
don't want to decompress the entire block, because there is not enough memory 
for it), or to random access into the block (you just want to decompress a 
small part of the compressed block because that's all you need).

Both problems are vastly different and require completely different solutions.

Original comment by yann.col...@gmail.com on 14 Jun 2014 at 10:52

GoogleCodeExporter commented 8 years ago

A database server sends compressed records to a client.
Currently I use "LZ4_compress()" and "LZ4_decompress_safe()", but the client 
has to know the size of the decompressed value (and can only start decompressing 
after receiving the last byte).
So my idea was that it would be great to decompress the stream in chunks.

Thanks a lot!
Markus

Original comment by doppelba...@gmail.com on 14 Jun 2014 at 10:59

GoogleCodeExporter commented 8 years ago

Then you may benefit from the new streaming API.
It basically depends on the size of your records.

If they are a few KB each, then there is probably no better alternative; you're 
already doing the right thing.

If they are a few MB each, then it's time to consider cutting the record into 
smaller blocks.
Use LZ4_compress_continue() on the small blocks instead of LZ4_compress().
This way, the compression ratio will remain roughly equivalent (instead of 
dropping dramatically as the block size gets smaller).

Then, you'll be able to decompress the small blocks one by one, using 
LZ4_decompress_safe_withPrefix64k() or LZ4_decompress_safe_usingDict().
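
To decompress blocks one by one as they arrive, the client needs to know where each compressed block ends, since the block decompression functions take the exact compressed size. One possible approach, sketched below, is to prefix each block with its length. This layout (a 4-byte little-endian size followed by the payload) is an assumption made for illustration, not a format defined by LZ4, and frame_write and frame_peek are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER 4   /* 4-byte little-endian block size (assumed layout) */

/* Writes header + block into dst. Returns bytes written, or 0 if dst is
   too small. */
size_t frame_write(unsigned char *dst, size_t dst_cap,
                   const unsigned char *block, uint32_t block_size)
{
    if (dst_cap < FRAME_HEADER || dst_cap - FRAME_HEADER < block_size)
        return 0;
    dst[0] = (unsigned char)(block_size & 0xFF);
    dst[1] = (unsigned char)((block_size >> 8) & 0xFF);
    dst[2] = (unsigned char)((block_size >> 16) & 0xFF);
    dst[3] = (unsigned char)((block_size >> 24) & 0xFF);
    if (block_size > 0)
        memcpy(dst + FRAME_HEADER, block, block_size);
    return FRAME_HEADER + (size_t)block_size;
}

/* Inspects the bytes received so far. If a complete frame is present, sets
   *block and *block_size and returns the frame length to consume.
   Returns 0 when more data is needed. */
size_t frame_peek(const unsigned char *rx, size_t rx_len,
                  const unsigned char **block, uint32_t *block_size)
{
    uint32_t n;

    if (rx_len < FRAME_HEADER)
        return 0;
    n = (uint32_t)rx[0]
      | ((uint32_t)rx[1] << 8)
      | ((uint32_t)rx[2] << 16)
      | ((uint32_t)rx[3] << 24);
    if (rx_len - FRAME_HEADER < n)
        return 0;
    *block = rx + FRAME_HEADER;
    *block_size = n;
    return FRAME_HEADER + (size_t)n;
}
```

On the receiving side, each block returned by frame_peek() would be handed to the decompression function as soon as it is complete, without waiting for the rest of the record.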

Original comment by yann.col...@gmail.com on 14 Jun 2014 at 11:04

GoogleCodeExporter commented 8 years ago

A new streaming API is available in r118.
For now it only covers the Fast compression variant and decompression.
I'm keeping the issue open; it will be closed when the new streaming API is also 
available for the High Compression variant, LZ4HC.

Original comment by yann.col...@gmail.com on 26 Jun 2014 at 9:51

GoogleCodeExporter commented 8 years ago

With the release of the latest LZ4 HC streaming API, I guess we can now close this 
long-standing enhancement request.

Original comment by yann.col...@gmail.com on 8 Nov 2014 at 8:44

GoogleCodeExporter commented 8 years ago

Supported in r124

Original comment by yann.col...@gmail.com on 8 Nov 2014 at 8:45

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

Thanks for doing this.

Original comment by rogerdpa...@gmail.com on 16 Dec 2014 at 6:19

xiaoyin0208 / lz4

Refactor to support streaming #42