pfalcon / uzlib

Radically unbloated DEFLATE/zlib/gzip compression/decompression library. Can decompress any gzip/zlib data, and offers simplified compressor which produces gzip-compatible output, while requiring much less resources (and providing less compression ratio of course).
Other
303 stars 82 forks source link

uzlib: support streaming #2

Closed prusnak closed 6 years ago

prusnak commented 8 years ago

(moved from https://github.com/micropython/micropython/issues/1910)

It would be nice if uzlib supported streamed decompression. I was toying with the idea of adding the following callback to the TINF_DATA structure:

void (*destWrite)(unsigned char byte);

or

void (*destWrite)(unsigned char *data, unsigned int datalen);

And tinf_uncompress functions would use this instead of other d->destFoo members while uncompressing. This would need an introduction of look-back sliding window (of what size? 32k seems to be the default for wbits=15, but we can limit acceptable compression to wbits=8 resulting in 256 byte window for decompression, which is fine by me). Or we can provide wbits size as a compile-time #define macro, or even as a decompress function parameter when dynamic allocation of memory is permitted.

Tell me, if you want me to pursue this idea and what approach do you like the most.

pfalcon commented 8 years ago

So, as I mentioned, this will be required to implement upip for esp8266, and I would eventually work on this myself. But this is challenging task, real computer science (dealing with algorithms and stuff), so if you're challenged by it, and have enough time/interest, you're welcome to give it your try.

Few points I may mention:

  1. Yes, as you mention, for stream decompression, a look-back buffer will be required, and for zlib compression, it can be up to 32K, and that size should be supported. Of course, if it's known that lesser wbits was used for compression, smaller buffer should be allocated (e.g., esp8266 won't be able to support 32K window almost certainly). This buffer apparently needs to be organized as a circular buffer.
  2. Non-streaming decompression should still be supported (doesn't require separate look-back buffer and thus more memory-efficient for small files).
  3. Besides buffering output stream, you will almost certainly need to buffer input stream - the current algo again assumes that entire stream in the memory. So, you may need to figure out what's the minimum input buffer required (to not patch current algo too much). To that course, I already did some consideration/asking which you may find useful: http://stackoverflow.com/questions/25431160/what-is-the-maximum-size-of-encoded-dynamic-huffman-tree-as-used-by-deflate-zli
prusnak commented 8 years ago

I realized that my use-case is a little bit odd: my input is held completely in the memory, what I need to stream is just the output. As this is in contrast with what I assume uzlib aims to do (either have both input and output in memory or stream both input and output), I think I'll just fork and adapt the code to my use. Thank for your explanation!

pfalcon commented 8 years ago

That's indeed a bit odd. With both compressed/decompressed areas in memory, only small amounts of data can be processed. With full streaming implementation, unlimited amounts of data can be processed. With "half-streaming" implementation, you will be able to process just slightly more data than with in-memory implementation, that's all.

prusnak commented 8 years ago

Maybe it starts making sense when I reveal that my usecase is decompressing compressed image assets (stored on flash memory) directly into TFT display driver (which has its own framebuffer with only write(uint8_t byte) interface). I updated uzlib code with circular buffer of fixed size of 1024 (I am compressing the assets with WBITS=10) and it works very well. But I guess, this is getting off-topic.

pfalcon commented 6 years ago

Streaming decompression was implemented quite some time ago.