Decompression of a stream in output chunks without dictionary

maximevince commented 3 years ago

From the README, I gather:

Handling of output (decompressed) stream:

In-memory decompression, where output stream fully resides in memory.

Streaming decompression, which allows to process arbitrary-sized streams (longer than available memory), but requires in-memory buffer for Deflate dictionary window.

Application specifies number of output bytes it wants to decompress, which can be as high as UINT_MAX to decompress everything into memory at once, or as low as 1 to decompress byte by byte, or any other value to decompress a chunk of that size.

Option 1 is not feasible on the target MCU (not enough RAM)
Option 2 works on a test application, but is not feasible on the target MCU, if the dictionary must be 32kB (not enough RAM)

That leaves me with: Option 3: What I am trying to do is to decompress 128 bytes (or any other value) of bytes at a time (the 'chunk' size) from a stream. (Streaming the input in 128b chunks works fine!)

This 128 byte output works, once.

After that, if I re-init d.dest_start = d.dest = uz_data_dest;, this will cause an error -3 when decompressing the next series of bytes.

I found out that this is because of the LZSS offset (lzOff) looking for previous occurrences of the same data, relative to the current position of the output buffer (hence d->dest[0] = d->dest[d->lzOff]). But since the output buffer has been re-initialized, these relative offsets are not valid anymore. So that sounds logical.

I gather that's why in this case, a dict is needed. Sure enough, I added a 32kB rambuffer to be used as dict, and it worked. However, on the target embedded system, I don't have 32kB of RAM...

So the question is:

Can decompressing in chunks (writing to the same RAM location for every chunk) be done without dictionary?
If not, how big must the dictionary really be? Can a size smaller than 32kB be used and what are the conditions?

With some explanation about the exact usage of these case, I would be glad to provide some example code to be included in the repo, if that would help as documentation for the project!

Thanks for the project! :+1:

maximevince commented 3 years ago

As an update: I found a way to decompress only per chunk of X bytes to RAM, then write that to flash memory (my use-case is a compressed firmware image for OTA firmware upgrades), then reset the RAM output buffer (d->dest), and then continue decompressing, without the need for a 32K dictionary RAM buffer.

By detecting whenever the lzOff is pointer outside (i.e. before the start) of the d->dest_start buffer, I can still access that data in the 'external` buffer.

A first proof-of-concept is implemented here: https://github.com/maximevince/uzlib/commit/c528e63204b6035b77853b06b39389e0f85b80d3

After testing this, I realized there are many use-cases for this: Tiny embedded devices with only limited RAM (so a few buffers of 100s of bytes are OK, but not 32K), but with access to a lot larger ROM memories (e.g. internal or external flash, EEPROM, SDcard, ...) .

But there might be a cleaner way to implement this. Instead of introducing the extra d->dest_ext pointer etc.. It might make sense to simply implement an outbuf_put() and outbuf_get() callback function, that would then allow the user of the library to do whatever is needed to 'write' or 'read' from that output buffer. This decouples the need for RAM to buffer data, from the library. Of course, this is going to be a lot slower than reading/writing directly from RAM, but in a lot of use-cases (like the firmware upgrade case), this does not matter.

Let me know what you think, and what kind of interface you would prefer, and I can send a pull-request.

d-a-v commented 3 years ago

For the exact same purpose, I had initiated a similar job, FWIW, not proposed as pull-request. I implemented your idea through macros ALIGN_READ / ALIGN_WRITE that have default definitions for legacy behavior but that can be redefined to something else (reading from flash) when dictionary is disabled. This is a proof of concept that has been proposed nowhere, and proved to be working in an emulation environment only.

This functionality you propose would be indeed very useful on small targets.

This idea was abandoned because the esp8266 arduino OTA second stage happens at reboot time when there's still plenty/enough of temporary ram to be able to uncompress and reflash the FW. I'd like to thanks once again @pfalcon for this library. Credits are already given in the esp8266 arduino main readme.

mikevoyt commented 3 years ago

It's been a while, but I believe this Pull Request accomplishes what you are looking for - "deflate a stream without a memory buffer". It allows for reading compressed input, and writing the decompressed output, a single byte at a time.

It's not a very clean PR, but I did put this into a production device for reading a compressed factory firmware image from SPI flash, and uncompressing it a word at a time to write to internal program flash, and it's been working great for 4+ years. I should probably clean up the PR and re-submit it at some point.

d-a-v commented 3 years ago

Apparently @mikevoyt your idea is welcome by owner but further work seems to be required for your PR to be accepted.

pfalcon commented 3 years ago

There was a change which I had in my git stash for quite some time, and which I now cleaned up and pushed: https://github.com/pfalcon/uzlib/commit/2aeab6c76bd385f72998780296d9d928c893a732. If you look at it carefully, you'll notice that what it would take to support "copy LZ strings from a backing store [instead of memory]" - is to allow to override memcpy there with a user-defined function.

If someone would like to prepare a patch, please follow https://github.com/pfalcon/change-control-manifesto . Thanks.

pfalcon commented 3 years ago

Oh, and I would like to remind that zlib supports configurable dictionary size, so it doesn't have to be "32KB". Powers of 2 from 512 to 32KB are supported. As an example, Pycopy packages use 4K dictionary: https://github.com/pfalcon/pycopy-lib/blob/master/sdist_upip.py#L34

maximevince commented 3 years ago

Nice, I just rebased on top of your new commits. Can you have a look at my proposed pull-request #33 ?

github-actions[bot] commented 3 years ago

Thanks for your submission. However there was no (further) activity on it for some time. Below are possible reasons and/or means to proceed further:

Please make sure that your submission contains clear and complete description of the issue and information required to understand and reproduce it (exact version of all components involved, etc.). Please refer to numerous guides on submitting helpful bugreports, e.g. https://www.chiark.greenend.org.uk/~sgtatham/bugs.html
Please make sure that your feature request/suggestion/change aligns well with the project purpose, goals, and process.
If you face the issue you report while working on a community-oriented open-source project, feel free to provide additional context and information - that may help to prioritize the issue better.
As many open-source projects, this project is severely under-staffed and under-resourced. If you don't run a community-oriented open-source project, and would like to get additional support/maintenance, feel free to request a support contract.

Thanks for your understanding!

github-actions[bot] commented 3 years ago

Closing due to inactivity.

pfalcon / uzlib

Decompression of a stream in output chunks without dictionary #32