pfalcon / uzlib

Radically unbloated DEFLATE/zlib/gzip compression/decompression library. Can decompress any gzip/zlib data, and offers simplified compressor which produces gzip-compatible output, while requiring much less resources (and providing less compression ratio of course).
Other
303 stars 82 forks source link

tinflate/crc32/adler32: add external destination buffer support #33

Closed maximevince closed 3 years ago

maximevince commented 3 years ago

Following #32 I prepared following - rather small - code changes.

Goal: Allow an external buffer (like MCU flash memory, an SDcard, SPI flash, ...) to be used as destination for decompression, while still just requiring minimal changes to the existing code.

Approach: Two new macros are introduced: TINF_DEST_PUTC(c) and TINF_DEST_GETC(addr) When not defined, they will just revert to the exact same code as currently used in uzlib. When override, they allow to remap the read(getc) + write (putc) actions to be override however is appropriate for the project.

I just looked at the newly pushed memcpy approach as well ( https://github.com/pfalcon/uzlib/commit/2aeab6c76bd385f72998780296d9d928c893a732 ), but it does not seem to solve the issue quite in the same way. I believe both solutions can exist next to each other.

My target use case is a small MCU with only small amount of RAM, in which case I do not want to use a dictionary, and preferably don't have to override memcpy, since performance is not important, but rather a small memory footprint is (RAM- and ROM-wise)

Let me know what you think, and any adjustments you would require!

Cheers, and thanks again for the project!

pfalcon commented 3 years ago

Thanks for the patch, but:

Goal: Allow an external buffer (like MCU flash memory, an SDcard, SPI flash, ...) to be used as destination for decompression,

After looking at this stated goal and the patch, I'm still not sure what's the purpose of this patch. So, let's start from the beginning: What would you like to achieve? How would you like to achieve that? Why did you chose that way to achieve it? Please pay special attention to http://xyproblem.info/ .

in which case I do not want to use a dictionary,

DEFLATE algorithm is based on the concept of a "sliding dictionary", so in this sense, it's impossible to not use it. The sooner this point is clear, the sooner it will be possible to consider different ways the dictionary can be stored and accessed, and add useful missing ways to uzlib.

since performance is not important

For uzlib performance is important, in a sense that if 2 solutions are available, mostly equivalent in implementation complexity (on the uzlib side), with one guaranteedly having a bottleneck re: performance, and another which doesn't, the one which doesn't should be preferred.

maximevince commented 3 years ago

Ok, you are right, I have wrongly used "dictionary" here. I do realize that a dictionary is needed, and what I meant is that I cannot afford it, RAM-wise, to use an external dictionary. Therefore, I want to use the "sliding dictionary", i.e. backward look-up in the already written output data.

Next, I believe that the original problem I am trying to solve with this patch is: Since the target device has very little RAM, the goal is to minimize RAM requirements and still allow to decompress an arbitrary length compressed stream.

Applied to my use case: "firmware upgrade with a compressed payload", this means:

Since I want to be able to use the sliding dictionary (less RAM usage than an external dictionary), and I cannot write to flash directly, this means some RAM buffer is involved when decompressing data. I want to buffer the decompressed data (only the most recently decompressed bytes) in this output RAM buffer, then write chunks of it to flash. (e.g. per page, or per few words).

Since uzlib uses things like d->dest[0] = d->dest[d->lzOff]; this does not work in my use-case, since either:

My complete solution is therefore to have a put and get macro, so that I can handle the spread between RAM and flash in my end application and have level of indirection to point the 'put' and 'get' to the actual memory address needed (which is sometimes flash and sometimes RAM). This code is not part of the PR, since I wanted to keep the PR as small as possible. What my code is doing there is:

The RAM buffer in between is needed, since I cannot simply write every new byte to flash directly, in which case these new wrapping macros would not be needed.

A note on performance: I do understand performance is important. That is also why the solution proposed does not impact performance for any of the originally supported use-cases. The new macros just resolve to exactly the same code as before. It merely allows to 'hook' the get and put calls, in case this is needed. In that case, depending on the way one implements these macros, a performance penalty might apply.

I hope this explains my situation and the solution provided more clearly.

Cheers

pfalcon commented 3 years ago

I cannot afford it, RAM-wise, to use an external dictionary.

You probably mean that you can't afford to use explicit dictionary ringbuffer, uzlib_uncomp::dict_ring: https://github.com/pfalcon/uzlib/blob/master/src/uzlib.h#L116

decompress byte-by-byte

uzlib can decompress arbitrary amount of data from a compressed stream, starting from 1 byte and up.

The decompressed data is ~100 to 200 kB, which I cannot fit in RAM.

You don't need. You decompress a page size of data (or any other arbitrary amount), write it, and repeat with the rest.

Since uzlib uses things like d->dest[0] = d->dest[d->lzOff];

As was mentioned in https://github.com/pfalcon/uzlib/issues/32#issuecomment-693684733, since https://github.com/pfalcon/uzlib/commit/2aeab6c76bd385f72998780296d9d928c893a732 there's now an option to not use that for accessing a dictionary. Currently the alternative is to call a function named "memcpy", but the name definitely can be made configurable. I'd suggest to follow that lead, that's the way forward for uzlib.

maximevince commented 3 years ago

I cannot afford it, RAM-wise, to use an external dictionary.

You probably mean that you can't afford to use explicit dictionary ringbuffer, uzlib_uncomp::dict_ring: https://github.com/pfalcon/uzlib/blob/master/src/uzlib.h#L116

Exactly

decompress byte-by-byte

uzlib can decompress arbitrary amount of data from a compressed stream, starting from 1 byte and up.

Yes, decompressing byte-per-byte is exactly what I am doing now.

The decompressed data is ~100 to 200 kB, which I cannot fit in RAM.

You don't need. You decompress a page size of data (or any other arbitrary amount), write it, and repeat with the rest. I am not sure how to "repeat" after decompressing a page's worth of data into RAM.

Since uzlib uses things like d->dest[0] = d->dest[d->lzOff];

As was mentioned in #32 (comment), since 2aeab6c there's now an option to not use that for accessing a dictionary. Currently the alternative is to call a function named "memcpy", but the name definitely can be made configurable. I'd suggest to follow that lead, that's the way forward for uzlib.

This sounds like exactly what I need, since the dictionary access is my biggest issue. I do wonder how this works for crc32 / adler, since there are similar, direct accesses into buf[x] which could be data not written to flash yet? However, [2aeab6c] is implemented in a different way than what I suggest here, and I am not really fond of overriding standard C-lib calls like memcpy. Obviously, the rest of my application would still need a "normal" memcpy, and this is why an overridable macro is cleaner, IMO.

pfalcon commented 3 years ago

I am not sure how to "repeat" after decompressing a page's worth of data into RAM.

See the example included with the library: https://github.com/pfalcon/uzlib/blob/master/examples/tgunzip/tgunzip.c#L41

how this works for crc32 / adler

They are transparently handled by uzlib_uncompress_chksum().

overriding standard C-lib calls like memcpy. Obviously, the rest of my application would still need a "normal" memcpy

That's why nobody talks about overriding memcpy() for the rest of your application.

pauloet commented 3 years ago

Hi there! First of all, thank you @pfalcon, all the team and contributors for this great work!

In my case, I've also memory limitations since I'm using a small micro-controller. I've attached an example of my application in which I'm not being able to get the second chunk correctly. I'm using the master branch.

My compressed file has 1056 bytes and the decompressed has 2007 bytes. I want to get chunks of 1024 and write them to an SD Card. My output is:

Read 1056 bytes Header parsed chunk_len: 1024. uzlibRes: 0 Wrote chunk 0 chunk_len: 983. uzlibRes: -3 Error during decompression: -3. i: 1.

In my case, I do not need to define a source_read_callbak since all data is in RAM. That's a future problem, but for now I'm focused only in the output data. Can help me here please?

Moreover, I know where it returns the DATA_ERROR. It is in function tinf_inflate_block_data here:

} else {
            /* catch trying to point before the start of dest buffer */
            if (offs > d->dest - d->destStart) {
                return TINF_DATA_ERROR;
            }
            d->lzOff = -offs;
        }

uzlib_out_chunck.txt

Delsian commented 3 years ago

@maximevince Thank you for this improvement, I've used your commit 3155321a82ab6d56cea503dc7933e4dbd8928f62 in my fork of uzlib together with Zephyr OS. It works as expected. But from my point of view, we should use context (TINF_DATA *d) as a parameter in TINF_DEST_PUTC and TINF_DEST_GETC for easy porting. My current implementation for these functions uses TINF context as static and looks like:

static struct uzlib_uncomp d_context;
static upgrade_context_t* uct = NULL;

void uzlib_dest_putc(uint8_t c) {
    if(d_context.dest >= uct->flash_write_ptr+FLASH_BLOCK_SIZE) {
        //LOG_DBG("Block ready: %p", uct->flash_write_ptr);
        flash_area_write(uct->upload_area, (off_t) uct->flash_write_ptr,
                    uct->unpack_buf, FLASH_BLOCK_SIZE);     
        uct->flash_write_ptr += FLASH_BLOCK_SIZE;
    }

    uct->unpack_buf[(uint32_t)(d_context.dest - uct->flash_write_ptr)] = c;
}

uint8_t uzlib_dest_getc(const uint8_t *addr) {
    if(addr < uct->flash_write_ptr) {
        uint8_t rv;
        flash_area_read(uct->upload_area, (off_t) addr, &rv, 1);
        return rv;
    }
    return uct->unpack_buf[(uint32_t)(addr - uct->flash_write_ptr)];
}

@pfalcon This PR pretty useful, could you please merge it into master branch?

pfalcon commented 3 years ago

@Delsian: Useful for what? Please see the previous discussion and see how it fits with it.

github-actions[bot] commented 3 years ago

Thanks for your submission. However there was no (further) activity on it for some time. Below are possible reasons and/or means to proceed further:

Thanks for your understanding!

github-actions[bot] commented 3 years ago

Closing due to inactivity.