snakehand / heatshrink

Minimal no_std implementation of Heatshrink compression & decompression

Compression of larger files takes really long #5

Open Krensi opened 1 year ago

Krensi commented 1 year ago

Hi!

I have issues compressing larger files (400 kB and upwards). As before, I compared the results with the C library, where even a ~800 kB file compresses just fine. With the test files below (some e-book files), I can observe that compression time increases a lot with this crate. Can you observe this behavior too?

Thank you again and greetings, Christian

text_file_1_medium.txt text_file_2.txt text_file_1.txt

snakehand commented 1 year ago

The implementation is really minimal, and is quite stupid when it comes to dictionary searching. It basically does a linear search in the window, so as the file and window size grow, compression times are expected to increase. Speeding it up would be easy with dictionaries or other data structures that use dynamic memory allocation, but that is not an option in a no_std crate. I can however try to think of a way to speed up compression if the user provides a buffer that can be used for holding lookup-related data. The initial use case was to compress small messages on a radio link, not several hundred kB of text :-)
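For illustration, a naive LZSS-style search along these lines (a sketch only, not the crate's actual code) does work proportional to the window size for every input position, so total time grows roughly with file size times window size:

// Sketch only, not the crate's actual code: a linear window search as
// described above. For each input position the whole window is scanned,
// so compression is O(input_len * window_len).
// Returns (back_offset, match_len) for the longest match, if any.
fn longest_match(window: &[u8], needle: &[u8], max_len: usize) -> Option<(usize, usize)> {
    let mut best: Option<(usize, usize)> = None;
    for start in 0..window.len() {
        let mut len = 0;
        while len < max_len
            && len < needle.len()
            && start + len < window.len()
            && window[start + len] == needle[len]
        {
            len += 1;
        }
        if len > best.map_or(0, |(_, l)| l) {
            best = Some((window.len() - start, len)); // offset counted backwards
        }
    }
    // Assumed threshold: a back-reference this short isn't worth emitting.
    best.filter(|&(_, len)| len >= 2)
}

An index structure removes most of that inner scan, but costs extra memory, which is exactly the buffer question discussed below.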

Krensi commented 1 year ago

Ah, I see. The C lib requires an internal buffer whose size is related to the window size (see the snippet below).

typedef struct {
    uint16_t input_size;        /* bytes in input buffer */
    uint16_t input_index;       /* offset to next unprocessed input byte */
    uint16_t output_count;      /* how many bytes to output */
    uint16_t output_index;      /* index for bytes to output */
    uint16_t head_index;        /* head of window buffer */
    uint8_t state;              /* current state machine node */
    uint8_t current_byte;       /* current byte of input */
    uint8_t bit_index;          /* current bit index */

#if HEATSHRINK_DYNAMIC_ALLOC
    /* Fields that are only used if dynamically allocated. */
    uint8_t window_sz2;         /* window buffer bits */
    uint8_t lookahead_sz2;      /* lookahead bits */
    uint16_t input_buffer_size; /* input buffer size */

    /* Input buffer, then expansion window buffer */
    uint8_t buffers[];
#else
    /* Input buffer, then expansion window buffer */
    uint8_t buffers[(1 << HEATSHRINK_DECODER_WINDOW_BITS(_))
        + HEATSHRINK_DECODER_INPUT_BUFFER_SIZE(_)];
#endif
} heatshrink_decoder;

So maybe this internal buffer enables the higher processing speed. Maybe I can come up with a pull request that adds the possibility of an optional buffer or something. That way I could finally remove the dependency on the heatshrink C lib from my Rust code. Unfortunately, I need to process files up to 500 kB, because the original size of the firmware to be compressed is in this range.
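As a rough sketch of what such an opt-in API could look like (hypothetical name, signature, and error type; nothing like this exists in the crate yet), the caller would own the scratch memory so the crate stays allocation-free and no_std-compatible:

// Hypothetical sketch only; this function and error type are not part of
// the crate. The caller supplies the index scratch buffer, so no dynamic
// allocation is needed inside the encoder.
pub struct EncodeError;

pub fn encode_with_scratch(
    input: &[u8],
    output: &mut [u8],
    index_scratch: &mut [i16], // assumed length: one slot per window position
) -> Result<usize, EncodeError> {
    // A real implementation would build a search index in `index_scratch`
    // and run the usual encoder loop against it, returning the number of
    // bytes written to `output`.
    let _ = (input, output, index_scratch);
    todo!()
}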

Thanks for your reply and explanation :-)

jcdubois commented 1 year ago

I believe the C implementation provides an optional search index that speeds up compression quite a bit (it does not apply to decompression).

See https://spin.atomicobject.com/2014/01/13/lightweight-indexing-for-embedded-systems/ for details.

This is enabled with HEATSHRINK_USE_INDEX in the C version.

I guess you could try disabling the search index in the C version (by unsetting HEATSHRINK_USE_INDEX in the heatshrink config file) and compare the performance you get then with that of the Rust version. It should be similar, since the Rust version is missing the search index feature.
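For reference, the indexing trick from that article can be sketched like this (a hedged Rust sketch, not this crate's or the C library's actual code): every buffer position records the previous position holding the same byte value, so the match search only visits positions whose first byte already matches instead of scanning the whole window.

// Sketch of the byte-chain index described in the linked article.
// `index` must have the same length as `buf`; i16 is enough as long as
// the buffer is shorter than 32768 bytes, which holds for heatshrink's
// window sizes.
fn build_index(buf: &[u8], index: &mut [i16]) {
    let mut last: [i16; 256] = [-1; 256];
    for (i, &b) in buf.iter().enumerate() {
        index[i] = last[b as usize]; // previous occurrence of this byte, or -1
        last[b as usize] = i as i16;
    }
}

// Walk the chain of earlier positions that share needle's first byte and
// keep the longest match. `pos` is where the lookahead begins inside `buf`.
// Simplification: matches are not allowed to overlap the lookahead.
fn indexed_longest_match(buf: &[u8], index: &[i16], pos: usize, max_len: usize) -> Option<(usize, usize)> {
    let needle = &buf[pos..];
    let mut best: Option<(usize, usize)> = None;
    let mut cand = index[pos];
    while cand >= 0 {
        let start = cand as usize;
        let len = needle
            .iter()
            .zip(&buf[start..pos])
            .take(max_len)
            .take_while(|(a, b)| a == b)
            .count();
        if len > best.map_or(0, |(_, l)| l) {
            best = Some((pos - start, len)); // (back_offset, match_len)
        }
        cand = index[start]; // next earlier occurrence of the same byte
    }
    best
}

The tradeoff is two bytes of index per window byte, which is presumably why it is a compile-time option in C and would need a caller-provided buffer in a no_std Rust crate.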