snakehand / heatshrink

Minimal no_std implementation of Heatshrink compression & decompression

Encoding leads to other result than encoding with original c-source #1

Closed Krensi closed 1 year ago

Krensi commented 1 year ago

Hi!

I wanted to use this library for decoding heatshrink-compressed data. However, I was always getting an IllegalBackref error. I decided to compare the encoding and decoding against the CLI (https://github.com/atomicobject/heatshrink), using the test data of the alpha test. The encoded results differ, so I guess something is wrong with the encoding? In both cases, a window size of 11 and a lookahead of 4 are used.

Result alpha test: image

Result using CLI with same data: image

Maybe you can tell me what I did wrong. Regards, Christian

snakehand commented 1 year ago

Does decompressing both encodings yield the original data? If so, it could be that the string search has picked different references to identical substrings, which should not affect the correctness of the library. If there are discrepancies in the decompressed data, I will have to investigate.

Krensi commented 1 year ago

That's the result of test::alpha:

running 1 test
Encoded: 90d4b2b549a40a00001e001f00c9811b7ca05f1817c002da5f04025f0005
Decoded: 215295543402000000000000000000000000000000000000000000000000000000000000000000009302000000000000f202f102f0020000000000002f0400000000000000000000000000000000000000000000
test test::alpha ... ok

Decoding 90d4b2b549a40a00001e001f00c9811b7ca05f1817c002da5f04025f0005 using the C-Code leads to 215295543402000000000000000000000000000000000000000000000000000000000000000000009302000000000000F202F102F0020000000000002F0400000000000000000000000000000000000000000000 which is actually the same result

Here is my nushell output

heatshrink on  master [?] via C v12.2.0-gcc
❯ : (heatshrink -w 11 -l 4 -d in_compressed_dp.bin | into binary) == (0x[215295543402000000000000000000000000000000000000000000000000000000000000000000009302000000000000f202f102f0020000000000002f0400000000000000000000000000000000000000000000] | into binary)
true

When I encode 215295543402000000000000000000000000000000000000000000000000000000000000000000009302000000000000F202F102F0020000000000002F0400000000000000000000000000000000000000000000 using the C code, the result is 90D4B2B549A408057C003E0100C9811B7CA05F1817C002DA5F04025F0005.

Trying to decode this result with this library leads to an IllegalBackref error. The code of the test is below, along with a screenshot of the result.

    #[test]
    fn decode() {
        let src = hex_literal::hex!("90D4B2B549A408057C003E0100C9811B7CA05F1817C002DA5F04025F0005");
        let cfg = Config::new(11, 4).unwrap();
        let mut dst1 = [0; 100];

        decoder::decode(&src, &mut dst1, &cfg).unwrap();
    }

image

Greetings, Christian

snakehand commented 1 year ago

I found the problem. The C version fills a window buffer with 0 prior to compression, and allows searching in the full window. This means that the first run of 0s was referenced from before the start of the input data. I have removed the IllegalBackref error and made the decoder fetch 0 bytes instead to maintain compatibility. I have also added unit tests + fuzzing to verify this fix. Thanks for reporting this issue. I will close it once I have done some more testing and pushed an updated version.
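To illustrate the behaviour described above, here is a minimal, self-contained sketch of a heatshrink-style decoder. It is not the code of this crate or of the C library; the names read_bits, decode_sketch, C_ENCODED and RUST_ENCODED are made up for this example. It assumes the usual heatshrink bit layout (MSB first; a 1 tag bit followed by an 8-bit literal; a 0 tag bit followed by a window-sized index and a lookahead-sized count, each stored minus one) and treats every byte before the start of the output as 0, which is the compatibility rule mentioned in the comment.

```rust
// The two encodings quoted in this thread (window 11, lookahead 4).
const C_ENCODED: [u8; 30] = [
    0x90, 0xD4, 0xB2, 0xB5, 0x49, 0xA4, 0x08, 0x05, 0x7C, 0x00,
    0x3E, 0x01, 0x00, 0xC9, 0x81, 0x1B, 0x7C, 0xA0, 0x5F, 0x18,
    0x17, 0xC0, 0x02, 0xDA, 0x5F, 0x04, 0x02, 0x5F, 0x00, 0x05,
];
const RUST_ENCODED: [u8; 30] = [
    0x90, 0xD4, 0xB2, 0xB5, 0x49, 0xA4, 0x0A, 0x00, 0x00, 0x1E,
    0x00, 0x1F, 0x00, 0xC9, 0x81, 0x1B, 0x7C, 0xA0, 0x5F, 0x18,
    0x17, 0xC0, 0x02, 0xDA, 0x5F, 0x04, 0x02, 0x5F, 0x00, 0x05,
];

/// Read `n` bits MSB-first starting at bit offset `*pos`.
/// Returns None (without consuming anything) if fewer than `n` bits remain.
fn read_bits(input: &[u8], pos: &mut usize, n: usize) -> Option<usize> {
    if *pos + n > input.len() * 8 {
        return None;
    }
    let mut value = 0usize;
    for _ in 0..n {
        let byte = input[*pos / 8];
        let bit = (byte >> (7 - (*pos % 8))) & 1;
        value = (value << 1) | bit as usize;
        *pos += 1;
    }
    Some(value)
}

/// Hypothetical decoder sketch: back-references that reach before the
/// start of the output yield 0 instead of an error.
fn decode_sketch(input: &[u8], window_bits: usize, lookahead_bits: usize) -> Vec<u8> {
    let mut pos = 0;
    let mut out = Vec::new();
    while let Some(tag) = read_bits(input, &mut pos, 1) {
        if tag == 1 {
            // Literal: the next 8 bits are the byte itself.
            match read_bits(input, &mut pos, 8) {
                Some(b) => out.push(b as u8),
                None => break, // trailing padding bits
            }
        } else {
            // Back-reference: distance and length are both stored minus one.
            let index = match read_bits(input, &mut pos, window_bits) {
                Some(v) => v,
                None => break,
            };
            let count = match read_bits(input, &mut pos, lookahead_bits) {
                Some(v) => v,
                None => break,
            };
            let offset = index + 1;
            // Copy byte by byte so overlapping references (offset < length) work.
            for _ in 0..count + 1 {
                let byte = if offset > out.len() {
                    0 // before the start of the data: zero-filled window
                } else {
                    out[out.len() - offset]
                };
                out.push(byte);
            }
        }
    }
    out
}
```

With this sketch, decode_sketch(&C_ENCODED, 11, 4) and decode_sketch(&RUST_ENCODED, 11, 4) produce the same bytes. The only difference is how the first run of zeros is reached: the C encoding refers 22 bytes back when only 6 bytes have been produced, while the Rust encoding emits a literal 0 first and then refers to it.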

snakehand commented 1 year ago

Fixed with release of version 0.2

Krensi commented 1 year ago

Thank you for the quick fix! :-)

snakehand commented 1 year ago

You are welcome, and thank you for taking time to report the issue

I am curious what you are using heatshrink for. I gather it is for a WASM target? I find this interesting, as I originally wrote the library for some embedded robotics applications, so I would like to hear a little more about how you use it, if you are able to share any info.

Cheers, Frank

Krensi commented 1 year ago

We have some IoT devices at my company, and we use heatshrink to compress OTA firmware update files. We generate diff files using the jdiff algorithm and apply heatshrink afterward, which yields an additional size reduction of 20% to 30%. Recently, I was developing a new protocol for transmitting data from the device to our server. I am a big fan of Google protobuf and wanted to see whether heatshrink could reduce the data size further. Depending on the data entropy, it sometimes does.

I am a firmware developer, and I use the original C code of heatshrink in our firmware. However, I like to have tools for testing, which is why I always write my own parsers (recently in Rust). To provide the parser functionality to our Node-based web backend, I used wasm-pack and compiled my parser to WASM, as you already guessed. :-) This is how I stumbled upon this issue. To fix my use case I decided to use the C code, which was not as easy as expected, because there are dependencies to and so on. This, apparently, is not possible using wasm-pack and wasm32-unknown-unknown, so I manually removed the dependencies on the standard library. But now I can use your lib instead, which I prefer. :-D

So if you could give me some guidance on how you would compile a C codebase with such dependencies to WASM, I would be glad. As already mentioned, we use jdiff https://jojodiff.sourceforge.net/, and I am eager to either port it to Rust or include it in a sys-crate somehow.

Greetings, Christian

huming2207 commented 1 year ago

I am a firmware developer, and I use the original C-Code of heatshrink in our firmware. However, I like to have tools for testing, which is the reason why I always write my parsers (recently in rust). To provide the parser functionality to our node based web backend, I used wasmpack and compiled my parser to wasm as you already guessed. :-) This is how I stumbled upon this issue. To fix my use case I decided to use the C-Code which was not as easy as expected, because there are dependencies to and so on. This, apparently, is not possible using wasmpack and wasm-unknown-unknown so I manually removed the dependencies to the standard library. But now I can use your lib instead, which I prefer. :-D

I have exactly the same reason for making my NodeJS binding. I'm also planning to make another WebAssembly one for web browsers that takes asset files in, bundles them into a FAT32 image, compresses it with Heatshrink, and then pushes it to an embedded target (ESP32 and RAM-constrained Linux boards) for further use.

Krensi commented 1 year ago

I am a firmware developer, and I use the original C-Code of heatshrink in our firmware. However, I like to have tools for testing, which is the reason why I always write my parsers (recently in rust). To provide the parser functionality to our node based web backend, I used wasmpack and compiled my parser to wasm as you already guessed. :-) This is how I stumbled upon this issue. To fix my use case I decided to use the C-Code which was not as easy as expected, because there are dependencies to and so on. This, apparently, is not possible using wasmpack and wasm-unknown-unknown so I manually removed the dependencies to the standard library. But now I can use your lib instead, which I prefer. :-D

I have exactly the same reason for making my NodeJS binding. I'm also planning to make another WebAssembly one for web browsers that takes asset files in, bundles them into a FAT32 image, compresses it with Heatshrink, and then pushes it to an embedded target (ESP32 and RAM-constrained Linux boards) for further use.

So if you can provide some guidance relating to WASM and WASI, I would be grateful! Or do you plan to write everything in pure Rust? I discovered that using encrypt/decrypt from https://crates.io/crates/aes-gcm is also not straightforward in a WASM context.

Greetings, Christian

huming2207 commented 1 year ago

So if you can maybe provide some guidance relating to wasm and wasi I would be grateful!

@Krensi maybe try Zig? I remember Zig can handle some cross-compiling and binding stuff for C code, and it supports WASM/WASI as well. You may need to implement some glue code in Zig, expose the API you want, and put it into your project.

Krensi commented 1 year ago

@Krensi maybe try Zig? I remember Zig can handle some cross-compiling and binding stuff for C code, and it supports WASM/WASI as well. You may need to implement some glue code in Zig, expose the API you want, and put it into your project.

I am interested in Zig anyway, so maybe I will give it a try. Please keep me updated about your WASM project if you plan to release it on GitHub. Sounds interesting! I am looking for projects to contribute to, because it is a great way of learning Rust.

Krensi commented 1 year ago

-- Off topic --

https://wiki.seeedstudio.com/SeeedStudio_XIAO_Series_Introduction/

These are great! I can really recommend them