rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust
https://docs.rs/flate2
Apache License 2.0

Decompress Example #312

Closed EricFecteau closed 10 months ago

EricFecteau commented 1 year ago

Receiving raw zlib stream data from Python, compressed with the "zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)" options, is not rare, but there are no examples of how to use the "Decompress" object in the examples folder, and "Decompress::new_with_window_bits" is not mentioned in the documentation. As far as I understand, there is no other way to inflate the example below with this library (since the second "message" depends on the first).

Could documentation for "new_with_window_bits" be added, and maybe an example (similar to the one below) be added to the examples?

Python code for generating the example:

import zlib

compressobj = zlib.compressobj(
    zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS
)

message = b'{"msgs":[{"msg": "ping"}]}'
compressed = compressobj.compress(message)
compressed += compressobj.flush(zlib.Z_SYNC_FLUSH)
compressed = compressed[:-4]
print([c for c in compressed])

message = b'{"msgs":[{"msg": "lobby_clear"},{"msg": "lobby_complete"}]}'
compressed = compressobj.compress(message)
compressed += compressobj.flush(zlib.Z_SYNC_FLUSH)
compressed = compressed[:-4]
print([c for c in compressed])
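For readers wondering what compressed[:-4] removes and why the second message depends on the first, here is a minimal sketch using only Python's standard zlib module. The helper names new_raw_compressor and sync_flush_frames are made up for illustration and are not part of the original program. It checks that each Z_SYNC_FLUSH ends with the four bytes 00 00 ff ff (an empty stored block), and compares the size of the second message when it is compressed after the first one against its size when compressed on its own.

```python
import zlib

SYNC_TAIL = b"\x00\x00\xff\xff"  # empty stored block emitted by Z_SYNC_FLUSH


def new_raw_compressor():
    # Same options as in the issue: raw deflate, no zlib header or checksum.
    return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)


def sync_flush_frames(messages):
    """Compress every message with one shared compressor (hypothetical helper)."""
    compressor = new_raw_compressor()
    frames = []
    for message in messages:
        frames.append(compressor.compress(message) + compressor.flush(zlib.Z_SYNC_FLUSH))
    return frames


messages = [
    b'{"msgs":[{"msg": "ping"}]}',
    b'{"msgs":[{"msg": "lobby_clear"},{"msg": "lobby_complete"}]}',
]
frames = sync_flush_frames(messages)

# Every frame ends with the sync marker; this is what compressed[:-4] strips.
print([frame.endswith(SYNC_TAIL) for frame in frames])

# The second message compressed alone cannot refer back to the first message,
# so it should come out larger than the frame produced by the shared compressor.
standalone = sync_flush_frames(messages[1:])[0]
print(len(frames[1]), len(standalone))
```

The size difference is the visible side of the dependency: the shared compressor encodes parts of the second message as references to bytes from the first, so a decoder needs that history to resolve them.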

Rust code for inflating the above:

use flate2::{Decompress, FlushDecompress};
use std::str;

fn main() {
    // Python ZLIB compressed with options: zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS

    // b'{"msgs":[{"msg": "ping"}]}'
    let msg_vec1 = vec![
        170, 86, 202, 45, 78, 47, 86, 178, 138, 174, 6, 49, 148, 172, 20, 148, 10, 50, 243, 210,
        149, 106, 99, 107, 1, 0, 0, 0, 255, 255,
    ];

    // b'{"msgs":[{"msg": "lobby_clear"},{"msg": "lobby_complete"}]}'
    let msg_vec2 = vec![
        170, 198, 144, 201, 201, 79, 74, 170, 140, 79, 206, 73, 77, 44, 82, 170, 213, 65, 23, 206,
        207, 45, 200, 73, 45, 73, 5, 105, 5, 0,
    ];

    let wbits = 15; // Window bits (becomes -15 inside flate2 because zlib_header = false)
    let bufsize = 32 * 1024;

    let mut decompressor = Decompress::new_with_window_bits(false, wbits);
    // decompress_vec only writes into the vector's spare capacity and never grows it,
    // so with_capacity is mandatory (without it I got "invalid distance too far back").
    let mut decoded_bytes = Vec::with_capacity(bufsize);

    decompressor
        .decompress_vec(&msg_vec1[..], &mut decoded_bytes, FlushDecompress::Finish)
        .expect("Failed to decompress");

    println!("{:?}", str::from_utf8(&decoded_bytes).expect("Bad UTF8"));

    let mut decoded_bytes = Vec::with_capacity(bufsize);
    decompressor
        .decompress_vec(&msg_vec2[..], &mut decoded_bytes, FlushDecompress::Finish)
        .expect("Failed to decompress");

    println!("{:?}", str::from_utf8(&decoded_bytes).expect("Bad UTF8"));
}

Cargo.toml must include the following:

flate2 = { version = "1", features = ["zlib-ng"], default-features = false }

PierreV23 commented 11 months ago

Before PR #361 you had to write a decompressor by hand if you wanted to use a custom Decompress object (which is needed if you want to specify the header or window_bits values). A custom decompressor would look something like this: https://github.com/bend-n/mindus/blob/master/src/data/mod.rs#L190 (line 190 should point to a function called deflate).

After PR #361, decompressing (using read::ZlibDecoder) is as simple as:

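use flate2::{read::ZlibDecoder, Decompress};
use std::io::Read;

// `compressed` is any reader over the raw deflate bytes, e.g. a `&[u8]`.
let mut decompresser = ZlibDecoder::new_with_decompress(
    compressed,
    Decompress::new_with_window_bits(false, 15),
);
let mut decompressed = String::new();
decompresser.read_to_string(&mut decompressed)?;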

Even though this issue is fairly old by now, I hope this helps you or others who come across it.

Byron commented 11 months ago

@EricFecteau Would this issue be fixed now that code like the example above works with a new release? It seems so to me, but I might be missing something. Thank you.

EricFecteau commented 11 months ago

The solution above works for the first msg_vec1 from the first message, but how do I add in msg_vec2? In the example in the first message, the decompressor is created separately and then decompress_vec can be called multiple times with new messages. msg_vec2 depends on msg_vec1, and therefore I can't simply call msg_vec2 the same way. How would I do this using ZlibDecoder::new_with_decompress?

PierreV23 commented 10 months ago

You can use ZlibDecoder::reset to reset the decoder and supply it with a new input stream: https://docs.rs/flate2/latest/flate2/read/struct.ZlibDecoder.html#method.reset

EricFecteau commented 10 months ago

reset completely resets the decoder and gives me a "corrupt deflate stream" error, as I would expect, since msg_vec2 depends on msg_vec1. I might be missing something obvious, but in the original post I can feed multiple vectors to the decompression object one after the other (see msg_vec1 and msg_vec2):

    decompressor
        .decompress_vec(&msg_vec1[..], &mut decoded_bytes, FlushDecompress::Finish)
        .expect("Failed to decompress");

    println!("{:?}", str::from_utf8(&decoded_bytes).expect("Bad UTF8"));

    let mut decoded_bytes = Vec::with_capacity(bufsize);
    decompressor
        .decompress_vec(&msg_vec2[..], &mut decoded_bytes, FlushDecompress::Finish)
        .expect("Failed to decompress");

    println!("{:?}", str::from_utf8(&decoded_bytes).expect("Bad UTF8"));

With the code you provided, how do I provide a second compressed vector to the decompressor?
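To make the dependency between msg_vec1 and msg_vec2 concrete, the following sketch feeds the two byte vectors from the first post to Python's standard zlib.decompressobj. The helper name new_raw_decompressor is only illustrative. Note that msg_vec1 as posted still ends with 0, 0, 255, 255 (the sync marker), while msg_vec2 does not. A single decompressor that sees both vectors in order should recover both JSON messages, whereas a fresh decompressor given only msg_vec2 has no history to resolve its back-references.

```python
import zlib

# Byte vectors copied from the first post.
msg_vec1 = bytes([
    170, 86, 202, 45, 78, 47, 86, 178, 138, 174, 6, 49, 148, 172, 20, 148, 10, 50, 243, 210,
    149, 106, 99, 107, 1, 0, 0, 0, 255, 255,
])
msg_vec2 = bytes([
    170, 198, 144, 201, 201, 79, 74, 170, 140, 79, 206, 73, 77, 44, 82, 170, 213, 65, 23, 206,
    207, 45, 200, 73, 45, 73, 5, 105, 5, 0,
])


def new_raw_decompressor():
    # Negative wbits selects raw deflate (no zlib header) with a 32 KiB window.
    return zlib.decompressobj(-zlib.MAX_WBITS)


# One decompressor for both messages: the history from msg_vec1 is kept.
shared = new_raw_decompressor()
first = shared.decompress(msg_vec1)
second = shared.decompress(msg_vec2)
print(first.decode())
print(second.decode())

# A fresh decompressor has no history, so msg_vec2 cannot be decoded on its own.
try:
    new_raw_decompressor().decompress(msg_vec2)
    fresh_ok = True
except zlib.error as err:
    fresh_ok = False
    print("fresh decompressor failed:", err)
```

This is the same situation as calling reset between the two messages: the window built up while decoding msg_vec1 is discarded, and the references at the start of msg_vec2 point at data that no longer exists.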

PierreV23 commented 10 months ago

Honestly, I don't really know either. I had assumed reset would work, but I didn't realise your second vec was dependent on the first, which I find quite odd, by the way.

Here is something that would work, but is likely not ideal:

use flate2::{read::ZlibDecoder, Decompress};
use std::io::Read;

fn main() {
    // Python ZLIB compressed with options: zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS

    // b'{"msgs":[{"msg": "ping"}]}'
    let msg_vec1 = vec![
        170, 86, 202, 45, 78, 47, 86, 178, 138, 174, 6, 49, 148, 172, 20, 148, 10, 50, 243, 210,
        149, 106, 99, 107, 1, 0, 0, 0, 255, 255,
    ];

    // b'{"msgs":[{"msg": "lobby_clear"},{"msg": "lobby_complete"}]}'
    let msg_vec2 = vec![
        170, 198, 144, 201, 201, 79, 74, 170, 140, 79, 206, 73, 77, 44, 82, 170, 213, 65, 23, 206,
        207, 45, 200, 73, 45, 73, 5, 105, 5, 0,
    ];

    // msg_vec3 is msg_vec1 followed by msg_vec2, so the decoder sees the full history.
    let mut msg_vec3 = msg_vec1.clone();
    msg_vec3.extend_from_slice(&msg_vec2);

    let mut decompresser = ZlibDecoder::new_with_decompress(
        &msg_vec1[..],
        Decompress::new_with_window_bits(false, 15),
    );

    let mut msg1 = String::new();
    decompresser.read_to_string(&mut msg1).unwrap();
    println!("{}", msg1);

    decompresser = ZlibDecoder::new_with_decompress(
        &msg_vec3[..],
        Decompress::new_with_window_bits(false, 15),
    );

    let mut msg2 = String::new();
    decompresser.read_to_string(&mut msg2).unwrap();

    msg2 = msg2[msg1.len()..].to_string();

    println!("{}", msg2);
}

PierreV23 commented 10 months ago

Also, I recommend using Python's zlib.compress: its return value can be passed as a single parameter to ZlibDecoder::new_with_decompress(...) instead of spreading your object(s) over two strings.

I took another look at your Python code, and you are supposed to create a new compressobj for each message if you want the messages to be readable separately (unless there are methods I am not aware of).
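One such method, sketched here under the assumption that the sending side can be changed, is to flush with zlib.Z_FULL_FLUSH instead of zlib.Z_SYNC_FLUSH. The zlib documentation describes a full flush as resetting the compression state so that decompression can restart from that point, which means a single compressobj can still produce frames that decode independently. The variable names below are illustrative.

```python
import zlib

messages = [
    b'{"msgs":[{"msg": "ping"}]}',
    b'{"msgs":[{"msg": "lobby_clear"},{"msg": "lobby_complete"}]}',
]

# One shared compressor, but every message ends with a full flush.
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
frames = [
    compressor.compress(message) + compressor.flush(zlib.Z_FULL_FLUSH)
    for message in messages
]

# Each frame is decoded by its own fresh decompressor, with no shared history.
decoded = [zlib.decompressobj(-zlib.MAX_WBITS).decompress(frame) for frame in frames]
print(decoded == messages)
```

The trade-off is compression ratio: after a full flush the compressor can no longer refer back to earlier messages, so repetitive traffic compresses less well than with a sync flush.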

EricFecteau commented 10 months ago

Thanks, but this would not work either. My Python example is from a program I don't have access to (so I can't modify it), and it sends the data to me through a websocket. I can't simply append everything together, because I don't know what message msg_vec3 will be until I have responded over the websocket based on the info in msg_vec1 and msg_vec2. All the messages I receive depend on the previous ones, even though the later ones have not been created yet when I decode the earlier ones.

Looking around the other issues, I suspect I have the same problem as #276 -- thankfully my first post does solve this, even if it's a bit clunkier than it could be, so I will close this issue!
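For completeness, the approach from the first post (keep one decompressor alive for the whole connection and feed it each message as it arrives) can be sketched in Python as follows. The ConnectionInflater class and the send function are hypothetical stand-ins, not part of any library or of the original program. The sketch assumes the sender strips the last four bytes of every sync-flushed message, as in the generator above, so the receiver puts 00 00 ff ff back before decoding.

```python
import zlib

SYNC_TAIL = b"\x00\x00\xff\xff"  # marker removed by compressed[:-4] on the sender


class ConnectionInflater:
    """Hypothetical helper: create one instance per websocket connection."""

    def __init__(self):
        # Raw deflate, 32 KiB window; the object keeps its history between calls.
        self._decompressor = zlib.decompressobj(-zlib.MAX_WBITS)

    def inflate(self, frame: bytes) -> str:
        # Restore the stripped marker so the stream is back on a block
        # boundary before the next frame arrives.
        return self._decompressor.decompress(frame + SYNC_TAIL).decode("utf-8")


# Stand-in for the remote sender, which the receiver cannot change.
sender = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)


def send(message: bytes) -> bytes:
    return (sender.compress(message) + sender.flush(zlib.Z_SYNC_FLUSH))[:-4]


inflater = ConnectionInflater()
replies = []
for message in (
    b'{"msgs":[{"msg": "ping"}]}',
    b'{"msgs":[{"msg": "lobby_clear"},{"msg": "lobby_complete"}]}',
):
    # Messages are decoded one at a time, in arrival order, so a response can
    # be sent before the next message even exists.
    replies.append(inflater.inflate(send(message)))
print(replies)
```

Because the state lives in the decompressor object rather than in a reader over a fixed buffer, nothing needs to be concatenated or re-decoded when a new message arrives.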