rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust
https://docs.rs/flate2
Apache License 2.0
862 stars 159 forks source link

Add `continue_with` function to ZlibDecoder that allows refreshing of the stream without resetting the bufreader and fix resetting of ZlibDecoder #376

Closed PierreV23 closed 9 months ago

PierreV23 commented 10 months ago

Examples:

use flate2::{Decompress, read::ZlibDecoder};
use std::io::prelude::Read;

fn continue_with_example() {
    // Python ZLIB compressed with options: zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS

    // b'{"msgs":[{"msg": "ping"}]}'
    let msg_vec1 = vec![
        170, 86, 202, 45, 78, 47, 86, 178, 138, 174, 6, 49, 148, 172, 20, 148, 10, 50, 243, 210,
        149, 106, 99, 107, 1, 0, 0, 0, 255, 255,
    ];

    // b'{"msgs":[{"msg": "lobby_clear"},{"msg": "lobby_complete"}]}'
    let msg_vec2 = vec![
        170, 198, 144, 201, 201, 79, 74, 170, 140, 79, 206, 73, 77, 44, 82, 170, 213, 65, 23, 206,
        207, 45, 200, 73, 45, 73, 5, 105, 5, 0,
    ];

    let mut decompresser = ZlibDecoder::new_with_decompress(
        &msg_vec1[..],
        Decompress::new_with_window_bits(false, 15),
    );

    let mut msg1 = String::new();
    decompresser.read_to_string(&mut msg1).unwrap();
    println!("{}", msg1);

    decompresser.continue_read(&msg_vec2[..]);

    let mut msg2 = String::new();
    decompresser.read_to_string(&mut msg2).unwrap();
    println!("{}", msg2);
}

fn reset_example() {
    // b'{"msgs":[{"msg": "ping"}]}'
    let msg_vec1 = vec![
        170, 86, 202, 45, 78, 47, 86, 178, 138, 174, 6, 49, 148, 172, 20, 148, 10, 50, 243, 210,
        149, 106, 99, 107, 1, 0, 0, 0, 255, 255,
    ];

    let mut decompresser = ZlibDecoder::new_with_decompress(
        &msg_vec1[..],
        Decompress::new_with_window_bits(false, 15),
    );

    let mut msg1 = String::new();
    decompresser.read_to_string(&mut msg1).unwrap();
    println!("{}", msg1);

    decompresser.reset_on_custom(&msg_vec1[..], false);

    let mut msg2 = String::new();
    decompresser.read_to_string(&mut msg2).unwrap();
    println!("{}", msg2);
}

This should help #312 and #276

NOTE: This does not affect read&write ZlibEncoder and write ZlibDecoder. These are yet to be written.

PierreV23 commented 10 months ago

I edited the the pull request. (I didn't mean to pull both continue_with and the reset fix at the same time, im a newbie...)

Yes, I agree a API change might be needed, but I was afraid of changing too much and I didn't want to change already existing code.

PierreV23 commented 10 months ago

The biggest issue at the moment is the fact that ZlibDecoder::reset uses bufread::reset_decoder_data.

pub fn reset_decoder_data<R>(zlib: &mut ZlibDecoder<R>) {
    zlib.data = Decompress::new(true);
}

It recreates an entirely new Decompress instance without taking into account possible custom parameters (zlib_header and window_bits). This causes deflate errors. reset_with_custom can be seen as a replacement.

continue_with allows for a refreshment of the stream, without touching the Decompress instance

PierreV23 commented 10 months ago

Thanks for the examples, they are certainly a straightforward way of solving this particular issue. Due to the cost in maintenance that this additional API surface would incur, I'd like to investigate alternatives.

Can you instead try to create a Read implementation that allows to change an internal reader? Think along the lines of struct Foo(Rc<RefCell<impl std::io::Read>>). Then one can clone Foo to pass it as reader to the decompressor, but also call foo.reset(new_reader) on it. This has the same effect as this API, but is entirely under control of the user.

If properly implemented, tested and documented, I think it could start out as part of an example, similar to the one you already provide.

Thanks for your understanding and for considering continuing this PR despite the change in direction.

So if I understood you correctly, a new implementation of Read gets passed to bufreader::ZlibDecoder. Where you can directly manipulate the new Read instead manipulating it via the ZlibDecoder?

Byron commented 9 months ago

Yes. You can implement it as part of an official example/*.rs which could be what you presented here, but solving it with a Read wrapper that allows to swap out the inner Read instance.

PierreV23 commented 9 months ago

That would mean we would not need a reset function for any of the top layer wrappers right? I was thinking of how to remove some of the boilerplate I came by. What if we just make Config struct(s) that are mandatorily passed to ::new? This would also mean we could abandon the separate gz and zlib directories, generalizing them into a single Decoder and Encoder.

We can make default instances like ZLIB_DEFAULT and GZ_DEFAULT, or use functions: Config::zlib_default() and Config::gz_default().

I think there is a lot of code that can practically be discarded, with the result of creating a new interface that is prettier, more implicit and easier to use.

It would mean a move to a new major version though.

Update: I read more into the source, but GZ is more unique than i thought. I still think the codebase could use improvement.

Byron commented 9 months ago

It's great that you are taking the time to get acquainted, this will definitely be helpful!

That would mean we would not need a reset function for any of the top layer wrappers right?

I don't know what that means, but recommend trying to do as described by me earlier. There is no need to adjust anything in this crate to get your example to work.

It would mean a move to a new major version though.

Any breaking change could be explored in an own crate, and maybe from there one could conclude that it's worth making such a change here as well. This needs a lot of experimentation, time and discussion though to assure it's the right move, it has great implications.

I think to get this PR merged it has to pivot towards a utility type and an example as described earlier, and I urge to stay on topic or close this PR and work on a new crate that shows how to do better than what you find here. Thanks for your understanding.

PierreV23 commented 9 months ago

I think I will drop working on this now then. I am not experienced enough with Read to implement such things and I'd rather work on something else (i.e. generalized / config system).

I will take a break for now and look back once I have enough spare time again.

Good luck.