rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust
https://docs.rs/flate2
Apache License 2.0
929 stars 163 forks source link

Add functionality for custom (de)compress instances #361

Closed PierreV23 closed 1 year ago

PierreV23 commented 1 year ago

Adding these functions make it easier to use custom Decompress and Compress instances. Sometimes these are needed to change the window_bits or zlib_header, without these functions its kinda painful to implement it.

added functions:

bufread

read

write

PierreV23 commented 1 year ago

Thanks a lot for contributing!

I also see how these new methods would add to an overall more consistent API. After making adjustments to the doc-strings I'd hope for another untainted pair of eyes for merging though.

Yea this is my first PR and attempt at making docs :/

Anyway something I should note. (De)Compress both have a function new_gzip, which feels kinda weird passing to a new_... function of Zlib(De)Encoder...

I dont know if passing such instance would break it, I haven't tested it.

Byron commented 1 year ago

Thanks for pointing this out! Yes, maybe a word of caution in the docs might be appropriate. But then again, those who need full control probably know why they need it in the first place.

Rigidity commented 1 year ago

Currently it doesn't seem possible to set a custom dictionary unless you use the raw Compress/Decompress. But then you don't get access to the private zio module functionality. This PR would enable this

Byron commented 1 year ago

I have given it another review to improve internal consistency, and would think it's ready for merge. Since the change involves a public API, I'd really love another review, let's see what I can do about that.

Byron commented 1 year ago

@jongiddy Maybe you have thoughts on this PR? I'd appreciate your opinion. Thanks for considering.

jongiddy commented 1 year ago

I do not think we need new_with_decompress_and_buf. Anyone who needs to control the buffer size should create a BufReader and use bufread::ZlibDecoder directly.

I'm not sure that we need to control the zlib_header. This is an internal flag to allow sharing the implementations for Deflate and Zlib formats. I guess it helps a little if the caller has a similar boolean flag.

That only leaves window_bits. There might be a case for exposing the new_with_window_bits from the mem types, but this should be done consistently in deflate and zlib.

However, given that the mem Compress and Decompress are public already, doing it this way is OK. It provides control over window_bits and provides a way to pass the zlib_header flag directly. My only change would be to remove the new_with_decompress_and_buf method.

Byron commented 1 year ago

Thanks a lot, @jongiddy! I have removed the new_with_decompress_and_buf method as suggested.

This also brings us to the point where we can consider to at least deprecate all the existing new_with_buf() methods while documenting how these should be used instead - they seem redundant after all, but I can imagine how they have been added for convenience. Probably though, it's best to leave these be, while admitting that new_with_decompress is probably the 'expert mode' of constructors, which is used rarely and won't needed such buf related conveniences.

In any case, I think this PR can be merged now.

Byron commented 1 year ago

Oh, never mind, I think after breaking CI in the most sloppy of ways, it became clear why with_buf variants exist. new will always create a BufRead, and controlling the buffer is only possible by having a separate constructor. It would be possible to let new_with_decompress() take r without wrapping it into BufRead, but that would be quite a departure form the way the API works.

So unless the API is changed to not wrap incoming Read impls into BufRead, I think the way it works now (without my change) is the way to go after all.

Byron commented 1 year ago

I am merging the previous state, but please let me know in case I am missing something so we can apply a fix in a follow-up.

jongiddy commented 1 year ago

Not sure how removing a function that is not used by a test could break CI?

None of the with_buf methods should need to be used as long as the caller does both of:

  1. Wrap a non-BufRead reader in a BufReader; and
  2. Change from the read module to the bufread module for the compress and decompress types.
Byron commented 1 year ago

Oh, never mind, I think after breaking CI in the most sloppy of ways,

Not sure how removing a function that is not used by a test could break CI?

CI breakage was unrelated, or related to me being sloppy.

Related to with_buf(), I meant to say that any change to these would mean deprecating them, yet I thought they are probably there as convenience and thus for a good-enough reason.

However, since the …with_buf() moniker is now causing additional effort as it this PR had to add another method for symmetry, it might be worth rethinking them, while acknowledging that this will affect users in various ways which is why I shied back from a change in the first place.

If you feel it's worth pursuing such a change, I invite you to open a draft PR to sketch it out (which I'd always prefer over an issue which is more abstract).

jongiddy commented 1 year ago

While the with_buf methods are redundant, I have no desire to deprecate them. My suggestion was to not add the new read::ZlibDecoder::new_with_decompress_and_buf(r, d, vec![0; 1024]) method since the same behaviour can be achieved using bufread::ZlibDecoder::new_with_decompress(BufReader::with_buf(r, vec![0; 1024]), d).

My concern above was that removing this new function, that isn't used anywhere yet, broke CI. But it appears that was not the case.

Byron commented 1 year ago

While the with_buf methods are redundant, I have no desire to deprecate them. My suggestion was to not add the new read::ZlibDecoder::new_with_decompress_and_buf(r, d, vec![0; 1024]) method since the same behaviour can be achieved using bufread::ZlibDecoder::new_with_decompress(BufReader::with_buf(r, vec![0; 1024]), d).

I actually experimented with this which is why I realized it's for the better to keep this PR as is for a consistent API. Would the with_buf methods not exist, it would be clear that new_with_decompress_and_buf would also not exist and for the better. What I wanted to avoid is to have this one 'expert' method that requires the caller to add the BufRead which this implementations currently enforces, causing surprises in performance. This led me to believe that the authors of this type must have believed that having a BufRead is so paramount that there should be no way to even accidentally not have it. I hope this makes sense.