Open fedy-cz opened 8 months ago
Well. I adapted one of the test from zlib:
#[test]
fn total_in() {
let mut real = Vec::new();
let mut w = write::GzEncoder::new(Vec::new(), Compression::default());
let v = crate::random_bytes().take(1024).collect::<Vec<_>>();
for _ in 0..200 {
let to_write = &v[..thread_rng().gen_range(0..v.len())];
real.extend(to_write.iter().copied());
w.write_all(to_write).unwrap();
}
let mut result = w.finish().unwrap();
let result_len = result.len();
for _ in 0..200 {
result.extend(v.iter().copied());
}
let mut r = read::GzDecoder::new(&result[..]);
let mut ret = Vec::new();
r.read_to_end(&mut ret).unwrap();
assert_eq!(ret, real);
assert_eq!(r.total_in(), result_len as u64);
}
... and sure enough it fails.
The question is if it's initial assumptions are correct. Should the total_in() report the current position in the underlying
Reader
or the compressed data-stream itself (without CRC)? The doc comment is unclear:
/// Returns the number of bytes that have been read into this compressor.
///
/// Note that not all bytes read from the underlying object may be accounted
/// for, there may still be some active buffering.
We should decide, document it and then we can write the relevant tests and adapt the code if necessary.
If we decided to indicate the BufRead
position (to make the test posted above pass as is) then GzHeaderParser
would need to store the header size somewhere (which is currently not the case).
Thanks for giving it a shot!
I would have expected that total_in
and total_out
is solely something to be forwarded from the underlying implementation, which should keep track of it all by itself, and would not expect it to be emulated by asking the input stream for its position instead.
Remembering my initial question in the related issue #381 as to why these methods are missing, then maybe this is the answer: Straightforward forwarding, for yet to be discovered reason, isn't possible or working as expected.
As I described in the note above (https://github.com/rust-lang/flate2-rs/pull/382#issuecomment-1763450646), it's not all hopeless :smile:
zlib::*
and deflate::*
represent the (relative) position increase in the input/output streamBufReader
implements the Seek
trait with it's stream_position
method, but due to some unfortunate decisions in it's design it actually depends on the seek()
metod by default and henceforth requires a mutable borrow :disappointed: (some initial attempts to try to rectify that here: https://github.com/rust-lang/rust/issues/59359#issuecomment-1763475845) . While nothing prevents you from implementing an adapter impl Read
trait and counting the data passing through yourself, it's another layer of indirection and not very convenient.DeflateDecoder
/ DeflateEncoder
to the users and let them call these functions directly, but that is probably not very clean (exposing a private member) and also less discover-able.u64
). Not sure how much these are critical.total_in
/ total_out
methods as expected (using the header / footer size data)This could be done incrementally and at least point no. 1 shouldn't be too controversial.
What do you think?
Despite being the most active right now, and truth to be told, I feel very uncomfortable to make any API decisions in this crate. Maybe it's possible for you to figure out who wrote what or maybe even most of it, and reach out to them via GitHub or email when lacking a user name or linked PR?
Otherwise I'd actually prefer to just expose the actual compressor (as answer to 1.), but I also think there was probably a good reason not to expose it.
From my own experience (at gitoxide
), I never managed to use the high-level abstractions but went right to the underlying Decompressor
instances to steer them directly. The question here would be if a higher-level API could also be used if the documentation would be improved or if certain utility functions would be added.
With that said, to be more comfortable with making such decision, I would definitely have to understand what the new API would be good for. Tests can help, and so could full-blown example programs. Thanks for your understanding.
Is this PR still active?
(simple forwarding)