sile / libflate

A Rust implementation of DEFLATE algorithm and related formats (ZLIB, GZIP)
https://docs.rs/libflate
MIT License
178 stars 35 forks source link

Support incremental writes #28

Closed sile closed 3 years ago

sile commented 5 years ago

This PR resolves #27.

codecov-io commented 5 years ago

Codecov Report

Merging #28 (8206f1b) into master (9ecb541) will increase coverage by 7.78%. The diff coverage is 96.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #28      +/-   ##
==========================================
+ Coverage   84.16%   91.94%   +7.78%     
==========================================
  Files          19       14       -5     
  Lines        2134     1353     -781     
==========================================
- Hits         1796     1244     -552     
+ Misses        338      109     -229     
Impacted Files Coverage Δ
src/deflate/encode.rs 96.61% <96.77%> (+0.25%) :arrow_up:
src/deflate/symbol.rs 93.06% <0.00%> (-0.25%) :arrow_down:
src/checksum.rs 82.35% <0.00%> (ø)
src/non_blocking/transaction.rs 98.30% <0.00%> (ø)
src/gzip.rs
src/lz77.rs
libflate_lz77/src/lib.rs
libflate_lz77/src/default.rs
src/deflate/mod.rs
src/deflate/decode.rs
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update fdad237...74d65b8. Read the comment docs.

z64 commented 5 years ago

Hi @sile , thanks for looking into this :)

It seems that using this as expected actually produces invalid zlib. I've modified the test in this branch to demonstrate a round-trip failure:

    #[test]
    fn issue_27() {
        // See: https://github.com/sile/libflate/issues/27

        let writes = ["fooooooooooooooooo", "bar", "baz"];

        let mut encoder = Encoder::new(Vec::new());
        for _ in 0..2 {
            for string in writes.iter() {
                encoder.write(string.as_bytes()).expect("Write failed");
            }
            encoder.flush().expect("Flush failed");
        }
        let finished = encoder.finish().unwrap();
        println!("{:?}", finished.0);

        let mut output = String::new();
        Decoder::new(Cursor::new(finished.0)).unwrap().read_to_string(&mut output).unwrap();
        assert_eq!(output, "fooooooooooooooooobarbaz");
    }
}

If you run this as cargo test issue_27 -- --nocapture, this panics, and the finished bytes is reported to STDOUT, which illustrates the problem:

[
    11, 56, 52, 2, 0, 0, 32, 0, 56, 43, 65, 117, 39, 154, 94, 247, 242, 35, 66, 116, // flush 1
    11, 56, 52, 2, 0, 0, 32, 0, 56, 43, 65, 117, 39, 154, 94, 247, 242, 35, 66, 116, // flush 2
    5, 192, 129, 0, 0, 0, 0, 0, 144, 255, 107, 0 // finish footer
]

So it would seem that currently, this in fact resets the internal state, which is undesirable. Each successive flush should use the same zlib context, and each flush should be seperated by a boundary sequence of [0, 0, 255, 255].

I've adjusted my original example to more clearly illustrate and draw a comparison to flate2:

Code

```rust extern crate libflate; extern crate flate2; use flate2::{Compression, write::ZlibEncoder}; use libflate::zlib::Encoder; use std::io::Write; fn main() { let writes = [ "fooooooooooooooooo", "bar", "baz", ]; println!("-- libflate"); let mut encoder = Encoder::new(Vec::new()).unwrap(); for _ in 0..2 { for string in writes.iter() { encoder.write(string.as_bytes()).expect("Write failed"); } encoder.flush().expect("Flush failed"); println!("{:?}", encoder.as_inner_ref()); } let finished = encoder.finish().unwrap(); println!("{:?}", finished); println!("-- flate2"); let mut encoder = ZlibEncoder::new(Vec::new(), Compression::default()); for _ in 0..2 { for string in writes.iter() { encoder.write(string.as_bytes()).expect("Write failed"); } encoder.flush().expect("Flush failed"); println!("{:?}", encoder.get_ref()); } let finished = encoder.finish().unwrap(); println!("{:?}", finished); } ```

Output

``` -- libflate [120, 156, 11, 56, 52, 2, 0, 0, 32, 0, 56, 43, 65, 117, 39, 154, 94, 247, 242, 35, 66, 116] [120, 156, 11, 56, 52, 2, 0, 0, 32, 0, 56, 43, 65, 117, 39, 154, 94, 247, 242, 35, 66, 116, 11, 56, 52, 2, 0, 0, 32, 0, 56, 43, 65, 117, 39, 154, 94, 247, 242, 35, 66, 116] ([120, 156, 11, 56, 52, 2, 0, 0, 32, 0, 56, 43, 65, 117, 39, 154, 94, 247, 242, 35, 66, 116, 11, 56, 52, 2, 0, 0, 32, 0, 56, 43, 65, 117, 39, 154, 94, 247, 242, 35, 66, 116, 5, 192, 129, 0, 0, 0, 0, 0, 144, 255, 107, 0, 246, 95, 20, 111], None) -- flate2 [120, 1, 74, 203, 71, 7, 73, 137, 69, 73, 137, 85, 0, 0, 0, 0, 255, 255] [120, 1, 74, 203, 71, 7, 73, 137, 69, 73, 137, 85, 0, 0, 0, 0, 255, 255, 74, 195, 33, 14, 0, 0, 0, 255, 255] [120, 1, 74, 203, 71, 7, 73, 137, 69, 73, 137, 85, 0, 0, 0, 0, 255, 255, 74, 195, 33, 14, 0, 0, 0, 255, 255, 3, 0, 246, 95, 20, 111] ```

For a more technical reference on Z_SYNC_FLUSH etc, please see https://www.bolet.org/~pornin/deflate-flush.html and CTRL+F for 00 00 FF FF for details.

sile commented 5 years ago

@z64 Thank you for your detailed information! I have added zlib::EncodeOptions::flush_mode() method for specifing Z_SYNC_FLUSH flag. (Unit tests will be added later ...)