rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust
https://docs.rs/flate2
Apache License 2.0

GzDecoder returns UnexpectedEof when using read::GzEncoder #273

Gabirel closed this issue 3 years ago.

Gabirel commented 3 years ago

Difference between read::GzEncoder and write::GzEncoder

I get an UnexpectedEof error when using read::GzEncoder, but everything works fine with write::GzEncoder. The code and the corresponding output are below.

Use flate2::read::GzEncoder for encoding

Code (Rust Playground): https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=2a8d710b07f530fd79bd7ae77313ff59

use std::io;
use std::io::Read;

use flate2::bufread::GzDecoder;
use flate2::read::GzEncoder;
use flate2::Compression;

fn main() {
    let s = gz_encoder();
    println!("{:?}", s);
    println!("{:?}", gz_decoder(&s));
}

fn gz_encoder() -> Vec<u8> {
    let mut ret_vec = [0; 100];
    let c = b"hello world";
    let mut z = GzEncoder::new(&c[..], Compression::fast());
    let count = z.read(&mut ret_vec).unwrap();
    let v = &ret_vec[0..count];
    v.to_vec()
}

fn gz_decoder(s: &Vec<u8>) -> io::Result<String> {
    let mut gz = GzDecoder::new(&s[..]);
    let mut ans = String::new();
    gz.read_to_string(&mut ans)?;
    Ok(ans)
}

However, running it prints the encoded bytes followed by an error:

[31, 139, 8, 0, 0, 0, 0, 0, 4, 255, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0]
Err(Kind(UnexpectedEof))

Use flate2::write::GzEncoder for encoding

Code (Rust Playground): https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=5d3d278dce92a8fd49ffeae30e36b454

use std::io;
use std::io::{Read, Write};

use flate2::bufread::GzDecoder;
use flate2::write::GzEncoder;
use flate2::Compression;

fn main() {
    let s = gz_encoder();
    println!("{:?}", s);
    println!("{:?}", gz_decoder(&s));
}

fn gz_encoder() -> Vec<u8> {
    let mut e = GzEncoder::new(Vec::new(), Compression::default());
    let c = "hello world".as_bytes();
    e.write_all(c).unwrap();
    let bytes = e.finish().unwrap();
    println!("{:?}", bytes);
    bytes
}

// same decoder as the previous one
fn gz_decoder(s: &Vec<u8>) -> io::Result<String> {
    let mut gz = GzDecoder::new(&s[..]);
    let mut ans = String::new();
    gz.read_to_string(&mut ans)?;
    Ok(ans)
}

In this case, it works fine.

[31, 139, 8, 0, 0, 0, 0, 0, 0, 255, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0, 133, 17, 74, 13, 11, 0, 0, 0]
[31, 139, 8, 0, 0, 0, 0, 0, 0, 255, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0, 133, 17, 74, 13, 11, 0, 0, 0]
Ok("hello world")

Question

I don't really understand this. Why can't I just use read::GzEncoder to encode? Am I missing something? From the examples alone, I can't see any difference between the two.

If I am, please point it out. Thanks in advance.

alexcrichton commented 3 years ago

Thanks for the report! I think, though, that you need to use read_to_end instead of a bare read, since the single read is probably a short read (it returns only part of the data).
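The suggestion rests on the std::io::Read contract: one call to read may fill only part of the buffer (a short read) even when more data will follow, and only Ok(0) signals end of stream; read_to_end keeps calling read until then. The std-only sketch below illustrates this with a hypothetical ChunkedReader that hands out at most 4 bytes per call. It is a stand-in chosen for illustration, not a model of how flate2's read::GzEncoder actually splits its output.

```rust
use std::io::{self, Read};

/// Hypothetical reader that returns at most `chunk` bytes per `read` call,
/// standing in for any `Read` implementation that produces output in pieces.
struct ChunkedReader {
    data: Vec<u8>,
    pos: usize,
    chunk: usize,
}

impl Read for ChunkedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = self.data.len() - self.pos;
        let n = remaining.min(self.chunk).min(buf.len());
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n) // Ok(0) only once all data has been handed out
    }
}

fn main() -> io::Result<()> {
    let make = || ChunkedReader { data: b"hello world".to_vec(), pos: 0, chunk: 4 };

    // One bare `read`: the 100-byte buffer has room, but only 4 bytes arrive.
    let mut buf = [0u8; 100];
    let n = make().read(&mut buf)?;
    println!("single read: {:?}", String::from_utf8_lossy(&buf[..n])); // "hell"

    // `read_to_end` keeps calling `read` until it returns Ok(0).
    let mut all = Vec::new();
    make().read_to_end(&mut all)?;
    println!("read_to_end: {:?}", String::from_utf8_lossy(&all)); // "hello world"
    Ok(())
}
```

Nothing in the single-read case signals an error: the caller simply receives fewer bytes than exist, which is why the truncation only surfaces later, in the decoder.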

Gabirel commented 3 years ago

Do you mean by this:

fn gz_encoder() -> Vec<u8> {
    let mut ret_vec = [0; 100];
    let c = b"hello world";
    let mut z = GzEncoder::new(&c[..], Compression::fast());
    // let count = z.read(&mut ret_vec).unwrap();
    let count = z.read_to_end(&mut ret_vec).unwrap();  // `read` -> `read_to_end`?
    let v = &ret_vec[0..count];
    v.to_vec()
}

However, read_to_end(&mut buf) reads all bytes until EOF from this source and places them into buf. Maybe I misunderstood what you meant?

If you mean using read_to_end in the decoder, I still get Err(Kind(UnexpectedEof)).

use std::io;
use std::io::Read;

use flate2::bufread::GzDecoder;
use flate2::read::GzEncoder;
use flate2::Compression;

fn main() {
    let s = gz_encoder();
    println!("{:?}", s);
    println!("{:?}", gz_decoder(&s));
}

fn gz_encoder() -> Vec<u8> {
    let mut ret_vec = [0; 100];
    let c = b"hello world";
    let mut z = GzEncoder::new(&c[..], Compression::fast());
    let count = z.read(&mut ret_vec).unwrap();
    // let count = z.read_to_end(&mut ret_vec).unwrap();
    let v = &ret_vec[0..count];
    v.to_vec()
}

fn gz_decoder(s: &Vec<u8>) -> io::Result<String> {
    let mut gz = GzDecoder::new(&s[..]);
    let mut v = Vec::new();
    gz.read_to_end(&mut v)?; // use `read_to_end()`
    let ans = String::from_utf8(v).expect("Found invalid UTF-8");
    Ok(ans)
}

alexcrichton commented 3 years ago

read_to_end appends to the vector, so you shouldn't start with a [0; 100]; start with Vec::new() instead.
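A std-only sketch of this point, assuming a byte slice as the source (&[u8] implements Read, so it stands in for the encoder here): read_to_end appends after whatever the Vec already holds and returns only the number of bytes it appended. It also takes a &mut Vec<u8>, so a fixed-size array such as [0; 100] is rejected by the compiler. With a pre-filled Vec, &ret_vec[0..count] would select the old padding rather than the new data; starting from Vec::new() avoids that. The helper name append_all is hypothetical.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads all of `src` into `buf`, returning what `read_to_end` reports.
fn append_all(mut src: &[u8], buf: &mut Vec<u8>) -> io::Result<usize> {
    // read_to_end appends to `buf` and returns only the number of bytes appended.
    src.read_to_end(buf)
}

fn main() -> io::Result<()> {
    // Pre-filled buffer (a Vec, since read_to_end takes &mut Vec<u8>, not an array).
    let mut prefilled = vec![0u8; 3];
    let count = append_all(b"hello world", &mut prefilled)?;
    println!("count = {}, len = {}", count, prefilled.len()); // count = 11, len = 14
    println!("{:?}", &prefilled[0..count]); // wrong slice: starts with the 3 zero bytes
    println!("{:?}", &prefilled[prefilled.len() - count..]); // the bytes actually appended

    // Empty buffer: everything in the Vec was just read, so no slicing is needed.
    let mut fresh = Vec::new();
    append_all(b"hello world", &mut fresh)?;
    println!("{:?}", String::from_utf8_lossy(&fresh)); // "hello world"
    Ok(())
}
```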

Gabirel commented 3 years ago

Ohh, your advice works! Thank you!

But I have a few questions.

The working version is:

fn gz_encoder() -> Vec<u8> {
    let mut ret_vec = Vec::new();
    let c = b"hello world";
    let mut z = GzEncoder::new(&c[..], Compression::fast());
    let count = z.read_to_end(&mut ret_vec).unwrap();
    let v = &ret_vec[0..count];
    v.to_vec()
}

Its output is:

# length = 31
[31, 139, 8, 0, 0, 0, 0, 0, 4, 255, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0, 133, 17, 74, 13, 11, 0, 0, 0]
Ok("hello world")

This version of the code comes from the example at: https://github.com/rust-lang/flate2-rs/blob/c37824894daacc0ad7bbca566c48a897cf973c4f/examples/gzencoder-read.rs#L15-L19

fn gz_encoder() -> Vec<u8> {
    let mut ret_vec = [0; 100];
    let c = b"hello world";
    let mut z = GzEncoder::new(&c[..], Compression::fast());
    let count = z.read(&mut ret_vec).unwrap();
    let v = &ret_vec[0..count];
    v.to_vec()
}

Its output is:

# length = 23
[31, 139, 8, 0, 0, 0, 0, 0, 4, 255, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0]
Err(Kind(UnexpectedEof))
  1. Question 1: When should I use the first version, and when should I use the second one from the example? What's the difference? I really don't understand it. Could you please explain in more detail? I would be very grateful.

  2. Question 2: Does this mean the example is wrong, or just not suitable for general use?
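Comparing the two outputs above shows what went wrong: the 23-byte result is exactly the first 23 bytes of the 31-byte one. In the gzip format (RFC 1952), each member ends with an 8-byte trailer: the CRC-32 of the uncompressed data, then its length modulo 2^32 (ISIZE), both little-endian. The single read returned the header and the compressed data but stopped before the trailer, so the decoder reached end of input while still expecting it. The std-only sketch below splits the trailer off the 31-byte output and checks it against "hello world" using a plain bitwise CRC-32 (the reflected 0xEDB88320 polynomial gzip uses). The crc32 and trailer functions are written here for illustration only and are not part of flate2.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum gzip stores.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // Shift right; XOR in the polynomial when the dropped low bit was 1.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Splits the last 8 bytes of a gzip member into (CRC-32, ISIZE), both little-endian.
fn trailer(member: &[u8]) -> Option<(u32, u32)> {
    if member.len() < 8 {
        return None;
    }
    let t = &member[member.len() - 8..];
    let crc = u32::from_le_bytes([t[0], t[1], t[2], t[3]]);
    let len = u32::from_le_bytes([t[4], t[5], t[6], t[7]]);
    Some((crc, len))
}

fn main() {
    // Output of the read_to_end version (31 bytes) and of the single read (23 bytes).
    let full: [u8; 31] = [
        31, 139, 8, 0, 0, 0, 0, 0, 4, 255, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0,
        133, 17, 74, 13, 11, 0, 0, 0,
    ];
    let short: [u8; 23] = [
        31, 139, 8, 0, 0, 0, 0, 0, 4, 255, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0,
    ];
    println!("short is a prefix of full: {}", full.starts_with(&short)); // true

    let (crc, len) = trailer(&full).expect("member shorter than 8 bytes");
    let data = b"hello world";
    println!("stored CRC-32 {:#010x}, computed {:#010x}", crc, crc32(data)); // both 0x0d4a1185
    println!("stored ISIZE {}, actual length {}", len, data.len()); // 11 and 11
}
```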

alexcrichton commented 3 years ago

Ah, those examples were incorrect and should have been using something other than read. I've updated them to use read_to_end now.
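If you want to keep a fixed-size buffer like the original example's [0; 100] rather than calling read_to_end, the usual pattern is to call read in a loop until it returns Ok(0), retry on ErrorKind::Interrupted, and copy out only the bytes each call filled. This is roughly what read_to_end does for you. The sketch below is generic over any R: Read, so in principle it would accept a read::GzEncoder too; it is shown with a plain byte slice so it runs without extra crates. The helper name read_in_chunks is hypothetical.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper: drains `reader` through a fixed 100-byte buffer.
fn read_in_chunks<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut buf = [0u8; 100];
    let mut out = Vec::new();
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(out),                   // Ok(0) means end of stream
            Ok(n) => out.extend_from_slice(&buf[..n]), // keep only the n bytes filled
            Err(e) if e.kind() == ErrorKind::Interrupted => continue, // retry the call
            Err(e) => return Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    // More data than fits in one 100-byte buffer, so several reads are required.
    let data: Vec<u8> = (0..=255u8).cycle().take(1000).collect();
    let out = read_in_chunks(&data[..])?;
    assert_eq!(out, data);
    println!("read {} bytes", out.len()); // read 1000 bytes
    Ok(())
}
```

In most code read_to_end (or std::io::copy into a Vec) is simpler; the explicit loop is mainly useful when each chunk needs processing before the next read.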

Gabirel commented 3 years ago

Thanks again for your replies and patience. The problem is solved.