rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust
https://docs.rs/flate2
Apache License 2.0
895 stars 161 forks source link

GzDecoder.read_exact return Ok yet does nothing for vec #308

Closed jtmoon79 closed 1 year ago

jtmoon79 commented 2 years ago

For GzDecoder.read_exact, passing a v = Vec::<u8>::with_capacity(5) as a &mut [u8] (via call to as_mut()) will result in Ok(()) yet v will have no data (v.len() == 0).

reproduction

Create some gzipped test file.

echo 'THIS IS FILE test1 WITH SOME HIGH UNICODE ☺☺☺☺☺☺' | gzip -v > test1.gz

Create the project files:

File Cargo.toml

[package]
name = "test-GzDecoder-Issue"
version = "0.1.0"
edition = "2021"

[dependencies]
flate2 = "1.0.24"

File src/main.rs

use std::io;
use std::io::prelude::*;
use std::fs::File;

extern crate flate2;
use flate2::read::GzDecoder;

fn main() -> io::Result<()> {
    let file_gz = File::open("test1.gz").unwrap();
    let mut decoder = GzDecoder::new(file_gz);
    dbg!(&decoder);
    let mut bytes5: Vec<u8> = Vec::<u8>::with_capacity(5);
    dbg!(&bytes5);

    let mut done = false;
    while !done {
        match decoder.read_exact(bytes5.as_mut()) {
               Ok(_) => {
                   std::io::stdout().write_all(&bytes5)?;
               },
               Err(err) if err.kind() == std::io::ErrorKind::UnexpectedEof => {
                   std::io::stdout().write_all(&bytes5)?;
                   done = true;
               },
               Err(err) => {
                   eprintln!("Err: {:?}", err);
                   done = true;
               }
        }
        bytes5.fill(0);
    }
    std::io::stdout().write_all(b"\n")?;

    Ok(())
}

Run cargo run.

Result

The program is stuck in a loop calling read_exact which has no affect yet returns Ok(()).

Changing the program to use a stack-allocated array results in success (i.e. the file test1.gz is decompressed and written to the terminal).

use std::io;
use std::io::prelude::*;
use std::fs::File;

extern crate flate2;
use flate2::read::GzDecoder;

fn main() -> io::Result<()> {
    let file_gz = File::open("test1.gz").unwrap();
    let mut decoder = GzDecoder::new(file_gz);
    dbg!(&decoder);
    //let mut bytes5: Vec<u8> = Vec::<u8>::with_capacity(5);
    let mut bytes5: [u8; 5] = [0, 0, 0, 0, 0];
    dbg!(&bytes5);

    let mut done = false;
    while !done {
        //match decoder.read_exact(bytes5.as_mut()) {
        match decoder.read_exact(&mut bytes5) {
               Ok(_) => {
                   std::io::stdout().write_all(&bytes5)?;
               },
               Err(err) if err.kind() == std::io::ErrorKind::UnexpectedEof => {
                   std::io::stdout().write_all(&bytes5)?;
                   done = true;
               },
               Err(err) => {
                   eprintln!("Err: {:?}", err);
                   done = true;
               }
        }
        bytes5.fill(0);
    }
    std::io::stdout().write_all(b"\n")?;

    Ok(())
}

Expectation

In the case of no useful result using a Vec::<u8>, I would expect read_exact return an informative error.

I would prefer that GzDecoder.read_exact successfully fill an array passed from a vector, Vec::<u8>::as_mut().


Using rust 1.61.0-beta.3 (1ef1e0a 2022-03-31) and flate2 1.0.24 on Ubuntu.

jtmoon79 commented 2 years ago

After some tinkering, I'm somewhat sure this is because Vec::<u8>::with_capacity(5).as_mut().len() returns value 0, whereas [u8; 5].len() returns 5.

However, this leads to a tangential question:

For the last call to read_exact, when it returns ErrorKind::UnexpectedEof, how would the caller determine what bytes are of interest? Since the len() will be entire length of the passed &mut [u8] yet not all elements were written.

(I'm not sure if this is right place for this question).

MichaelMcDonnell commented 2 years ago

Hi @jtmoon79. It's not a bug in flate2 but how Rust defines the Read trait.

You are correct that the problem is the length of the vector is 0. The read documentation says:

"It is your responsibility to make sure that buf is initialized before calling read.

The code you have allocates the memory for the vector but doesn't initialize it. You can initialize it to all zeroes by doing:

let mut bytes5 = vec![0; 5];

Then len will then return 5.

Also, the infinite loop you got is also expected by what the (shortened) documentation says:

If the return value of this method is Ok(n), then implementations must guarantee that 0 <= n <= buf.len(). A nonzero n value indicates that the buffer buf has been filled in with n bytes of data from this source. If n is 0, then it can indicate one of two scenarios:

  1. ...
  2. The buffer specified was 0 bytes in length.

The code successfully reads 0 bytes every time but doesn't make progress, which is why it gets stuck.

The answer to your second question about read_exact is also in the Rust documentation:

If this function encounters an “end of file” before completely filling the buffer, it returns an error of the kind ErrorKind::UnexpectedEof. The contents of buf are unspecified in this case.

If any other read error is encountered then this function immediately returns. The contents of buf are unspecified in this case.

The buffer is unspecified if there is an error and must not be used.

I recommend not using read_exact or read unless you have a specific use case. They are tricky to get right. You could potentially use read which returns how many bytes it has read, but I think the write_all might fail if the code reads a partial utf-8 character.

I would instead recommend using read_to_string to read the whole file into memory, if possible, or use lines to read the file line by line. They're much easier to use.

jtmoon79 commented 2 years ago

Thanks for the detailed reply @MichaelMcDonnell ! 🙂

This was partially a newbie lesson on allocation vs. initialization, and RTFM, lol 😊

I recommend not using read_exact or read unless you have a specific use case.

Thanks for the suggestion. In my case, I decided on GzDecoder::read with a lot of extra checks. In my case, I want to read a precise amount of data and fill a buffer. Not pretty, but it worked best among the approaches I experimented with: https://github.com/jtmoon79/super-speedy-syslog-searcher/blob/0.0.33/src/readers/blockreader.rs#L1678-L1782

MichaelMcDonnell commented 1 year ago

You're welcome! I/O is really hard to get right.

Could you please close the issue since it is not a bug in flate2?

jtmoon79 commented 1 year ago

Closing. Per discussion, this is a rust design issue.