rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust
https://docs.rs/flate2
Apache License 2.0

Issues with newly created file in read-write mode #383

Closed by deven96 8 months ago

deven96 commented 8 months ago

Currently I have a snippet of code that's supposed to do a few things:

let mut remote_source: Box<dyn std::io::Read>;
let zipped_filename = "test.csv.gz";
let mut zip_buffer = GzDecoder::new(
    std::fs::OpenOptions::new()
        .create(true)
        .write(true)
        .read(true)
        .open(zipped_filename)?,
);
std::io::copy(&mut remote_source, &mut zip_buffer).unwrap();
let mut unzipped_buffer = Vec::new();
// The line below fails with `std::io::ErrorKind::UnexpectedEof`,
// almost as if the writer was never flushed into the newly created file before the reader was triggered
std::io::copy(&mut zip_buffer, &mut unzipped_buffer).unwrap();

As a temporary workaround, I now write directly to the new file, and only after copying from the remote source do I open the file in read mode with GzDecoder. I'm not certain whether the issue comes from OpenOptions itself or from GzDecoder.

the8472 commented 8 months ago

You need to seek the file back to the start after writing.
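A minimal sketch of the cursor behavior behind this suggestion, using only the standard library. The function name and the scratch file name are made up for the illustration. A read issued right after a write through the same handle starts at the end of the file and returns nothing, which would be consistent with the decoder reporting `UnexpectedEof`; after `rewind()` the same handle returns the data.

```rust
use std::fs::OpenOptions;
use std::io::{self, Read, Seek, Write};
use std::path::Path;

/// Writes `data` to a fresh read-write file, then reads through the same
/// handle twice: once without rewinding and once after rewinding.
/// Returns (bytes read before the rewind, bytes read after the rewind).
fn read_before_and_after_rewind(path: &Path, data: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut file = OpenOptions::new()
        .create(true)
        .truncate(true)
        .write(true)
        .read(true)
        .open(path)?;

    file.write_all(data)?;

    // The cursor now sits at the end of the file, so this read finds nothing.
    let mut before = Vec::new();
    file.read_to_end(&mut before)?;

    // Move the cursor back to offset 0 and read again.
    file.rewind()?;
    let mut after = Vec::new();
    file.read_to_end(&mut after)?;

    Ok((before, after))
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file, chosen only for this demonstration.
    let path = std::env::temp_dir().join("cursor_demo.bin");
    let (before, after) = read_before_and_after_rewind(&path, b"hello")?;
    println!(
        "before rewind: {} bytes, after rewind: {} bytes",
        before.len(),
        after.len()
    );
    std::fs::remove_file(&path)?;
    Ok(())
}
```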

deven96 commented 8 months ago

You're correct @the8472

Seek isn't implemented for GzDecoder. I tried Write::flush in case it would reset the cursor, but to no avail.

Byron commented 8 months ago

Can you try to open the decoded file after it was flushed, from disk? I have a feeling this is an issue related to the way GzDecoder works, and I'd be a bit surprised if one can use it like it's done here.

My hypothesis is that the decoded data is written to disk, but that it's impossible to do so through GzDecoder.

deven96 commented 8 months ago

The .csv.gz is actually on disk correctly, i.e. I can manually unzip it and retrieve the original file after the first copy (even without a flush). However, as I can't reset the cursor via GzDecoder, there's no way to tell it to return to the start before starting the read.

the8472 commented 8 months ago

You can access the inner file via get_mut. Or you can operate on the File directly and only wrap it in the decoder once it's written. It shouldn't be necessary to write through the decoder.

deven96 commented 8 months ago

That's the implementation I eventually stuck with: writing to the file first and then operating on it afterwards. Would it be possible, however, to do the following?

use std::io::Seek;
// Hypothetical: assumes there is an `impl<S: Seek> Seek for GzDecoder<S>`
zip_buffer.rewind()?;

The use case is that I want to [ write (compress to gz) -> read (uncompressed) ].

deven96 commented 8 months ago

Although I can see that the Write implementation expects already compressed data.