zip-rs / zip-old

Zip implementation in Rust
MIT License
731 stars 204 forks source link

Binary file seems to be corrupted with method Deflated #378

Closed mass10 closed 1 year ago

mass10 commented 1 year ago

Summary

 Binary file seems to be corrupted with method zip::CompressionMethod::Deflated.

Code to reproduce

use std::io::{Read, Write};

/// Example of deflate.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ARCHIVE
    {
        // START
        let w = std::fs::File::create("notepad.exe.zip")?;
        let mut archiver = zip::ZipWriter::new(w);

        let options = zip::write::FileOptions::default();
        // let options = options.compression_method(zip::CompressionMethod::Stored); // SAFE
        let options = options.compression_method(zip::CompressionMethod::Deflated); // CORRUPT🔥
        archiver.start_file("notepad.exe", options)?;

        // READ
        let mut stream = std::fs::File::open("C:\\Windows\\system32\\notepad.exe")?;
        loop {
            let mut buffer = [0; 1000];
            let bytes_read = stream.read(&mut buffer)?;
            if bytes_read == 0 {
                break;
            }

            // WRITE
            let write_buffer = &buffer[..bytes_read];
            archiver.write(&write_buffer)?;
        }

        // COMPLETE
        archiver.finish().unwrap();
    }

    // UNZIP
    {
        let command = std::process::Command::new("wsl.exe").args(&["unzip", "notepad.exe.zip"]).spawn()?.wait()?;
        println!("Command exited with: {}", command);
    }

    // ORIGINAL FILE SIZE
    {
        let meta = std::fs::metadata("C:\\Windows\\system32\\notepad.exe")?;
        println!("Original NOTEPAD is: {} bytes.", meta.len());
    }

    // EXTRACTED FILE SIZE
    {
        let meta = std::fs::metadata("notepad.exe")?;
        println!("Extracted NOTEPAD is: {} bytes.", meta.len());
    }

    return Ok(());
}

Result is

Original NOTEPAD is: 201216 bytes.
Extracted NOTEPAD is: 200903 bytes.

Thank you!

mass10 commented 1 year ago

zip::CompressionMethod::Zstd sounds good.

mass10 commented 1 year ago

But I found neither of the following supports ZStandard:

milesj commented 1 year ago

Just ran into this also. Have to use Stored for now...

axnsan12 commented 1 year ago

You are using archiver.write (an implementation of std::io::Write::write) and ignoring the return value. You should be calling write in a loop checking the count of bytes written, or use archiver.write_all.

https://doc.rust-lang.org/stable/std/io/trait.Write.html#tymethod.write

This function will attempt to write the entire contents of buf, but the entire write might not succeed, or the write may also generate an error. A call to write represents at most one attempt to write to any wrapped object.

mass10 commented 1 year ago

Oh, It was fixed!

Thank you!

milesj commented 1 year ago

I'm using write_all and it still fails, here's my impl: https://github.com/moonrepo/starbase/blob/master/crates/archive/src/zip.rs

mass10 commented 1 year ago

@milesj I thought I want to use fs::read_file_bytes() instead of fs::read_file() in your code.

milesj commented 1 year ago

When I try that I get this error "stream did not contain valid UTF-8", which is the same as using fs::read_file(file)?.as_bytes().

Both are interesting, since these test cases are very simple.

I also tried this, which failed with the same error.

let mut buffer = Vec::new();
let mut stream = fs::open_file(file)?;
stream.read_to_end(&mut buffer).unwrap();

I feel like something else may be going on.

mass10 commented 1 year ago

I tried this.

/// Succeeds🌔
fn diagnose_1(path: &str) {
    let data = std::fs::read(path).unwrap();
    println!("(1) {} is {} bytes.", path, data.len());
}

/// Succeeds🌔
fn diagnose_2(path: &str) {
    use std::io::Read;
    let mut file = std::fs::File::open(path).unwrap();
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer).unwrap();
    println!("(2) {} is {} bytes.", path, buffer.len());
}

/// Fails🌒
fn diagnose_3(path: &str) {
    let text = std::fs::read_to_string(path).unwrap(); // 🔥stream did not contain valid UTF-8
    let data = text.as_bytes();
    println!("(3) {} is {} bytes.", path, data.len());
}

pub fn main() {
    diagnose_1(r"C:\Windows\system32\notepad.exe"); //🌔
    diagnose_2(r"C:\Windows\system32\notepad.exe"); //🌔
    diagnose_3(r"C:\Windows\system32\notepad.exe"); //🌒
}

Result was

(1) C:\Windows\system32\notepad.exe is 201216 bytes.
(2) C:\Windows\system32\notepad.exe is 201216 bytes.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: InvalidData, message: "stream did not contain valid UTF-8" }', src\test2.rs:18:46
milesj commented 1 year ago

It's weird, because even if I do this it fails:

self.archive
    .start_file(name, options)
    .map_err(|error| ZipError::AddFailure {
        source: file.to_path_buf(),
        error,
    })?;

self.archive
    .write_all(&std::fs::read(file).unwrap())
    .map_err(|error| FsError::Write {
        path: file.to_path_buf(),
        error,
    })?;

I wonder if this abstraction is causing issues once compiled.

Example PR: https://github.com/moonrepo/starbase/pull/33

mass10 commented 1 year ago

@milesj What kind of file is broken or fail?

milesj commented 1 year ago

Not sure which file is the problem, but this is the fixture: https://github.com/moonrepo/starbase/tree/master/crates/archive/tests/__fixtures__/archives

They are very simple files.

milesj commented 1 year ago

I figured out the problem, was my mistake. The issue was I was using by_index_raw instead of by_index when unzipping.

I'm using Copilot and I think it generated that 😞

mass10 commented 1 year ago

That's good! Everything has become clear.