sile / libflate

A Rust implementation of DEFLATE algorithm and related formats (ZLIB, GZIP)
https://docs.rs/libflate
MIT License

GzDecoder seems to decode incorrectly #66

Closed axetroy closed 2 years ago

axetroy commented 2 years ago
let tar_file = File::open(&tar_file_path)?;
let input = GzDecoder::new(&tar_file)?;
let mut archive = Archive::new(input);

archive.set_unpack_xattrs(true);
archive.set_overwrite(true);
archive.set_preserve_permissions(true);
archive.set_preserve_mtime(true);

let files = archive.entries()?;

for entry in files {
    let mut file = entry?;

    let file_path = file.path()?;

    if let Some(file_name) = file_path.file_name() {
        if file_name.to_str().unwrap() == extract_file_name {
            binary_found = true;
            file.unpack(&output_file_path)?;
            break;
        }
    }
}

test file: https://github.com/axetroy/prune.rs/releases/download/v0.1.1/prune_darwin_amd64.tar.gz

The original file size is 985,384 bytes. The unpacked file size is 965,416 bytes.

I have tested tar on its own, and it works fine.

sile commented 2 years ago

Thank you for reporting this issue. However, I could not reproduce your problem.

I wrote the following code, which decompresses the above input file and then prints the decompressed size:

fn main() -> anyhow::Result<()> {
    let tar_file = std::fs::File::open("prune_darwin_amd64.tar.gz")?;
    let mut input = libflate::gzip::Decoder::new(&tar_file)?;
    let mut output = Vec::new();
    std::io::copy(&mut input, &mut output)?;

    println!("Deflated size: {}", output.len());
    Ok(())
}

// The comment below is the output of this command.
// Deflated size: 968192

Then I decompressed the file with the gzip command and confirmed the resulting file size with ls:

$ gzip -d prune_darwin_amd64.tar.gz
$ ls -l
-rw-r--r-- 1 user user 968192 Feb 28 02:19 prune_darwin_amd64.tar

So the two results match.
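As a further cross-check that needs no decoder at all, the gzip format (RFC 1952) stores the uncompressed length modulo 2^32 in the last four bytes of a member, little-endian (the ISIZE field). The following is a minimal sketch using only the standard library. The function names are hypothetical, and it assumes the file holds a single gzip member with no trailing data.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Decodes ISIZE from an in-memory gzip member: the last four bytes,
/// little-endian, hold the uncompressed length modulo 2^32.
fn isize_from_trailer(gz_bytes: &[u8]) -> Option<u32> {
    // The fixed header (10 bytes) and trailer (8 bytes) alone take 18 bytes,
    // so anything shorter cannot be a gzip member.
    if gz_bytes.len() < 18 {
        return None;
    }
    let tail = &gz_bytes[gz_bytes.len() - 4..];
    Some(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}

/// Reads only the last four bytes of the file at `path`.
fn isize_of_file(path: &str) -> io::Result<u32> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::End(-4))?;
    let mut buf = [0u8; 4];
    file.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn main() -> io::Result<()> {
    // The path is taken from the command line,
    // e.g. prune_darwin_amd64.tar.gz
    if let Some(path) = std::env::args().nth(1) {
        println!("ISIZE: {}", isize_of_file(&path)?);
    }
    Ok(())
}
```

If the assumptions hold, the printed value should agree with the size reported by the decoder and by ls. A mismatch would point at the gzip stage rather than the tar stage.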

sile commented 2 years ago

BTW, I could not run the code snippet you shared as I don't know where the Archive struct (or enum?) comes from.

axetroy commented 2 years ago

Here is the source code https://github.com/axetroy/cask.rs/blob/main/src/extractor.rs

I will try to provide a minimal reproduction.

axetroy commented 2 years ago

@sile Hello, thanks for your help.

Here is a repository that reproduces the problem: https://github.com/axetroy/libflate-66

git clone https://github.com/axetroy/libflate-66
cd ./libflate-66
cargo run ./

# View the unpacked file 'prune'

sile commented 2 years ago

Thank you for the additional information. I could reproduce your result.

Then I modified the code to read the tar file ("prune_darwin_amd64.tar") that the gzip command had already decompressed, so that libflate is not involved, as follows:

use tar::Archive; // from the `tar` crate

fn main() -> anyhow::Result<()> {
    let extract_file_name = "prune";
    let output_file_path = "output";
    let mut archive = Archive::new(std::fs::File::open("prune_darwin_amd64.tar")?);

    archive.set_unpack_xattrs(true);
    archive.set_overwrite(true);
    archive.set_preserve_permissions(true);
    archive.set_preserve_mtime(true);

    let files = archive.entries()?;

    for entry in files {
        let mut file = entry?;

        let file_path = file.path()?;

        if let Some(file_name) = file_path.file_name() {
            dbg!(&file_name);
            if file_name.to_str().unwrap() == extract_file_name {
                file.unpack(&output_file_path)?;
                println!("unpacked");
                break;
            }
        }
    }
    Ok(())
}

The result was unchanged (i.e., the unpacked file size was still 965,416 bytes), so I think this problem is not related to libflate.
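One way to tell whether the archive itself or the extraction step is responsible is to read the size that each tar header declares, without going through any tar library. Below is a rough sketch using only the standard library. The function names are hypothetical, and it assumes plain ustar-style headers: name in bytes 0..100, size as octal ASCII in bytes 124..136, and data padded to 512-byte blocks. It does not interpret GNU long names, PAX extended headers, base-256 sizes, or sparse entries; such records are either listed as ordinary entries or stop the scan.

```rust
/// Parses a NUL- or space-padded octal ASCII field, as used in tar headers.
/// Returns None for anything else (e.g. a base-256 encoded size).
fn parse_octal(field: &[u8]) -> Option<u64> {
    let text = std::str::from_utf8(field).ok()?;
    let digits = text.trim_matches(|c: char| c == '\0' || c == ' ');
    u64::from_str_radix(digits, 8).ok()
}

/// Lists (name, declared size) for each header in an uncompressed tar image.
fn list_entries(tar: &[u8]) -> Vec<(String, u64)> {
    const BLOCK: usize = 512;
    let mut entries = Vec::new();
    let mut offset = 0;
    while offset + BLOCK <= tar.len() {
        let header = &tar[offset..offset + BLOCK];
        // An all-zero block marks the end of the archive.
        if header.iter().all(|&b| b == 0) {
            break;
        }
        // Name: bytes 0..100, NUL-terminated unless it fills the field.
        let name_len = header[..100].iter().position(|&b| b == 0).unwrap_or(100);
        let name = String::from_utf8_lossy(&header[..name_len]).into_owned();
        // Size: bytes 124..136, octal ASCII.
        let size = match parse_octal(&header[124..136]) {
            Some(size) => size,
            None => break, // unsupported encoding: stop rather than guess
        };
        entries.push((name, size));
        // Skip the header plus the data, rounded up to whole blocks.
        let data_blocks = (size as usize + BLOCK - 1) / BLOCK;
        offset += BLOCK + data_blocks * BLOCK;
    }
    entries
}

fn main() -> std::io::Result<()> {
    // The path is taken from the command line, e.g. prune_darwin_amd64.tar
    if let Some(path) = std::env::args().nth(1) {
        let tar = std::fs::read(path)?;
        for (name, size) in list_entries(&tar) {
            println!("{}: {} bytes", name, size);
        }
    }
    Ok(())
}
```

Comparing the declared size with the size of the unpacked file shows whether bytes are lost during extraction. It does not by itself explain why.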

axetroy commented 2 years ago

This confuses me.

When I compress a file with gzip and then decompress it, the result is correct. When I archive a file with tar and then extract it, the result is correct.

But when I combine them, the result is incorrect.

With the extraction tools installed on my computer, however, everything works fine.

sile commented 2 years ago

FYI, Python 3's tarfile library handles the (already decompressed) input tar file correctly.

>>> import tarfile
>>> tar = tarfile.open("prune_darwin_amd64.tar")
>>> tar.getmember("prune").size
985384

axetroy commented 2 years ago

OK, it must be a difference between tar implementations.

Thanks for your help and time. Have a good day 👍

sile commented 2 years ago

👍