zip-rs / zip-old

Zip implementation in Rust
MIT License
731 stars 204 forks source link

Save multi `ZipFile` of a same `ZipArchive`, and use it later, will panic #325

Closed rise0chen closed 1 year ago

rise0chen commented 1 year ago
    let zip_file = fs::File::open(PATH)?;
    let zip = ZipArchive::new(&zip_file)?;
    let mut files = vec![zip.clone(); zip.len()];
    let mut files: Vec<_> = files
        .iter_mut()
        .enumerate()
        .map(|(i, file)| file.by_index(i).unwrap())
        .collect();
    files.iter_mut().for_each(|x| {
        let mut s = String::new();
        x.read_to_string(&mut s).unwrap();
        println!("{}", s);
    });
`Err` value: Custom { kind: Other, error: "Invalid checksum" }

panicked at x.read_to_string(&mut s).unwrap();

NobodyXu commented 1 year ago

It panicks at error "Invalid checksum" which could be that you zip file is corrupted, this is not zip itself panics.

And you don't need to create files = vec![zip.clone(); zip.len()] which is inefficient, you can just do:

    let mut files: Vec<_> = (0..zip.len())
        .map(|i| zip_file.by_index(i).unwrap())
        .collect();
rise0chen commented 1 year ago

fn by_index(&mut self, filenumber: usize) -> ZipResult<ZipFile<'>>

It need &mut self. mutable reference can only borrow one time. So, I need clone ZipArchive.

rise0chen commented 1 year ago
use std::fs;
use std::io::{Read, Result as IoResult, Write};
use zip::{ZipArchive, ZipWriter};

const PATH: &'static str = "./test.zip";

fn main() -> IoResult<()> {
    create_zip()?;

    let zip_file = fs::File::open(PATH)?;
    let mut zip = ZipArchive::new(&zip_file)?;
    {
        let mut file = zip.by_index(0).unwrap();
        let mut s = String::new();
        file.read_to_string(&mut s).unwrap();
        println!("{}", s);
    }
    let mut files = vec![zip.clone(); zip.len()];
    let mut files: Vec<_> = files
        .iter_mut()
        .enumerate()
        .map(|(i, file)| file.by_index(i).unwrap())
        .collect();
    files.iter_mut().for_each(|x| {
        let mut s = String::new();
        x.read_to_string(&mut s).unwrap();
        println!("{}", s);
    });

    Ok(())
}

fn create_zip() -> IoResult<()> {
    let zip_file = fs::File::create(PATH)?;
    let mut zip = ZipWriter::new(zip_file);
    zip.start_file("1", Default::default())?;
    zip.write_all("data1".as_bytes())?;
    zip.start_file("1", Default::default())?;
    zip.write_all("data1-1".as_bytes())?;
    zip.start_file("2", Default::default())?;
    zip.write_all("data2".as_bytes())?;
    zip.start_file("3", Default::default())?;
    zip.write_all("data3".as_bytes())?;
    zip.finish()?;
    Ok(())
}

zip file is created by ZipWriter, and it is not corrupted.

NobodyXu commented 1 year ago

It need &mut self. mutable reference can only borrow one time. So, I need clone ZipArchive.

Oh you are right.

BTW I think the title is still a bit misleading. The zip code itself does not panic while extracting, it merely returns an error. The panic is caused by the unwrap you use.

IMO it will be better to say that use it later gives "Invalid checksum" error.

rise0chen commented 1 year ago

I don't think it should return "Invalid checksum".

use std::fs;
use std::io::{Read, Result as IoResult, Write};
use zip::{ZipArchive, ZipWriter};

const PATH: &'static str = "./test.zip";

fn main() -> IoResult<()> {
    create_zip()?;

    let zip_file = fs::File::open(PATH)?;
    let mut zip = ZipArchive::new(&zip_file)?;
    {
        let mut file1 = zip.by_index(0).unwrap();
        let mut s = String::new();
        file1.read_to_string(&mut s).unwrap();
        println!("{}", s);
    }
    {
        let mut file2 = zip.by_index(1).unwrap();
        let mut s = String::new();
        file2.read_to_string(&mut s).unwrap();
        println!("{}", s);
    }

    {
        let mut zip1 = zip.clone();
        let mut zip2 = zip.clone();
        let mut file1 = zip1.by_index(0).unwrap();
        let mut file2 = zip2.by_index(1).unwrap();

        let mut s = String::new();
        file1.read_to_string(&mut s).unwrap();
        println!("{}", s);

        let mut s = String::new();
        file2.read_to_string(&mut s).unwrap();
        println!("{}", s);
    }

    Ok(())
}

fn create_zip() -> IoResult<()> {
    let zip_file = fs::File::create(PATH)?;
    let mut zip = ZipWriter::new(zip_file);
    zip.start_file("1", Default::default())?;
    zip.write_all("data1".as_bytes())?;
    zip.start_file("1", Default::default())?;
    zip.write_all("data1-1".as_bytes())?;
    zip.start_file("2", Default::default())?;
    zip.write_all("data2".as_bytes())?;
    zip.start_file("3", Default::default())?;
    zip.write_all("data3".as_bytes())?;
    zip.finish()?;
    Ok(())
}

It can work when use ZipFile one by one, and it return "Invalid checksum" when muti ZipFile

NobodyXu commented 1 year ago

I don't think it should return "Invalid checksum".

I'm not saying it should, I'm suggesting a better title would be mention "invalid checksum" instead of panic.

It can work when use ZipFile one by one, and it return "Invalid checksum" when muti ZipFile

This might have to do with this line:

    let mut zip = ZipArchive::new(&zip_file)?;

You are sharing one std::fs::File in multiple ZipArchive. Each ZipArchive do some reads and seeks, which changes the internal cursor of std::fs::File maintained by the OS.

The way to fix this error is to create multiple independent std::fs::File by either opening multiple times, or using File::try_clone, or you can use the std::os::unix::fs::FileExt interface and create a new type like this:

struct FileRef<'a>(&'a std::fs::File, u64);

and manually implements Read and Seek on it, but that will be unix-only.

rise0chen commented 1 year ago

Is the internal cursor changed when call by_index? Can we change the internal cursor before call read?

NobodyXu commented 1 year ago

Is the internal cursor changed when call by_index?

Yes, it certainly is changed. by_index uses seek to move the cursor then read from it.

Can we change the internal cursor before call read?

ZipArchiver assumes that the reader you passed in does not get changed by others.

So to fix this, you either creates multiple std::fs::File or use unix-only std::os::unix::fs::FileExt.

NobodyXu commented 1 year ago

Another option is to use mmarinus to mmap the file and obtain a &[u8] reference, then wrap that in std::io::Cursor and passed it to ZipArchive.

This would also reduce the I/O.

rise0chen commented 1 year ago

creates multiple std::fs::File also need creates multiple ZipArchive. Maybe, we should remove Clone of ZipArchive.

NobodyXu commented 1 year ago

creates multiple std::fs::File also need creates multiple ZipArchive.

You got it in the wrong order:

creates multiple ZipArchive also need multiple std::fs::File.

The Clone implementation of ZipArchive is indeed a bit confusing when combined with the fact that &std::fs::File also implements Read and Seek, removing Clone implementation would indeed makes it less confusing.

@rise0chen While I'm not the maintainer of this crate, you can open a PR and I believe @zamazan4ik can review this since it will be pretty simple and straight-forward.

rise0chen commented 1 year ago
        let mut zip1 = zip.clone();
        let mut zip2 = zip.clone();
        let mut file1 = zip1.by_index(0).unwrap();
        let mut file2 = zip2.by_index(1).unwrap();

        (&zip_file).seek(SeekFrom::Start(file1.data_start()))?;
        let mut s = String::new();
        file1.read_to_string(&mut s).unwrap();
        println!("{}", s);

        (&zip_file).seek(SeekFrom::Start(file2.data_start()))?;
        let mut s = String::new();
        file2.read_to_string(&mut s).unwrap();
        println!("{}", s);

It work, if I change the internal cursor before call read.

rise0chen commented 1 year ago

Maybe, it's a bug of standard library. It shouldn't impl Read for &File

NobodyXu commented 1 year ago

Maybe, it's a bug of standard library. It shouldn't impl Read for &File

That could be considered a bug indeed, maybe you can raise that in zulip?

Plecra commented 1 year ago

This is part of a much wider discussion in Rust of IO safety, but the gist of it is that yes, you need to be aware of file handles being fiddled with from under you. Similar issues will happen if another process edits a file while you're reading from it.

ZipArchive will keep the Clone implementation. Internal mutability is unfortunately a hazard that you need to keep in mind.