madler / zlib

A massively spiffy yet delicately unobtrusive compression library.
http://zlib.net/
Other
5.46k stars 2.41k forks source link

Is it correct that minizip allows duplicate files in a zip? #943

Closed gambr closed 4 months ago

gambr commented 4 months ago

Dear All, I'm using minizip lib to archive some files. These files are collected parsing some other files and the list of files to be archived may have duplicates. I know that I can prevent duplicates filtering this file list before adding them to the zip but I'm wondering if it is correct that zipOpenNewFileInZip returns ZIP_OK even if the fille is already present. Is there any case I'm missing that requires duplicated files in a zip? Or should zipOpenNewFileInZip return some error code instead?

Regards

nmoinvaz commented 4 months ago

I'm not aware of any useful case. I think the library just doesn't consider that case.

gambr commented 4 months ago

It would then make sense to enhance the library to avoid duplicate files and throw an exception on the zipOpenNewFileInZip if the file is already contained in the archive.

madler commented 4 months ago

I know of no such use. Interestingly, the zip specification does not explicitly forbid it.

In my opinion, it should be the responsibility of the application to avoid duplicates, if that is a possibility for some reason. Normally it would not, so there shouldn't be a bunch of extra work for everyone just because of outlier cases. And just because we don't know of a use case, doesn't mean that returning an error for a duplicate won't screw someone up out there. It would be an incompatible change to the interface.

There could however be a separate function for the application to check if a name is in the zip file's original central directory (when appending) or in the central directory being built up, which has the names of the entries added so far. While the application could keep track of the names it's adding, it doesn't know what names are already in a zip file being appended to. So I think there should be such a function.

gambr commented 4 months ago

And just because we don't know of a use case, doesn't mean that returning an error for a duplicate won't screw someone up out there. It would be an incompatible change to the interface.

I had actually thought about it, I agree.

There could however be a separate function for the application to check if a name is in the zip file's original central directory (when appending) or in the central directory being built up, which has the names of the entries added so far. While the application could keep track of the names it's adding, it doesn't know what names are already in a zip file being appended to. So I think there should be such a function.

In fact I had given a look at the unzip code, founding a unzLocateFile function that seems doing that check. Unfortunately it does not work directly on a file opened by the ziper. Such a function would be useful.

madler commented 4 months ago

I added a function. https://github.com/madler/zlib/commit/4a5e3e7d255f3f8eba9ecdb8bd8080db43bf0aeb

nmoinvaz commented 4 months ago

Unfortunately it does not work directly on a file opened by the ziper.

Can you explain this? I've used unzLocateFile before and it works fine.