icsharpcode / SharpZipLib

#ziplib is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform.

http://icsharpcode.github.io/SharpZipLib/

MIT License

Problem reading archives containing Zip64 files #698

Open MatthewSteeples opened 2 years ago

MatthewSteeples commented 2 years ago

Steps to reproduce

Please see the test located at https://github.com/MatthewSteeples/SharpZipLib/commit/c76de92b5f250f37d292112cb26639afc53d0d9b

Expected behavior

Each file should extract normally, and the test should be able to read 1 byte from each one. (We see the same problem even when reading to the end of the Stream; the 1-byte read is just for illustrative purposes.)

Actual behavior

When seeking to the end of file 2 (the large file), the following exception is thrown: ICSharpCode.SharpZipLib.Zip.ZipException : Data descriptor signature not found

Version of SharpZipLib

1.3.3 but also verified against master

Obtained from (only keep the relevant lines)

Compiled from source, commit: ff64d0a
Package installed using NuGet (1.3.3)

I'm afraid I can't spot anything obvious about what it might be. 7Zip happily opens the generated file and marks the 2 small files as version 20, with the large file being a version 45 and having a Zip64 descriptor (in Characteristics)

Hope that's enough information, but please let me know if there's anything else I can provide

Please note that this test writes 50 MB temporary files that you'll need to clean up afterwards

piksel commented 2 years ago

Could you upload the generated zip file to https://archivediag.azurewebsites.net? I could generate it from the test myself, but I probably won't have time to do that for a while; with the report I can take a look sooner. Unfortunately, the page still says that the blob could not be found instead of "waiting for azure function to pick up the job". It shouldn't take more than 5 min though.

MatthewSteeples commented 2 years ago

Hi @piksel,

It won't upload there, as the file is too large (50 MB):

Failed to load resource: the server responded with a status of 413 (Request Entity Too Large)

piksel commented 2 years ago

Aight. I'll take a look.

piksel commented 2 years ago

Well, here is the report: https://pub.p1k.se/sharpziplib/archivediag/issue-698.zip.html

The local header for the large entry has the Descriptor bit set (bit 3 of the general-purpose flags, 0x0008), which means that the actual sizes and CRC should follow after the compressed data. But there is no such descriptor following it. Instead, the sizes and CRC are only written to the "Central Header" (which is like a look-up directory for the files in the archive). This means that the zip file is corrupt (or rather, out of spec) and cannot be read in a streaming manner. If it is accessed in a random-access way instead, it's technically possible to read it (which is why 7z, for example, can read it, since it only works with random-access files, not streams). Actually, 7-Zip does show the file as having an error, and running "Test" fails.
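
For anyone who wants to check an archive for this condition by hand, here is a minimal sketch in Python, using only the standard library. It is not SharpZipLib code, and the function name `inspect_local_headers` is made up for illustration. It takes each entry's offset and compressed size from the central directory, reads the flags from the local header, and then looks for the data descriptor signature (`PK\x07\x08`) directly after the compressed data. The signature is optional in the ZIP specification, so a missing signature does not by itself prove the archive is invalid; the sketch only assumes, based on the exception message above, that a streaming reader may insist on it.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = 0x04034B50      # "PK\x03\x04"
DESCRIPTOR_SIG = b"PK\x07\x08"     # optional data descriptor signature
DESCRIPTOR_FLAG = 0x0008           # general-purpose flag bit 3 (counting from 0)
LOCAL_HEADER_FMT = "<IHHHHHIIIHH"  # the fixed 30-byte part of a local header


def inspect_local_headers(data: bytes) -> dict:
    """Map entry name -> (descriptor flag set, descriptor signature found)."""
    report = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # header_offset and compress_size come from the central directory.
            fields = struct.unpack_from(LOCAL_HEADER_FMT, data, info.header_offset)
            sig, flags, name_len, extra_len = fields[0], fields[2], fields[9], fields[10]
            if sig != LOCAL_HEADER_SIG:
                raise ValueError(f"no local header at offset {info.header_offset}")
            header_size = struct.calcsize(LOCAL_HEADER_FMT)
            data_end = (info.header_offset + header_size
                        + name_len + extra_len + info.compress_size)
            flag_set = bool(flags & DESCRIPTOR_FLAG)
            sig_found = data[data_end:data_end + 4] == DESCRIPTOR_SIG
            report[info.filename] = (flag_set, sig_found)
    return report
```

An entry that reports `(True, False)` matches the situation described above: the local header promises a descriptor, but no descriptor signature follows the compressed data. The sketch assumes the archive starts at offset 0 of `data` and fits in memory, so for a large file you would want to seek and read instead of loading everything.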

piksel commented 2 years ago

I'm not sure exactly what System.IO.Compression.ZipArchive does here, but it seems like a bug on their end. In any case, if you use ICSharpCode.SharpZipLib.Zip.ZipFile instead of ZipInputStream, it will use the central headers instead of the local ones (and it managed to extract the file perfectly fine when testing just now).

piksel commented 2 years ago

I altered your test code to use ZipOutputStream to generate the zip file, and it actually compressed it better (~25MiB vs ~50MiB), but slower (we are fully managed after all). However, the resulting file also could not be read using ZipInputStream (the last test), so there might be some bug here in any case... Here is the report for that file, which shows the descriptor sections that are missing from the ZipArchive version of the file: https://pub.p1k.se/sharpziplib/archivediag/issue-698-expected.zip.html

MatthewSteeples commented 2 years ago

@piksel Thanks for taking the time to have a look. I can't get 7zip to show me the same screen that you've got there. The file has a CRC, none of the files (large or small) have local in the characteristics, and running "Test" in 7zip reports that there are no errors. If I can reproduce what you're seeing then I'll happily take it to Microsoft. Are you sure the file had flushed by the time you're loading it?

geracosta commented 1 year ago

@MatthewSteeples Hello, did you fix this problem? I'm having the same issue and I can't find the cause. I think the problem is related to the size of the file I'm trying to compress...

piksel commented 1 year ago

@geracosta Could you upload a file that shows the problem to https://archivediag.piksel.se/ ?