Open · MatthewSteeples opened this issue 2 years ago
Could you upload the generated zip file to https://archivediag.azurewebsites.net? I could generate it from the test as well, but I won't have time to do that for a while, and with the report I can take a look right away. Unfortunately, the site still says that the blob could not be found instead of "waiting for azure function to pick up the job", but it shouldn't take more than 5 minutes.
Hi @piksel,
It won't upload there as the file is too large (50 MB):
Failed to load resource: the server responded with a status of 413 (Request Entity Too Large)
Aight. I'll take a look.
Well, here is the report: https://pub.p1k.se/sharpziplib/archivediag/issue-698.zip.html
The local header for the large entry has bit 3 (the Descriptor bit) set, which means that the actual size and CRC should follow after the compressed data. But there is no such descriptor following it. Instead, the sizes and CRC are only written to the "Central Header" (which is like a look-up directory for the files in the archive). This means that the zip file is corrupt (or rather, out of spec) and cannot be read in a streaming manner. If it is accessed in a random-access way instead, it's technically possible to read it (which is why 7z, for example, can read it, since it only works with random-access files, not streams).
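If you want to verify that flag by hand, something like this works (untested sketch; "issue-698.zip" is a placeholder path, and offsets follow the ZIP APPNOTE, where the 2-byte general purpose bit flag sits at offset 6 of the local file header):

```csharp
// Untested sketch: inspect the first local file header directly.
// Assumes a little-endian host, which matches ZIP's on-disk byte order.
using System;
using System.IO;

class DescriptorBitCheck
{
    static void Main()
    {
        using var fs = File.OpenRead("issue-698.zip"); // placeholder path
        var header = new byte[8];
        if (fs.Read(header, 0, header.Length) != header.Length)
            throw new InvalidDataException("file too short for a local header");

        // Local file header signature per the APPNOTE.
        if (BitConverter.ToUInt32(header, 0) != 0x04034b50)
            throw new InvalidDataException("first entry is not a local file header");

        ushort flags = BitConverter.ToUInt16(header, 6);
        // Bit 3 (mask 0x0008): sizes/CRC are zero in the local header and a
        // data descriptor should follow the compressed data.
        Console.WriteLine($"Descriptor bit set: {(flags & 0x0008) != 0}");
    }
}
```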
Actually, 7zip does show the file as having an error, and running "Test" fails.
I'm not sure exactly what System.IO.Compression.ZipArchive does here, but it seems like a bug on their end. In any case, if you use ICSharpCode.SharpZipLib.Zip.ZipFile instead of ZipInputStream, it will use the central headers instead of the local ones (and it managed to extract the file perfectly fine when I tested just now).
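Reading that way looks roughly like this (untested sketch; the path is a placeholder):

```csharp
// Untested sketch: read the archive via the central directory with ZipFile.
using System;
using ICSharpCode.SharpZipLib.Zip;

class CentralDirectoryRead
{
    static void Main()
    {
        using var zip = new ZipFile("issue-698.zip"); // placeholder path
        foreach (ZipEntry entry in zip)
        {
            if (!entry.IsFile) continue;
            // GetInputStream locates the entry via the central directory,
            // so the missing data descriptors don't matter here.
            using var stream = zip.GetInputStream(entry);
            var buffer = new byte[1];
            stream.Read(buffer, 0, 1); // read a single byte, as in the repro
            Console.WriteLine($"{entry.Name}: {entry.Size} bytes");
        }
    }
}
```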
I altered your test code to use ZipOutputStream to generate the zip file, and it actually compressed better (~25 MiB vs ~50 MiB), though slower (we are fully managed, after all).
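The writing side, reduced to a sketch (placeholder names and sizes, not the actual altered test):

```csharp
// Simplified sketch of generating the file with ZipOutputStream.
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

class StreamingWrite
{
    static void Main()
    {
        using var output = File.Create("issue-698-expected.zip"); // placeholder path
        using var zipStream = new ZipOutputStream(output);
        zipStream.SetLevel(6); // Deflate compression level, 0-9

        zipStream.PutNextEntry(new ZipEntry("large.bin")); // placeholder name
        var payload = new byte[50 * 1024 * 1024]; // ~50 MB stand-in payload
        zipStream.Write(payload, 0, payload.Length);
        zipStream.CloseEntry();

        zipStream.Finish(); // writes the central directory
    }
}
```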
Now, the resulting file was actually also not possible to read using ZipInputStream (the last test), so there might be a bug here in any case...
Here is the report for that file, which shows the descriptor sections that are missing from the ZipArchive version of the file: https://pub.p1k.se/sharpziplib/archivediag/issue-698-expected.zip.html
@piksel Thanks for taking the time to have a look. I can't get 7zip to show me the same screen that you've got there. The file has a CRC, none of the files (large or small) have "local" in the Characteristics column, and running "Test" in 7zip reports no errors. If I can reproduce what you're seeing, I'll happily take it to Microsoft. Are you sure the file had been flushed by the time you loaded it?
@MatthewSteeples Hello, did you fix this problem? I'm having the same issue and I can't find the cause. I think the problem is with the size of the file I'm trying to compress...
@geracosta Could you upload a file that shows the problem to https://archivediag.piksel.se/ ?
Steps to reproduce
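In outline (a rough sketch with placeholder names and sizes, not the original test code): create a zip with System.IO.Compression.ZipArchive containing two small files and one ~50 MB file, then read it back with ZipInputStream, reading 1 byte from each entry.

```csharp
// Rough reconstruction of the repro; names and sizes are placeholders.
using System;
using System.IO;
using System.IO.Compression;
using ICSharpCode.SharpZipLib.Zip;

class Repro
{
    static void Main()
    {
        var path = Path.Combine(Path.GetTempPath(), "issue-698.zip");

        // 1. Create the archive with System.IO.Compression.ZipArchive.
        using (var fs = File.Create(path))
        using (var archive = new ZipArchive(fs, ZipArchiveMode.Create))
        {
            WriteEntry(archive, "small1.bin", 1024);
            WriteEntry(archive, "large.bin", 50 * 1024 * 1024); // file 2, the large one
            WriteEntry(archive, "small2.bin", 1024);
        }

        // 2. Read it back with SharpZipLib's ZipInputStream.
        using var input = File.OpenRead(path);
        using var zipStream = new ZipInputStream(input);
        ZipEntry entry;
        // GetNextEntry throws "Data descriptor signature not found" when it
        // tries to advance past the large entry.
        while ((entry = zipStream.GetNextEntry()) != null)
        {
            var buffer = new byte[1];
            zipStream.Read(buffer, 0, 1); // read 1 byte from each file
        }
    }

    static void WriteEntry(ZipArchive archive, string name, int size)
    {
        var entry = archive.CreateEntry(name);
        using var stream = entry.Open();
        stream.Write(new byte[size], 0, size);
    }
}
```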
Expected behavior
The file should extract normally, reading 1 byte from each entry (we're experiencing this problem even when reading to the end of the Stream; reading a single byte is just for illustrative purposes)
Actual behavior
When seeking to the end of file 2 (the large file), the following exception is thrown:
ICSharpCode.SharpZipLib.Zip.ZipException: Data descriptor signature not found
Version of SharpZipLib
1.3.3 but also verified against master
Obtained from
I'm afraid I can't spot anything obvious about what it might be. 7Zip happily opens the generated file and marks the 2 small files as version 20, with the large file being version 45 and having a Zip64 descriptor (in Characteristics)
Hope that's enough information, but please let me know if there's anything else I can provide
Please note that this test will spit out 50 MB tmp files that you'll need to clean up afterwards