perlun closed this issue 11 years ago
Hi, thanks for reporting. Checking CRC32 is a good idea. Too bad there is no built-in CRC32 algorithm in the .NET Framework, so it will have to be embedded in Unzip.cs.
> A file which should have a CRC32 value of 58810710 gets unpacked to b30ec35d. The file size is also different to what it should be (475136 instead of 481534 bytes).
The library itself doesn't handle decompression; it uses the standard .NET DeflateStream, which should handle any data compressed with a standard-compliant Deflate algorithm. Make sure the file you're unpacking is not damaged.
I'll add checks for CRC and file size to the next version.
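Since the .NET Framework has no built-in CRC32, here is a minimal sketch of what an embedded implementation might look like. It is the usual table-driven CRC-32 with the reflected polynomial 0xEDB88320, which is the variant the ZIP format uses. The class name `Crc32Sketch` and the method `Update` are hypothetical and are not taken from Unzip.cs.

```csharp
// Hypothetical helper for illustration; not the actual Unzip.cs implementation.
public static class Crc32Sketch
{
    // Lookup table for the reflected CRC-32 polynomial 0xEDB88320 (the ZIP variant).
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int bit = 0; bit < 8; bit++)
            {
                // Shift right; XOR in the polynomial when the low bit was set.
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            }
            table[i] = c;
        }
        return table;
    }

    // Feeds `count` bytes into a running CRC. Pass 0 as `crc` for the first chunk,
    // then pass the previous return value for each following chunk.
    public static uint Update(uint crc, byte[] buffer, int offset, int count)
    {
        crc ^= 0xFFFFFFFFu;
        for (int i = offset; i < offset + count; i++)
        {
            crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
        }
        return crc ^ 0xFFFFFFFFu;
    }
}
```

As a sanity check, feeding the nine ASCII bytes of "123456789" through `Update` starting from 0 should give 0xCBF43926, the standard CRC-32 check value.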
Thanks for a really quick reply! :smile: The archive is definitely OK; I unpacked it using unzip, which did the job correctly. I suspect the problem is specifically that the last few KiB of the file get chopped off. Here is a diff with some error checking I added, which highlights the problem with my sample archive:
From 306ede3b6da7ead6832aadb8b719c4824c547d9b Mon Sep 17 00:00:00 2001
From: Per Lundberg <per.lundberg@ecraft.com>
Date: Fri, 8 Nov 2013 16:08:50 +0100
Subject: [PATCH] Added check to ensure that the unpacked file has the right
size.
---
src/eCraft.appFactory.a/Internals/Unzip.cs | 37 +++++++++++++++++++-----------
1 file changed, 23 insertions(+), 14 deletions(-)
diff --git a/src/eCraft.appFactory.a/Internals/Unzip.cs b/src/eCraft.appFactory.a/Internals/Unzip.cs
index 237eddd..b1e1242 100644
--- a/src/eCraft.appFactory.a/Internals/Unzip.cs
+++ b/src/eCraft.appFactory.a/Internals/Unzip.cs
@@ -165,9 +165,18 @@ namespace eCraft.appFactory.Internals
using (var outStream = File.Create(outputFileName))
{
- Extract(entry, outStream);
+ Extract(entry, outStream);
}
+ var fileInfo = new FileInfo(outputFileName);
+ if (fileInfo.Length != entry.OriginalSize)
+ {
+ throw new InvalidDataException(String.Format(
+ "Corrupted archive: {0} has an uncompressed size {1} which does not match its expected size {2}",
+ outputFileName, fileInfo.Length, entry.OriginalSize
+ ));
+ }
+
File.SetLastWriteTime(outputFileName, entry.Timestamp);
}
@@ -184,23 +193,23 @@ namespace eCraft.appFactory.Internals
return entry;
}
--
1.7.11.1
Any ideas on what I could do as a workaround? I am 100% sure that the file in question is valid. It seems like Unzip.cs somehow misses some of the data when decompressing it, which is very odd. I might be able to produce a sample file for you if you like (it's created with rubyzip) which you could use for testing it a bit further.
Hmm, that's very strange! I remember having some edge-case issues when DeflateStream seemed to want to eat up more bits before decompressing a chunk of data, but everything should be OK when dealing with the complete stream... Will look into it.
> I might be able to produce a sample file for you if you like (it's created with rubyzip) which you could use for testing it a bit further.
That would be absolutely great! Thanks.
http://pastelink.me/dl/374084 - there you have a file which gets incorrectly decompressed with Unzip.cs. However, the file size of the unpacked file seems to be correct; it's just the content which is corrupt. The command-line unzip unpacks it correctly.
Hi Per, thanks for providing me the file.
I tried opening it with the 7-Zip archive manager http://www.7-zip.org/ and with the built-in zip archiver of Far Manager http://farmanager.com/ and they both report a CRC error:
The message is "CRC error in file 'foo.jar'. The file is damaged."
Looks like the file is indeed not standard-compliant. What unzip utility do you use to open it?
That's quite interesting, and it's probably also the reason why Unzip.cs complains about it. This is what my command-line unzip says:
plundberg@ecvaawplun1:~/Downloads$ unzip -v foo\ \(1\).zip
Archive: foo (1).zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
    3441  Defl:N     3366   2% 11-08-2013 16:22 f23ea799  foo.jar
--------          -------  ---                            -------
    3441             3366   2%                            1 file
It also unzips the file correctly, both on Windows and Mac (my host OS). The unzip is this one: http://www.info-zip.org/UnZip.html; please try it for yourself.
I suspect the problem is exactly this. It's just so weird that the Ruby zip-library creates the files like this...
It turned out to be weirdness in the Ruby library that I was using. I don't think this has to be fixed in Unzip.cs; then again, adding some sanity checks (like checking the file size and CRC32) would certainly not hurt, as they would make strange issues like this easier to spot.
I've tried feeding the file to PKWARE's official command-line utility, PKZIP25.EXE. It also reports a CRC error:
D:\2>PKZIP25.EXE -extract foo.zip
PKZIP(R) Version 2.50 FAST! Compression Utility for Windows 95/NT 4-15-1998
Copyright 1989-1998 PKWARE Inc. All Rights Reserved. Shareware Version
PKZIP Reg. U.S. Pat. and Tm. Off. Patent No. 5,051,745
Extracting files from .ZIP: foo.zip
Inflating: foo.jar
PKZIP: (W4) Warning! file fails CRC check
PKZIP: (E9) No file(s) found
Looks like Ruby's library uses a non-standard extension of the Deflate algorithm.
Ah, I see. I'll add file length and CRC checks to Unzip.cs. Thanks for your suggestions Per!
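One possible shape for such checks, sketched under assumptions: a write-only wrapper stream that counts bytes and accumulates a CRC-32 while the entry is being extracted, so the result can be compared with the archive's metadata afterwards. `VerifyingStream` and `Verify` are hypothetical names, not part of Unzip.cs. To stay self-contained, the sketch uses a slow bitwise CRC-32 (reflected polynomial 0xEDB88320) rather than a lookup table.

```csharp
using System;
using System.IO;

// Hypothetical sketch for illustration; not the actual Unzip.cs code.
public sealed class VerifyingStream : Stream
{
    private readonly Stream inner;
    private uint crc = 0xFFFFFFFFu;

    public VerifyingStream(Stream inner) { this.inner = inner; }

    // Number of bytes passed through Write so far.
    public long BytesWritten { get; private set; }

    // CRC-32 of all bytes passed through Write so far.
    public uint Crc32 { get { return crc ^ 0xFFFFFFFFu; } }

    public override void Write(byte[] buffer, int offset, int count)
    {
        for (int i = offset; i < offset + count; i++)
        {
            crc ^= buffer[i];
            for (int bit = 0; bit < 8; bit++)
            {
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
            }
        }
        BytesWritten += count;
        inner.Write(buffer, offset, count);
    }

    // Throws if the data written so far does not match the expected size and CRC-32.
    public void Verify(long expectedSize, uint expectedCrc32)
    {
        if (BytesWritten != expectedSize || Crc32 != expectedCrc32)
        {
            throw new InvalidDataException(string.Format(
                "Corrupted archive: got {0} bytes with CRC32 {1:x8}, expected {2} bytes with CRC32 {3:x8}",
                BytesWritten, Crc32, expectedSize, expectedCrc32));
        }
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { inner.Flush(); }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

In the extraction code from the patch above, the output file stream could be wrapped in a `VerifyingStream` before it is passed to `Extract`, and `Verify` could be called afterwards with `entry.OriginalSize` and the CRC-32 stored for the entry. How the entry exposes that CRC-32 is an assumption here. Checking during the write avoids re-reading the extracted file from disk.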
It's not a weird Ruby lib; it's an old rubyzip version that was released 3 years ago.
Updated the Nuget package.
@simonoff Fair enough. I just get confused by all the different rubyzip libraries out there...
Maybe off-topic, but I have had strange problems with PKZIP 2.50. Among thousands of files, I identified 3 files that give errors when testing the zip file, but which of them fail depends on whether I use standard or maximum compression. It only happens if I use PKZIP 2.50 for both zipping and testing/unzipping; if I use the Windows built-in zip tool for either zipping or unzipping, I don't get any corruption.
D:\Recovery>pkzip25 -test *
PKZIP(R) Version 2.50 FAST! Compression Utility for Windows 95/NT 4-15-1998
Copyright 1989-1998 PKWARE Inc. All Rights Reserved. Shareware Version
PKZIP Reg. U.S. Pat. and Tm. Off. Patent No. 5,051,745
Testing files from .ZIP: PkzipMaximumCompression.zip
Testing: 2855-003_002_B.pdf PKZIP: (W10) Warning! Deflated file has bad table
Testing: 2855-024_001_B.pdf OK
Testing: P1012286-44c.jpg PKZIP: (W10) Warning! Deflated file has bad table
Testing files from .ZIP: PkzipStandardCompression.zip
Testing: 2855-003_002_B.pdf OK
Testing: 2855-024_001_B.pdf PKZIP: (W10) Warning! Deflated file has bad table
Testing: P1012286-44c.jpg OK
Testing files from .ZIP: WindowsCompression.zip
Testing: 2855-003_002_B.pdf OK
Testing: 2855-024_001_B.pdf OK
Testing: P1012286-44c.jpg OK
D:\Recovery>pkzip25 -view *
PKZIP(R) Version 2.50 FAST! Compression Utility for Windows 95/NT 4-15-1998
Copyright 1989-1998 PKWARE Inc. All Rights Reserved. Shareware Version
PKZIP Reg. U.S. Pat. and Tm. Off. Patent No. 5,051,745
Viewing .ZIP: PkzipMaximumCompression.zip
  Length  Method       Size  Ratio  Date         Time  CRC-32    Attr     Name
13324497  DeflatX  13301866   0.2%  09-10-2020  2:34p  db21d5fd  --w----  2855-003_002_B.pdf
20429577  DeflatX  20383415   0.3%  09-10-2020  2:34p  4c8a0b65  --w----  2855-024_001_B.pdf
  646745  DeflatX    636038   1.7%  01-01-2003  1:00a  08da2296  --w----  P1012286-44c.jpg
34400819           34321319   0.3%                                        3
Viewing .ZIP: PkzipStandardCompression.zip
  Length  Method       Size  Ratio  Date         Time  CRC-32    Attr     Name
13324497  DeflatN  13302175   0.2%  09-10-2020  2:34p  db21d5fd  --w----  2855-003_002_B.pdf
20429577  DeflatN  20383912   0.3%  09-10-2020  2:34p  4c8a0b65  --w----  2855-024_001_B.pdf
  646745  DeflatN    636048   1.7%  01-01-2003  1:00a  08da2296  --w----  P1012286-44c.jpg
34400819           34322135   0.3%                                        3
Viewing .ZIP: WindowsCompression.zip
  Length  Method       Size  Ratio  Date         Time  CRC-32    Attr     Name
13324497  DeflatN  13307488   0.2%  09-10-2020  2:34p  db21d5fd  --w----  2855-003_002_B.pdf
20429577  DeflatN  20414492   0.1%  09-10-2020  2:34p  4c8a0b65  --w----  2855-024_001_B.pdf
  646745  DeflatN    636102   1.7%  01-01-2003  1:00a  08da2296  --w----  P1012286-44c.jpg
34400819           34358082   0.2%                                        3
Hi,
I'm seeing strange CRC errors when unpacking using your library. A file which should have a CRC32 value of 58810710 gets unpacked to b30ec35d. The file size is also different to what it should be (475136 instead of 481534 bytes).
I also note that Unzip.cs does not seem to validate the CRC32 checksum after unzipping, which is a bit bad... I think that should be fixed. It should also check to verify that the file size is really correct.