yallie / unzip

Tiny unzip helper class for .NET 3.5 Client Profile and Mono 2.10, written in pure C#.
http://nuget.org/packages/unzip
MIT License

Data corruption when unpacking certain archives (created using rubyzip) #2

Closed perlun closed 11 years ago

perlun commented 11 years ago

Hi,

I'm seeing strange CRC errors when unpacking using your library. A file which should have a CRC32 value of 58810710 gets unpacked to b30ec35d. The file size is also different to what it should be (475136 instead of 481534 bytes).

I also note that Unzip.cs does not seem to validate the CRC32 checksum after unzipping, which is a bit bad... I think that should be fixed. It should also check to verify that the file size is really correct.

yallie commented 11 years ago

Hi, thanks for reporting. Checking the CRC32 is a good idea. Unfortunately, the .NET Framework has no built-in CRC32 algorithm, so one would have to be embedded in Unzip.cs.

> A file which should have a CRC32 value of 58810710 gets unpacked to b30ec35d. The file size is also different to what it should be (475136 instead of 481534 bytes).

The library itself doesn't handle decompression; it uses the standard .NET DeflateStream, which should handle any data compressed with a standard-compliant deflate algorithm. Make sure the file you're unpacking is not damaged.

I'll add checks for CRC and file size to the next version.
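For reference, the kind of check discussed above could look roughly like the following. This is only a minimal sketch, not the code that ended up in Unzip.cs: the class name `Crc32Sketch` and its `Verify` helper are hypothetical, and it assumes the extracted bytes are available in memory. The CRC-32 variant is the one used by ZIP (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Unzip.cs: table-driven CRC-32 as used by ZIP.
public static class Crc32Sketch
{
    private static readonly uint[] Table = BuildTable();

    // Precomputes the 256-entry lookup table for the reflected polynomial.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int bit = 0; bit < 8; bit++)
            {
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            }
            table[i] = c;
        }
        return table;
    }

    // Feeds a buffer into a running CRC. Start with 0xFFFFFFFF and XOR the
    // final value with 0xFFFFFFFF to get the checksum stored in the archive.
    public static uint Update(uint crc, byte[] buffer, int offset, int count)
    {
        for (int i = offset; i < offset + count; i++)
        {
            crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
        }
        return crc;
    }

    // Convenience wrapper for data that is already fully in memory.
    public static uint Compute(byte[] data)
    {
        return Update(0xFFFFFFFFu, data, 0, data.Length) ^ 0xFFFFFFFFu;
    }

    // Throws if the extracted bytes disagree with the size or CRC recorded for the entry.
    public static void Verify(byte[] extracted, long expectedSize, uint expectedCrc)
    {
        if (extracted.Length != expectedSize)
        {
            throw new InvalidDataException(String.Format(
                "Size mismatch: got {0} bytes, expected {1}", extracted.Length, expectedSize));
        }

        uint actualCrc = Compute(extracted);
        if (actualCrc != expectedCrc)
        {
            throw new InvalidDataException(String.Format(
                "CRC32 mismatch: got {0:x8}, expected {1:x8}", actualCrc, expectedCrc));
        }
    }
}
```

As a sanity check, `Crc32Sketch.Compute(Encoding.ASCII.GetBytes("123456789"))` yields 0xCBF43926, the standard check value for this CRC-32 variant. For large files, `Update` can be called once per buffer while the data is being written, so the whole file never has to sit in memory.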

perlun commented 11 years ago

Thanks for a really quick reply! :smile: The archive is definitely OK; I unpacked it using unzip, which did the job correctly. I suspect the problem is specifically that the last few KiB of the file get chopped off. Here is a diff with some error checking I added, which highlights the problem with my sample archive:

From 306ede3b6da7ead6832aadb8b719c4824c547d9b Mon Sep 17 00:00:00 2001
From: Per Lundberg <per.lundberg@ecraft.com>
Date: Fri, 8 Nov 2013 16:08:50 +0100
Subject: [PATCH] Added check to ensure that the unpacked file has the right
 size.

---
 src/eCraft.appFactory.a/Internals/Unzip.cs | 37 +++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/src/eCraft.appFactory.a/Internals/Unzip.cs b/src/eCraft.appFactory.a/Internals/Unzip.cs
index 237eddd..b1e1242 100644
--- a/src/eCraft.appFactory.a/Internals/Unzip.cs
+++ b/src/eCraft.appFactory.a/Internals/Unzip.cs
@@ -165,9 +165,18 @@ namespace eCraft.appFactory.Internals

            using (var outStream = File.Create(outputFileName))
            {
-               Extract(entry, outStream);
+                Extract(entry, outStream);
            }

+            var fileInfo = new FileInfo(outputFileName);
+            if (fileInfo.Length != entry.OriginalSize)
+            {
+                throw new InvalidDataException(String.Format(
+                    "Corrupted archive: {0} has an uncompressed size {1} which does not match its expected size {2}",
+                    outputFileName, fileInfo.Length, entry.OriginalSize
+                ));
+            }
+
            File.SetLastWriteTime(outputFileName, entry.Timestamp);
        }

@@ -184,23 +193,23 @@ namespace eCraft.appFactory.Internals
            return entry;
        }
-- 
1.7.11.1

perlun commented 11 years ago

Any ideas what I could do to add a workaround? I am 100% sure that the file in question is valid. It seems like Unzip.cs somehow misses some of the data when it should decompress it, which is very odd. I might be able to produce a sample file for you if you like (it's created with rubyzip) which you could use for testing it a bit further.

yallie commented 11 years ago

Hmm, that's very strange! I remember having some edge-case issues when DeflateStream seemed to want to eat up more bits before decompressing a chunk of data, but everything should be OK when dealing with the complete stream... Will look into it.

> I might be able to produce a sample file for you if you like (it's created with rubyzip) which you could use for testing it a bit further.

That would be absolutely great! Thanks.
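To illustrate the "complete stream" case mentioned above, here is a small round-trip sketch. It is not taken from Unzip.cs, and the class name `DeflateRoundTripSketch` is made up for this example. It assumes raw deflate data (no zlib or gzip header), which is what `DeflateStream` reads and writes, and it uses a manual read loop because `Stream.CopyTo` is not available on .NET 3.5.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical round-trip helper for experimenting with DeflateStream.
public static class DeflateRoundTripSketch
{
    public static byte[] Deflate(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // The DeflateStream has to be closed before the result is read,
            // otherwise the final deflate block may not have been flushed.
            using (var deflate = new DeflateStream(output, CompressionMode.Compress))
            {
                deflate.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    public static byte[] Inflate(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            var buffer = new byte[4096];
            int read;
            // Read returns 0 only at the end of the deflate stream, so keep
            // reading until then instead of trusting a single call.
            while ((read = inflate.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }
}
```

If `Inflate(Deflate(data))` returns fewer bytes than `data.Length` for a given input, the truncation happens in the decompression step itself. If it round-trips cleanly, the archive that triggers the problem is the more likely suspect.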

perlun commented 11 years ago

http://pastelink.me/dl/374084 - there you have a file which gets incorrectly decompressed with Unzip.cs. However, the file size of the unpacked file seems to be correct; it's just the content which is corrupt. The command-line unzip unpacks it correctly.

yallie commented 11 years ago

Hi Per, thanks for providing me the file.

I tried opening it with the 7-Zip archive manager http://www.7-zip.org/ and with the built-in zip archiver of Far Manager http://farmanager.com/ and they both report a CRC error:

[Screenshot: Far Manager Arclite]

[Screenshot: 7-Zip CRC error] The message is "CRC error in file 'foo.jar'. The file is damaged."

Looks like the file is indeed not standard-compliant. What unzip utility do you use to open it?

perlun commented 11 years ago

That's quite interesting, and it's probably also the reason why Unzip.cs complains about it. This is what my command-line unzip says:

plundberg@ecvaawplun1:~/Downloads$ unzip -v foo\ \(1\).zip
Archive:  foo (1).zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
    3441  Defl:N     3366   2% 11-08-2013 16:22 f23ea799  foo.jar
--------          -------  ---                            -------
    3441             3366   2%                            1 file

It also unzips the file correctly, both on Windows and Mac (my host OS). The unzip is this one: http://www.info-zip.org/UnZip.html, please try it for yourself.

I suspect the problem is exactly this. It's just so weird that the Ruby zip-library creates the files like this...

perlun commented 11 years ago

It turned out to be weirdness in the Ruby library that I was using. I don't think this has to be fixed in Unzip.cs; then again, adding some sanity checks (like checking the file size and CRC32) would certainly not hurt, to make it easier to spot strange issues like this.

yallie commented 11 years ago

I've also fed the file to PKWARE's official command-line utility, PKZIP25.EXE. It reports the same CRC error:

D:\2>PKZIP25.EXE -extract foo.zip
PKZIP(R) Version 2.50 FAST! Compression Utility for Windows 95/NT 4-15-1998
Copyright 1989-1998 PKWARE Inc. All Rights Reserved. Shareware Version
PKZIP Reg. U.S. Pat. and Tm. Off. Patent No. 5,051,745

Extracting files from .ZIP: foo.zip
  Inflating: foo.jar PKZIP: (W4) Warning! file fails CRC check

PKZIP: (E9) No file(s) found

Looks like Ruby's library uses a non-standard extension of the Deflate algorithm.

yallie commented 11 years ago

Ah, I see. I'll add file length and CRC checks to Unzip.cs. Thanks for your suggestions Per!

simonoff commented 11 years ago

It's not a weird Ruby library; it's an old rubyzip version that was released 3 years ago.

yallie commented 11 years ago

Updated the Nuget package.

perlun commented 11 years ago

@simonoff Fair enough. I just get confused by all the different rubyzip libraries out there...

it-kotten commented 3 years ago

Maybe off-topic. I have had strange problems with PKZIP 2.50. Among thousands of files, I identified 3 files that give errors when testing the zip file, but which files fail depends on whether I use standard or maximum compression. It only happens if I use PKZIP 2.50 for both zipping and testing/unzipping. If I use the Windows built-in zip tool for either zipping or unzipping, I don't get any corruption.

D:\Recovery>pkzip25 -test *
PKZIP(R) Version 2.50 FAST! Compression Utility for Windows 95/NT 4-15-1998
Copyright 1989-1998 PKWARE Inc. All Rights Reserved. Shareware Version
PKZIP Reg. U.S. Pat. and Tm. Off. Patent No. 5,051,745

Testing files from .ZIP: PkzipMaximumCompression.zip
  Testing: 2855-003_002_B.pdf PKZIP: (W10) Warning! Deflated file has bad table
  Testing: 2855-024_001_B.pdf OK
  Testing: P1012286-44c.jpg PKZIP: (W10) Warning! Deflated file has bad table

Testing files from .ZIP: PkzipStandardCompression.zip
  Testing: 2855-003_002_B.pdf OK
  Testing: 2855-024_001_B.pdf PKZIP: (W10) Warning! Deflated file has bad table
  Testing: P1012286-44c.jpg OK

Testing files from .ZIP: WindowsCompression.zip
  Testing: 2855-003_002_B.pdf OK
  Testing: 2855-024_001_B.pdf OK
  Testing: P1012286-44c.jpg OK

D:\Recovery>pkzip25 -view *
PKZIP(R) Version 2.50 FAST! Compression Utility for Windows 95/NT 4-15-1998
Copyright 1989-1998 PKWARE Inc. All Rights Reserved. Shareware Version
PKZIP Reg. U.S. Pat. and Tm. Off. Patent No. 5,051,745

Viewing .ZIP: PkzipMaximumCompression.zip
  Length  Method       Size  Ratio  Date         Time  CRC-32    Attr     Name
13324497  DeflatX  13301866   0.2%  09-10-2020  2:34p  db21d5fd  --w----  2855-003_002_B.pdf
20429577  DeflatX  20383415   0.3%  09-10-2020  2:34p  4c8a0b65  --w----  2855-024_001_B.pdf
  646745  DeflatX    636038   1.7%  01-01-2003  1:00a  08da2296  --w----  P1012286-44c.jpg
34400819           34321319   0.3%                                        3

Viewing .ZIP: PkzipStandardCompression.zip
  Length  Method       Size  Ratio  Date         Time  CRC-32    Attr     Name
13324497  DeflatN  13302175   0.2%  09-10-2020  2:34p  db21d5fd  --w----  2855-003_002_B.pdf
20429577  DeflatN  20383912   0.3%  09-10-2020  2:34p  4c8a0b65  --w----  2855-024_001_B.pdf
  646745  DeflatN    636048   1.7%  01-01-2003  1:00a  08da2296  --w----  P1012286-44c.jpg
34400819           34322135   0.3%                                        3

Viewing .ZIP: WindowsCompression.zip
  Length  Method       Size  Ratio  Date         Time  CRC-32    Attr     Name
13324497  DeflatN  13307488   0.2%  09-10-2020  2:34p  db21d5fd  --w----  2855-003_002_B.pdf
20429577  DeflatN  20414492   0.1%  09-10-2020  2:34p  4c8a0b65  --w----  2855-024_001_B.pdf
  646745  DeflatN    636102   1.7%  01-01-2003  1:00a  08da2296  --w----  P1012286-44c.jpg
34400819           34358082   0.2%                                        3