madler / zlib

A massively spiffy yet delicately unobtrusive compression library.
http://zlib.net/

Unable to open a 10gb zip file #974

Closed totszwai closed 2 months ago

totszwai commented 2 months ago

I have a zip file that is 10 GB in size, but for some reason minizip is not able to open it. Using the API provided by minizip, the call seems to get stuck in unz64local_SearchCentralDir and eventually returns CENTRALDIRINVALID.

https://github.com/madler/zlib/blob/v1.3.1/contrib/minizip/unzip.c#L327

The zip file was created this way:

I get something like this in the zip, both sitting under BLAH folder. image

The code to try to read the zip file is as follows:

    void * uf = unzOpen64(filename);
    if (!uf)
    {
        DBG("Couldn't open ZIP file!  Aborting.\n");
        return;
    }

There is nothing else special, literally just unzOpen64. The zip file was sitting in my Downloads folder, nothing special there.

The running environment is Windows 11, 64bit, the compilation however was done under MSYS2 MINGW64.

What could be the problem?

totszwai commented 2 months ago

Ok, I partially found a bug in unz64local_SearchCentralDir. Firstly, uMaxBack seems too small?

    ZPOS64_T uMaxBack=0xffff;
    ...
    while (uBackRead<uMaxBack)

It never reads far enough back to find the Central Dir. After setting uMaxBack to 0xffffffffffffffff, it was eventually able to locate the Central Dir.
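For readers following along, the backward search discussed above can be sketched on an in-memory buffer. This is a simplified illustration, not minizip's code: `find_eocd`, `EOCD_NOT_FOUND` and the `max_back` parameter are hypothetical names. One relevant fact from the ZIP format is that the archive comment length is a 16-bit field, so the End Of Central Directory record normally sits within roughly 0xffff + 22 bytes of the end of the file. If the signature is not found inside that window, the file positions being reported may be the thing to check, rather than the window size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of minizip: returned when no signature
 * is found inside the search window. */
#define EOCD_NOT_FOUND ((uint64_t)-1)

/* Scan backward from the end of buf for the End Of Central Directory
 * signature "PK\005\006", looking only at the last max_back bytes.
 * Returns the offset of the signature, or EOCD_NOT_FOUND. */
uint64_t find_eocd(const unsigned char *buf, uint64_t size, uint64_t max_back)
{
    uint64_t window, lowest, pos;

    if (size < 4)
        return EOCD_NOT_FOUND;

    window = (max_back < size) ? max_back : size;
    lowest = size - window;           /* first offset inside the window */

    /* Unsigned-safe countdown: the comparison happens before the
     * decrement, so the body never runs with a wrapped-around pos. */
    for (pos = size - 3; pos-- > lowest; ) {
        if (buf[pos] == 0x50 && buf[pos + 1] == 0x4b &&
            buf[pos + 2] == 0x05 && buf[pos + 3] == 0x06)
            return pos;
    }
    return EOCD_NOT_FOUND;
}
```

Calling `find_eocd(data, size, 0xffff)` on a buffer holding the tail of an archive returns the offset of the record when it lies in the last 0xffff bytes. The same countdown pattern also works with an unsigned index, which is the concern raised about the `int i` loop variable.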

image

Secondly, the index variable of the for loop should probably be uLong instead of int?

    int i;
    ...
    for (i=(int)uReadSize-3; (i--)>0;)

Should be:

    uLong i;
    for (i=uReadSize-3; (i--)>0;)

It got a bit further, but it still cannot properly extract the zip. Still troubleshooting.

totszwai commented 2 months ago

And it is now failing here, within the if condition, which reports UNZ_BADZIPFILE.

image

According to the comment in the code, the value of number_disk is supposed to always be 0 because it is unsupported??
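To make the disk-number check concrete, here is a minimal sketch that decodes the fixed part of a classic (non-ZIP64) End Of Central Directory record and applies a single-disk sanity check. The byte layout follows the ZIP format. The struct, `parse_eocd` and `is_single_disk` are hypothetical names chosen for illustration and are not minizip's API. In a ZIP64 archive the classic fields can hold placeholder values such as 0xffff, with the real values stored in the ZIP64 record, so this sketch covers only the classic case.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers for the on-disk fields. */
uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical container for the decoded fields. */
typedef struct {
    uint16_t number_disk;          /* number of this disk */
    uint16_t number_disk_with_cd;  /* disk where the central dir starts */
    uint16_t entries_this_disk;    /* central dir entries on this disk */
    uint16_t entries_total;        /* central dir entries overall */
    uint32_t size_cd;              /* size of the central directory */
    uint32_t offset_cd;            /* offset of the central directory */
} eocd_info;

/* Decode a classic EOCD record (22 fixed bytes). Returns 0 on success,
 * -1 if the buffer is too short or the signature does not match. */
int parse_eocd(const unsigned char *rec, size_t len, eocd_info *out)
{
    if (len < 22 || rd32(rec) != 0x06054b50u)
        return -1;
    out->number_disk         = rd16(rec + 4);
    out->number_disk_with_cd = rd16(rec + 6);
    out->entries_this_disk   = rd16(rec + 8);
    out->entries_total       = rd16(rec + 10);
    out->size_cd             = rd32(rec + 12);
    out->offset_cd           = rd32(rec + 16);
    return 0;
}

/* A reader without spanned-archive support would expect both disk
 * numbers to be 0 and the two entry counts to agree. */
int is_single_disk(const eocd_info *e)
{
    return e->number_disk == 0 && e->number_disk_with_cd == 0 &&
           e->entries_this_disk == e->entries_total;
}
```

If the record is read from the wrong file position, these fields contain arbitrary bytes, and a check like `is_single_disk` is likely to be the first thing that fails.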

image

madler commented 2 months ago

Thank you for the well-constructed, reproducible bug report. However, I repeated exactly your steps in my (non-Windows) environment, and unzOpen64() did not fail. Did you try unzip -t 10g.zip to verify that the zip file is correct?

totszwai commented 2 months ago

Thank you for your fast reply.

I noticed that new Windows-only APIs were introduced in the contrib/minizip folder.

We were previously using an old version of minizip from the zlib release (v1.2.3, 13 years old?!?!?!) and were only compiling and using it for Linux (it was just cross-compiled for Windows). I guess it was working fine as long as the file entry wasn't larger than 4GB.
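One possible explanation for the size limit, offered here as an assumption rather than a confirmed diagnosis: on Windows, `long` is 32 bits wide even in 64-bit builds, so file I/O that passes positions through `long` cannot represent an offset inside a 10 GB file. The small sketch below only demonstrates the arithmetic. `fits_in_32bit_long` and `truncate_to_32` are hypothetical helpers, not part of zlib or minizip.

```c
#include <stdint.h>

/* Returns 1 if the offset is representable in a signed 32-bit value,
 * which is the range of `long` on Windows. */
int fits_in_32bit_long(uint64_t offset)
{
    return offset <= (uint64_t)INT32_MAX;
}

/* Shows what is left of an offset when only its low 32 bits are kept. */
uint32_t truncate_to_32(uint64_t offset)
{
    return (uint32_t)(offset & 0xffffffffu);
}
```

For a position of 10 GiB (`10ULL << 30`), `fits_in_32bit_long` returns 0 and `truncate_to_32` yields 0x80000000, so any seek near the end of such a file would land somewhere else entirely unless 64-bit file functions are used.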

I took the new iowin32.c and iowin32.h over and compiled them too, using code similar to your example:

    zlib_filefunc64_def ffunc;
    fill_win32_filefunc64A(&ffunc);
    uf = unzOpen2_64(filename,&ffunc);

That fixed the problem of opening the large file! But now I am unable to save the new zip file (we open the zip, add a file, then save it).

I will close this ticket and try to debug the saving zip file issue.

Thank you for your help!