thejoshwolfe / yauzl

yet another unzip library for node
MIT License

Cannot parse zip file containing 65535 files, or with a central directory offset of 0xffffffff, if not in Zip64 format #108

Closed AxbB36 closed 3 months ago

AxbB36 commented 5 years ago

Create ffff.zip containing 65535 files as follows:

$ seq 1 65535 | while read n; do touch -d '2019-05-01 00:00:00 UTC' $(printf %04x $n); done
$ TZ=UTC zip -X ffff.zip $(seq 1 65535 | while read n; do printf "%04x\n" $n; done)

UnZip 6.0 can parse it:

$ unzip -l ffff.zip | tail -n 3
        0  2019-05-01 00:00   ffff
---------                     -------
        0                     65535 files

But this yauzl program cannot:

let yauzl = require("yauzl");
yauzl.open(process.argv[2], {lazyEntries: true}, (err, zipfile) => {
    if (err)
        throw err;
    zipfile.on("entry", entry => {
        zipfile.openReadStream(entry, (err, r) => {
            if (err)
                throw err;
            let n = 0;
            r.on("data", chunk => n += chunk.length);
            r.on("end", () => {
                console.log(`${n}\t${entry.fileName}`);
                zipfile.readEntry();
            });
        });
    });
    zipfile.readEntry();
});

The error message is:

$ node ziplist.js ffff.zip
ziplist.js:4
        throw err;
        ^

Error: invalid zip64 end of central directory locator signature
    at node_modules/yauzl/index.js:154:27
    at node_modules/yauzl/index.js:631:5
    at node_modules/fd-slicer/index.js:32:7
    at FSReqWrap.wrapper [as oncomplete] (fs.js:658:17)

yauzl interprets an entryCount of 0xffff (or a centralDirectoryOffset of 0xffffffff) to mean that a Zip64 end of central directory locator must be present: https://github.com/thejoshwolfe/yauzl/blob/02a5ca69c7713f6d2897cc02f2acc1df21093e3d/index.js#L140-L142

APPNOTE.TXT seems to say that the implication goes the other way: instead of 0xffff ⇒ Zip64, it is Zip64 ⇒ 0xffff; i.e., a value of 0xffff does not necessarily imply that Zip64 information must be present.

4.4.1.4 If one of the fields in the end of central directory record is too small to hold required data, the field SHOULD be set to -1 (0xFFFF or 0xFFFFFFFF) and the ZIP64 format record SHOULD be created.
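
To make the direction of that implication concrete, here is a minimal sketch in JavaScript. It is not yauzl's code: `parseEocd` and `mayNeedZip64` are hypothetical names, and the field offsets assume the standard 22-byte end of central directory record layout. The point is only that a maximal field value is a hint to go looking for Zip64 data, not proof that it exists.

```javascript
// Hypothetical helper: decode the fixed 22-byte part of the
// end of central directory record (EOCD).
function parseEocd(buf) {
    if (buf.readUInt32LE(0) !== 0x06054b50)
        throw new Error("invalid end of central directory record signature");
    return {
        entryCount: buf.readUInt16LE(10),             // total number of entries
        centralDirectorySize: buf.readUInt32LE(12),   // size in bytes
        centralDirectoryOffset: buf.readUInt32LE(16), // offset from start of file
    };
}

// Hypothetical helper: true when some field holds its maximum value.
// A plain zip file with exactly 65535 entries also returns true here,
// so the caller must treat a missing Zip64 record as "not Zip64",
// not as an error.
function mayNeedZip64(eocd) {
    return eocd.entryCount === 0xffff ||
        eocd.centralDirectorySize === 0xffffffff ||
        eocd.centralDirectoryOffset === 0xffffffff;
}
```

For ffff.zip, `mayNeedZip64` would return true even though the archive contains no Zip64 records at all.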

How some other implementations handle it

UnZip searches for a zip64 end of central directory locator unconditionally (whether or not there is a 0xffff or 0xffffffff), and does not error if the locator is not found. process.c:find_ecrec:

    /* Next: Check for existence of Zip64 end-of-cent-dir locator
       ECLOC64. This structure must reside on the same volume as the
       classic ECREC, at exactly (ECLOC64_SIZE+4) bytes in front
       of the ECREC.
       The ECLOC64 structure directs to the longer ECREC64 structure
       A ECREC64 will ALWAYS exist for a proper Zip64 archive, as
       the "Version Needed To Extract" field is required to be set
       to 4.5 or higher whenever any Zip64 features are used anywhere
       in the archive, so just check for that to see if this is a
       Zip64 archive.
     */
    result = find_ecrec64(__G__ searchlen+76);
        /* 76 bytes for zip64ec & zip64 locator */
    if (result != PK_COOL) {
        if (error_in_archive < result)
            error_in_archive = result;
        return error_in_archive;
    }

process.c:find_ecrec64:

    if (memcmp((char *)byterecL, end_centloc64_sig, 4) ) {
      /* not found */
      return PK_COOL;
    }

Python zipfile also searches for a zip64 end of central directory locator unconditionally, and does not error if it does not find the expected signature: https://github.com/python/cpython/blob/v3.7.0/Lib/zipfile.py#L258-L259 https://github.com/python/cpython/blob/v3.7.0/Lib/zipfile.py#L282-L284 https://github.com/python/cpython/blob/v3.7.0/Lib/zipfile.py#L197-L202

    data = fpin.read(sizeEndCentDir64Locator)
    if len(data) != sizeEndCentDir64Locator:
        return endrec
    sig, diskno, reloff, disks = struct.unpack(structEndArchive64Locator, data)
    if sig != stringEndArchive64Locator:
        return endrec

Go archive/zip searches for a zip64 end of central directory locator only if entryCount is 0xffff, or centralDirectoryOffset is 0xffffffff, or the central directory size is 0xffffffff. It doesn't error if the locator is not found. https://github.com/golang/go/blob/go1.12.4/src/archive/zip/reader.go#L502-L511

    // These values mean that the file can be a zip64 file
    if d.directoryRecords == 0xffff || d.directorySize == 0xffff || d.directoryOffset == 0xffffffff {
        p, err := findDirectory64End(r, directoryEndOffset)
        if err == nil && p >= 0 {
            err = readDirectory64End(r, p, d)
        }
        if err != nil {
            return nil, err
        }
    }
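
What the three implementations have in common is that a missing locator is not an error. A rough JavaScript equivalent of that lenient lookup might look like the following. This is a sketch under assumptions, not the fix that went into yauzl: `findZip64Locator` is a hypothetical name, `buf` is assumed to hold the tail of the file, and `eocdOffset` is assumed to be the index in `buf` where the end of central directory record starts.

```javascript
const ZIP64_LOCATOR_SIGNATURE = 0x07064b50;
const ZIP64_LOCATOR_SIZE = 20; // the locator sits immediately before the EOCD

// Hypothetical helper: returns the offset of the Zip64 end of central
// directory record as a BigInt, or null when no locator is present.
function findZip64Locator(buf, eocdOffset) {
    const locatorOffset = eocdOffset - ZIP64_LOCATOR_SIZE;
    if (locatorOffset < 0)
        return null; // not enough room for a locator: plain zip file
    if (buf.readUInt32LE(locatorOffset) !== ZIP64_LOCATOR_SIGNATURE)
        return null; // signature mismatch: plain zip file, not an error
    // Locator layout: signature (4), disk number (4), offset (8), total disks (4).
    return buf.readBigUInt64LE(locatorOffset + 8);
}
```

A caller would fall back to the values in the classic end of central directory record whenever this returns null, which is the case for both test files in this issue.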

thejoshwolfe commented 5 years ago

Thanks for the detailed report! I'll take a look.

AxbB36 commented 5 years ago

This issue also affects zip files that have a central directory offset of 0xffffffff. Here is a recipe to make a test case for that.

ffffffff-centralDirectoryOffset.zip.gz.gz (remove 2 layers of gzip)

# 216186 * 19867 = 0xffffffff - len("pad") - 30
dd if=/dev/zero bs=216186 count=19867 of=pad
touch -d '2019-05-01 00:00:00 UTC' pad
rm -f ffffffff-centralDirectoryOffset.zip
TZ=UTC zip -0 -X ffffffff-centralDirectoryOffset.zip pad
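
The arithmetic in the comment can be checked with a few lines of JavaScript. This assumes what the recipe implies: the single entry is stored uncompressed (-0), carries no extra fields (-X), and has no data descriptor, so the central directory starts right after a 30-byte local file header, the 3-byte file name, and the file data.

```javascript
const LOCAL_FILE_HEADER_SIZE = 30;  // fixed part of a local file header
const fileName = "pad";
const dataSize = 216186 * 19867;    // bs * count from the dd command

// The central directory begins right after the only entry's header and data.
const centralDirectoryOffset =
    LOCAL_FILE_HEADER_SIZE + fileName.length + dataSize;

console.log(centralDirectoryOffset.toString(16)); // prints "ffffffff"
```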

zipinfo -v says:

  The central directory is 49 (0000000000000031h) bytes long,
  and its (expected) offset in bytes from the beginning of the zipfile
  is 4294967295 (00000000FFFFFFFFh).

thejoshwolfe commented 3 months ago

@AxbB36 This is fixed in yauzl version 3.1.1. I didn't make an automated test for this, because creating performant tests for large numbers is pretty difficult (see test/zip64.js), but I manually verified that the test case you outlined in the OP works with the examples/dump.js example.

thejoshwolfe commented 3 months ago

"This issue also affects zip files that have a central directory offset of 0xffffffff"

Oh, I may not have fixed this issue. Are you getting the error "expected zip64 extended information extra field"?

thejoshwolfe commented 3 months ago

"Oh, I may not have fixed this issue."

Ok, I fixed the entry handling as well in version 3.1.2. I think this issue is fully fixed now.