overlookmotel / yauzl-promise

Unzip library with modern Promise-based API
MIT License

End of Central Directory Record not found #45

Open ev3nst opened 8 months ago

ev3nst commented 8 months ago

I understand that this error occurs when ZIP files are not correctly formed. I have tested with other archives and they all work, but some give me this error for that reason. However, the ones that fail can still be extracted by simply right-clicking and using the "Extract" option that WinRAR or 7-Zip provides.

There are similar apps to the one I'm building that are not written in JavaScript, and they handle these ZIP files without error too. Is there any way to ignore this error and extract the files anyway? It makes no sense to me that every other way of extracting an archive's contents works but Node.js packages can't.

By the way, I have tried 4-5 npm packages that do the same thing as this package. They all give me the same error.

ev3nst commented 8 months ago

I found the reasoning here:

https://github.com/thejoshwolfe/yauzl#no-streaming-unzip-api

However, if anyone can recommend a (non-streaming) unzip package or solution for Node, if one even exists, please do.

ayushmanchhabra commented 8 months ago

I came across a similar issue: https://github.com/overlookmotel/yauzl-promise/issues/38. I realised that the file was not being downloaded completely. I switched from node:https to axios and have not had an "End of Central Directory Record not found" error since.
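
One way to rule out a truncated download before unzipping is to compare the number of bytes on disk with the `Content-Length` header the server sent. This is a minimal sketch, assuming the server sends that header and that the body is not content-encoded; `isDownloadComplete` is a hypothetical helper, not part of axios or yauzl-promise:

```javascript
const fs = require("node:fs");

// Hypothetical helper: returns true when the file on disk has exactly as many
// bytes as the server announced in Content-Length. Returns null when the
// header is missing or not a number, because then nothing can be concluded.
function isDownloadComplete(headers, filePath) {
  const expected = Number.parseInt(headers["content-length"], 10);
  if (Number.isNaN(expected)) return null;
  return fs.statSync(filePath).size === expected;
}
```

With axios, the `headers` argument would typically be `response.headers` from the request that produced the file.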

ev3nst commented 8 months ago

In my case it has something to do with the archive being made incorrectly, or its contents being unusual, just as the yauzl docs describe; I just don't know how. I'm downloading the ZIPs manually through the browser. As I said, I tested multiple ZIPs and only a few give this error. These ZIPs are made by people and uploaded to the internet. The most problematic ones are from Chinese authors; I don't know if there is any correlation. I checked every nook and cranny for hidden characters and found none.

However, as every developer knows, end users don't care about any of this. When a user can extract a ZIP file by simply right-clicking and choosing the "Extract" option in WinRAR, 7-Zip or similar tools, I can't tell them, as a developer, that the ZIP file lacks correct metadata. I just have to make this work.

ayushmanchhabra commented 8 months ago

Could you share some code?

This is what I did in nw-builder and it worked for me (the unzip function is a bit verbose since it also handles symlinks):

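// Modules used by the functions below
const fs = require("node:fs");
const path = require("node:path");
const stream = require("node:stream");
const axios = require("axios");
const tar = require("tar");
const yauzl = require("yauzl-promise");

// Download `url` to `filePath`, resolving only once the file is fully written
async function request(url, filePath) {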

  const writeStream = fs.createWriteStream(filePath);

  const response = await axios({
    method: "get",
    url: url,
    responseType: "stream"
  });

  await stream.promises.pipeline(response.data, writeStream);
}

async function decompress(filePath, cacheDir) {
  if (filePath.endsWith(".zip")) {
    await unzip(filePath, cacheDir);
  } else {
    await tar.extract({
      file: filePath,
      C: cacheDir
    });
  }
}

function modeFromEntry(entry) {
  const attr = entry.externalFileAttributes >> 16 || 33188;

  return [448 /* S_IRWXU */, 56 /* S_IRWXG */, 7 /* S_IRWXO */]
    .map(mask => attr & mask)
    .reduce((a, b) => a + b, attr & 61440 /* S_IFMT */);
}

async function unzip(zippedFile, cacheDir) {
  const zip = await yauzl.open(zippedFile);
  let entry = await zip.readEntry();
  const symlinks = []; // Array to hold symbolic link entries

  while (entry !== null) {
    let entryPathAbs = path.join(cacheDir, entry.filename);
    const isSymlink = ((modeFromEntry(entry) & 0o170000) === 0o120000);

    if (isSymlink) {
      symlinks.push(entry);
    } else {
      await fs.promises.mkdir(path.dirname(entryPathAbs), {recursive: true});
      if (!entry.filename.endsWith('/')) { // Skip directories
        const readStream = await entry.openReadStream();
        const writeStream = fs.createWriteStream(entryPathAbs);
        await stream.promises.pipeline(readStream, writeStream);

        const mode = modeFromEntry(entry);
        await fs.promises.chmod(entryPathAbs, mode);
      }
    }

    entry = await zip.readEntry();
  }

  for (const symlinkEntry of symlinks) {
    const entryPathAbs = path.join(cacheDir, symlinkEntry.filename);
    // The content of a symlink entry is the link target path
    const readStream = await symlinkEntry.openReadStream();
    const chunks = [];
    // Async iteration rejects on stream errors, so a failed read cannot hang
    for await (const chunk of readStream) chunks.push(chunk);
    const linkTarget = Buffer.concat(chunks).toString('utf8').trim();

    // Skip symlinks whose path already exists
    if (!fs.existsSync(entryPathAbs)) {
      await fs.promises.symlink(linkTarget, entryPathAbs);
    }
  }

  await zip.close();
}

overlookmotel commented 8 months ago

I suspect that the problem you're having is that the ZIP files are not properly formed.

yauzl-promise (like the original yauzl) is a spec-compliant unzip implementation. As the article you mentioned explains, a properly-formed ZIP file contains a "Central Directory" at the end of the ZIP which catalogues all the contents of the ZIP, and this is what yauzl/yauzl-promise uses to read the contents of the ZIP.
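
To make the error message concrete: the Central Directory is located via an "End of Central Directory Record" (EOCD), which starts with the bytes `50 4B 05 06` and sits at the very end of the file, followed only by an optional comment. The sketch below is an illustration of that search, not yauzl-promise's actual implementation; `findEocdOffset` is a hypothetical name, and a real reader would also limit the search window and validate the record it finds.

```javascript
// Signature that opens the End of Central Directory Record: bytes 50 4B 05 06,
// which reads as 0x06054b50 when interpreted as a little-endian 32-bit integer.
const EOCD_SIGNATURE = 0x06054b50;
const EOCD_MIN_SIZE = 22; // fixed-size part of the record, without the comment

// Scans backwards from the end of the buffer and returns the offset of the
// EOCD record, or -1 if no signature is found.
function findEocdOffset(buf) {
  for (let i = buf.length - EOCD_MIN_SIZE; i >= 0; i--) {
    if (buf.readUInt32LE(i) === EOCD_SIGNATURE) return i;
  }
  return -1;
}
```

A result of -1 corresponds to the situation this issue describes: the file is truncated, is not a ZIP at all, or was written without the trailing record.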

ZIP files also contain redundant "local file headers" before each file in the ZIP, so it is possible to read a ZIP from start to end reading these headers (which is probably what 7zip etc are doing).

However, there are 2 problems with this:

  1. It does not allow random access to the ZIP's contents. If you only want to read the 50th file in a ZIP, you have to read through files 1-49 first.
  2. In some rare cases, it can result in failure to read a properly-formed ZIP (if the contents of one of the files in the ZIP contains what looks like a ZIP file header).
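
The second problem can be sketched as follows. This is a deliberately naive scan for the local file header signature (`50 4B 03 04`), assuming the whole archive is in memory; `findLocalHeaderCandidates` is a hypothetical name. A real streaming reader would normally skip ahead by the size recorded in each header, so the ambiguity mainly arises when that size is not known up front.

```javascript
// Local file header signature: bytes 50 4B 03 04 ("PK\x03\x04").
const LOCAL_HEADER_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Naive scan: returns every offset at which the signature bytes occur,
// whether they really start a header or merely appear inside file data.
function findLocalHeaderCandidates(buf) {
  const offsets = [];
  let pos = buf.indexOf(LOCAL_HEADER_SIGNATURE);
  while (pos !== -1) {
    offsets.push(pos);
    pos = buf.indexOf(LOCAL_HEADER_SIGNATURE, pos + 1);
  }
  return offsets;
}
```

If a file's contents happen to include those four bytes, the scan reports an extra candidate that the Central Directory would never list, which is why reading from the directory is unambiguous and scanning is not.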

For these reasons, yauzl-promise only supports reading ZIP files which are properly formed according to the spec (except for faulty MacOS ZIPs, which are malformed, but in a specific way that yauzl-promise can recognise). I don't intend to change that, due to the above problems.

You may be able to find another unzip library which does non-compliant streaming unzip, which you could use as a fallback in case of the "End of Central Directory Record not found" error.
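
A fallback along those lines could be sketched like this. `extractWithFallback`, `primary` and `fallback` are hypothetical names: `primary` would wrap yauzl-promise and `fallback` would wrap whichever other extractor you choose. Matching on the message text is an assumption based on this issue's title; the exact wording may differ between versions.

```javascript
// Runs `primary(zipPath, destDir)` and, only if it fails with the error this
// issue is about, retries with `fallback(zipPath, destDir)`.
// Resolves to "primary" or "fallback" to indicate which extractor succeeded.
async function extractWithFallback(zipPath, destDir, primary, fallback) {
  try {
    await primary(zipPath, destDir);
    return "primary";
  } catch (err) {
    const message = String(err && err.message);
    // Assumption: the error message contains this text (taken from the issue
    // title). Any other error is rethrown unchanged.
    if (!/End of Central Directory Record not found/i.test(message)) throw err;
    await fallback(zipPath, destDir);
    return "fallback";
  }
}
```

Rethrowing unrelated errors keeps genuine problems such as permission errors or a full disk from being hidden behind the fallback.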

However... the fact that you're seeing this only with ZIP files originating from China may indicate that there's something else going on here, and there's a bug in yauzl-promise relating to character encoding or something. I think that's unlikely, but would be willing to investigate and make sure.

Would you be able to post a link to one of the ZIP files which you're getting this error on? Or, if the data is private so not appropriate to post a public link, if you're willing to share with me privately, you could email it to me at theoverlookmotel@gmail.com.

ev3nst commented 8 months ago

I investigated further and found that the problem occurs when an archive contains only another archive.

zip in zip

If I add a single empty .txt file, the code runs without error.

I must say that the type or format of the inner archive does not matter. My live application test contained an archive with a .pack extension (a mod file for a game) and the result is the same.

Example:

test.7z -- test.pack

ERROR

test.7z -- test.pack -- test.txt (empty)

SUCCESS

overlookmotel commented 8 months ago

Ah OK. Thanks for continuing to investigate. This does sound like it may be a bug.

Can you please post a link to (or send me privately at theoverlookmotel@gmail.com) a repro case containing:

  1. A ZIP file which yauzl-promise can't handle.
  2. The files which were used to create that ZIP (i.e. test.pack)

What software was the ZIP created with? (the ZIP yauzl-promise can't unzip)

NB: yauzl-promise only supports ZIP files. If the "ZIP" file you're trying to use yauzl-promise on is actually a 7Z file, I believe that's a different format from ZIP, so yauzl-promise will likely not be able to unzip it, no matter what the contents (or it's an accident if it can in some cases).
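
A quick way to check which format a file really is, regardless of its extension, is to look at its first few bytes. This is a rough sketch with hypothetical helper names (`sniffArchiveType`, `sniffArchiveFile`); it only recognises the usual ZIP and 7z signatures, and a ZIP with data prepended to it (such as a self-extracting archive) would be reported as "unknown".

```javascript
const fs = require("node:fs");

const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]); // "PK\x03\x04"
const ZIP_EMPTY_MAGIC = Buffer.from([0x50, 0x4b, 0x05, 0x06]); // empty ZIP: EOCD only
const SEVEN_ZIP_MAGIC = Buffer.from([0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c]); // "7z" + BC AF 27 1C

function startsWith(buf, magic) {
  return buf.length >= magic.length && buf.subarray(0, magic.length).equals(magic);
}

// Classifies a buffer holding the first bytes of a file.
function sniffArchiveType(header) {
  if (startsWith(header, ZIP_MAGIC) || startsWith(header, ZIP_EMPTY_MAGIC)) return "zip";
  if (startsWith(header, SEVEN_ZIP_MAGIC)) return "7z";
  return "unknown";
}

// Reads only the first 6 bytes of the file and classifies them.
function sniffArchiveFile(filePath) {
  const fd = fs.openSync(filePath, "r");
  try {
    const header = Buffer.alloc(6);
    const bytesRead = fs.readSync(fd, header, 0, header.length, 0);
    return sniffArchiveType(header.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```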

ev3nst commented 8 months ago

First, I want to thank you for your attention to this issue. I fixed my unarchiving problem with the node-7z package. As feedback, all I can say is that the ZIP contents are mod files downloaded directly from the Steam Workshop, and the extension of the archives is .zip, not .7z; I mistyped that in my earlier comment. (https://steamcommunity.com/sharedfiles/filedetails/?id=2858135182)

I think it's time for me to close this issue.

overlookmotel commented 8 months ago

@ev3nst I'm not completely clear, but it did sound like maybe you found a bug in yauzl-promise. If that's the case, I'd like to reproduce and fix it!

When someone else encounters this, telling them "use another library like node-7z" wouldn't be a very satisfying conclusion!

Can I ask, would you have time to send me a repro case?

ev3nst commented 8 months ago

I have been working on my project as a hobby and I must say I'm kind of burnt out. It took a lot more time than I imagined, sorry.