thejoshwolfe / yauzl

yet another unzip library for node
MIT License
737 stars 80 forks

Get the correct fileName from extra field when decodeStrings is false #113

Closed fpsqdb closed 9 months ago

fpsqdb commented 5 years ago

This commit added the decodeStrings option, which supports returning fileName or comment as a Buffer and fixes issue #42, but fileName is not a Buffer when decodeStrings is false. This PR makes fileName a Buffer when decodeStrings is false.

thejoshwolfe commented 9 months ago

Sorry for the delayed response. I'm not sure I understand the purpose or intended effect of this PR. Are you trying to bypass the security validation but still support reading the Info-ZIP Unicode Path Extra Field? If that's the case, what part of the validation is causing issues for you?

fpsqdb commented 9 months ago

@thejoshwolfe Sorry, the commit link and related issue were wrong; I have modified my comment.

thejoshwolfe commented 9 months ago

Why do you want an undecoded buffer for the file name?

fpsqdb commented 9 months ago

I set decodeStrings to false so that I can decode the buffer myself. Also, the code implementation does not match the documentation: https://github.com/thejoshwolfe/yauzl#filename

If decodeStrings is false (see open()), this field is the undecoded Buffer instead of a decoded String.
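
For readers following along, here is a minimal sketch of what "decode the buffer myself" might look like. The helper names `decodeFileName` and `printEntryNames` are hypothetical, and the latin1 fallback is only an assumed example of a caller-chosen encoding, not something yauzl does. The yauzl module is passed in as a parameter so the snippet does not depend on how it is installed.

```javascript
// Hypothetical helper: decode a raw file name Buffer ourselves.
// Tries strict UTF-8 first; if the bytes are not valid UTF-8, falls back
// to latin1 (an assumed fallback, chosen only for illustration).
function decodeFileName(rawName) {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(rawName);
  } catch (err) {
    return rawName.toString("latin1");
  }
}

// Hypothetical wiring: with decodeStrings set to false, entry.fileName is
// documented to be an undecoded Buffer, so we decode it with our own helper.
function printEntryNames(yauzl, zipPath) {
  yauzl.open(zipPath, { decodeStrings: false, lazyEntries: true }, (err, zipfile) => {
    if (err) throw err;
    zipfile.readEntry();
    zipfile.on("entry", (entry) => {
      console.log(decodeFileName(entry.fileName));
      zipfile.readEntry();
    });
  });
}
```

A caller would invoke it as `printEntryNames(require("yauzl"), "archive.zip")`, where the archive path is a placeholder.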

thejoshwolfe commented 9 months ago

I've just released yauzl 3.1.0, which includes support for decoding file names in UTF-8 without the safety validation. But it sounds like that's not actually what you're looking for.

It sounds like what you're looking for is:

  1. Ignore General Purpose Bit 11.
  2. Support finding the Info-ZIP Unicode Path Extra Field in the extra fields, and perform the version check and crc32 verification as required, but don't convert the Buffer into a string using UTF-8.
  3. Return either the basic fileName as a Buffer or the override filename from the Info-ZIP Unicode Path Extra Field as a Buffer if present.
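
The three steps above can be sketched roughly as follows. This is an illustration and not yauzl's actual code: `crc32` and `selectFileNameBuffer` are hypothetical helpers, and the sketch assumes each extra field is an object of the form `{id, data}` with `data` being a Buffer. The field layout used here (one version byte, a little-endian CRC32 of the basic file name, then the name bytes) follows the Info-ZIP Unicode Path Extra Field (id 0x7075).

```javascript
// Precomputed table for the standard CRC-32 (polynomial 0xedb88320).
const CRC_TABLE = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// Hypothetical helper: CRC-32 of a Buffer, returned as an unsigned integer.
function crc32(buf) {
  let crc = 0xffffffff;
  for (const byte of buf) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical helper: return the file name as a Buffer, never a String.
// General purpose bit 11 is ignored on purpose, because no decoding happens.
function selectFileNameBuffer(fileNameBuffer, extraFields) {
  for (const field of extraFields) {
    if (field.id !== 0x7075) continue;           // not the Unicode Path field
    const data = field.data;
    if (data.length < 6) continue;               // version + crc + at least 1 name byte
    if (data.readUInt8(0) !== 1) continue;       // version check
    if (data.readUInt32LE(1) !== crc32(fileNameBuffer)) continue; // stale field
    return data.subarray(5);                     // override name, still undecoded
  }
  return fileNameBuffer;                         // fall back to the basic name
}
```

A field that fails the version check or the CRC32 check is skipped, so the function falls back to the basic name in that case.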

Is that what you want? If so, ... I'm very curious why. Have you found zip files using the Info-ZIP Unicode Path Extra Field that use an encoding other than UTF-8? Or are you curious what the bytes were before the UTF-8 decoding? If that's all you want, you should be able to simply re-encode the value into UTF-8 (UTF-8 is bijective for non-error code points).
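
To illustrate the re-encoding point, here is a small sketch using only Node's built-in Buffer; the helper name `roundTripsThroughUtf8` is hypothetical. It shows that valid UTF-8 bytes survive a decode and re-encode cycle, while invalid bytes do not, because decoding replaces them with U+FFFD.

```javascript
// Hypothetical helper: true if decoding the bytes as UTF-8 and encoding the
// resulting string back to UTF-8 reproduces the original bytes exactly.
function roundTripsThroughUtf8(rawBytes) {
  const decoded = rawBytes.toString("utf8");       // what a decoded fileName would hold
  const reencoded = Buffer.from(decoded, "utf8");  // recover bytes from the string
  return reencoded.equals(rawBytes);
}

const validName = Buffer.from("naïve/файл.txt", "utf8");
const invalidName = Buffer.from([0x66, 0xff]);     // 0xff is never valid in UTF-8

console.log(roundTripsThroughUtf8(validName));
console.log(roundTripsThroughUtf8(invalidName));
```

So re-encoding recovers the original bytes only when the name was valid UTF-8 to begin with, which matches the "non-error code points" caveat.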

In any case, what you're looking for can be accomplished by copying the logic in yauzl, which is now located in getFileNameLowLevel(). It's only about 30 lines of code.

Unless I can understand the use case for this PR, I can't properly support it.

thejoshwolfe commented 9 months ago

And the code implementation does not match the document description.

What's the discrepancy that you're seeing? If you're talking about how the undecoded Buffer is always the basic name and never the one from the Info-ZIP Unicode Path Extra Field, that's mentioned in the very next sentence in the docs. Maybe that could be communicated more clearly.

fpsqdb commented 9 months ago

The latest version has fixed this problem.