thejoshwolfe / yauzl

yet another unzip library for node
MIT License

add some low level APIs #154

Closed thejoshwolfe closed 4 months ago

thejoshwolfe commented 4 months ago

Here's some of the readme additions copied into this PR for convenience:

getFileNameLowLevel(generalPurposeBitFlag, fileNameBuffer, extraFields, strictFileNames)

If you are setting decodeStrings to false, then this function can be used to decode the file name yourself. This function is effectively used internally by yauzl to populate the entry.fileName field when decodeStrings is true.

WARNING: This method of getting the file name bypasses the security checks in validateFileName(). You should call that function yourself to be sure to guard against malicious file paths.

generalPurposeBitFlag can be found on an Entry or LocalFileHeader. Only General Purpose Bit 11 is used, and only when an Info-ZIP Unicode Path Extra Field cannot be found in extraFields.

fileNameBuffer is a Buffer representing the file name field of the entry. This is entry.fileNameRaw or localFileHeader.fileName.

extraFields is the parsed extra fields array from entry.extraFields or parseExtraFields().

strictFileNames is a boolean, the same as the option of the same name in open(). When false, backslash characters (\) will be replaced with forward slash characters (/).

This function always returns a string, although it may not be a valid file name. See validateFileName().
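As a rough sketch of how the decoding and the warning above fit together, a caller might decode the name and then run the safety check right away. The helper name decodeEntryFileName is hypothetical, and the yauzl module is passed in as the yauzlModule parameter only so that the sketch stands alone. It assumes validateFileName() returns null for an acceptable name and an error message string otherwise.

```javascript
// Hypothetical helper, not part of yauzl.
// `yauzlModule` is assumed to be the object returned by require("yauzl").
function decodeEntryFileName(yauzlModule, entry, strictFileNames) {
  // Decode the raw bytes the same way yauzl would with decodeStrings: true.
  const fileName = yauzlModule.getFileNameLowLevel(
    entry.generalPurposeBitFlag,
    entry.fileNameRaw,
    entry.extraFields,
    strictFileNames
  );
  // getFileNameLowLevel() skips the safety checks, so run them explicitly.
  // Assumption: null means "ok", anything else is an error message.
  const errorMessage = yauzlModule.validateFileName(fileName);
  if (errorMessage != null) {
    throw new Error(errorMessage);
  }
  return fileName;
}
```

Throwing on an invalid name is just one policy. Skipping the entry or logging the message would work equally well, as long as the unvalidated string is never used as a path.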

parseExtraFields(extraFieldBuffer)

This function is used internally by yauzl to compute entry.extraFields. It is exported in case you want to call it on localFileHeader.extraField.

extraFieldBuffer is a Buffer, such as localFileHeader.extraField.

Returns an Array with each item in the form {id: id, data: data}, where id is a Number and data is a Buffer. Throws an Error if the data encodes an item with a size that exceeds the bounds of the buffer.

You may want to surround calls to this function with try { ... } catch (err) { ... } to handle the error.
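To make the return shape and the error case concrete, here is a minimal standalone sketch of the same kind of parsing. It is not yauzl's actual code. It assumes the layout from the zip spec, where each extra field record is a 2-byte little-endian id, a 2-byte little-endian data size, and then the data bytes. The names parseExtraFieldsSketch and parseExtraFieldsOrNull are made up for this illustration.

```javascript
// Illustrative re-implementation, not yauzl's code.
// Assumed record layout: [id: uint16 LE][size: uint16 LE][data: size bytes].
function parseExtraFieldsSketch(extraFieldBuffer) {
  const extraFields = [];
  let offset = 0;
  // Fewer than 4 trailing bytes cannot hold a record header and are ignored.
  while (offset + 4 <= extraFieldBuffer.length) {
    const id = extraFieldBuffer.readUInt16LE(offset);
    const size = extraFieldBuffer.readUInt16LE(offset + 2);
    const dataStart = offset + 4;
    const dataEnd = dataStart + size;
    if (dataEnd > extraFieldBuffer.length) {
      throw new Error("extra field size exceeds the bounds of the buffer");
    }
    extraFields.push({id: id, data: extraFieldBuffer.subarray(dataStart, dataEnd)});
    offset = dataEnd;
  }
  return extraFields;
}

// Wrap the call so that malformed input yields null instead of an exception.
function parseExtraFieldsOrNull(extraFieldBuffer) {
  try {
    return parseExtraFieldsSketch(extraFieldBuffer);
  } catch (err) {
    return null;
  }
}
```

With yauzl itself, the same try/catch pattern would wrap a call to parseExtraFields(localFileHeader.extraField).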

readLocalFileHeader(entry, [options], callback)

This is a low-level function you probably don't need to call. The intended use case is either preparing to call openReadStreamLowLevel() or simply examining the content of the local file header out of curiosity or for debugging zip file structure issues.

entry is an entry obtained from Event: "entry". An entry in this library is a file's metadata from a Central Directory Header, and this function gives the corresponding redundant data in a Local File Header.

options may be omitted or null, and has the following defaults:

{
  minimal: false,
}

If minimal is false (or omitted or null), the callback receives a full LocalFileHeader. If minimal is true, the callback receives an object with no prototype and a single property: {fileDataStart: fileDataStart}. For typical zipfile reading use cases, this field is the only one you need, and yauzl effectively uses the {minimal: true} option internally as part of openReadStream().

The callback receives (err, localFileHeaderOrAnObjectWithJustOneFieldDependingOnTheMinimalOption), where the type of the second parameter is described in the above discussion of the minimal option.
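A small sketch of the {minimal: true} path might look like the following. The helper name getFileDataStart is hypothetical, and the sketch assumes readLocalFileHeader() is called as a method on an open zipfile object, the same way openReadStream() is.

```javascript
// Hypothetical helper, not part of yauzl.
// Assumption: `zipfile` is an open ZipFile and `entry` came from its "entry" event.
function getFileDataStart(zipfile, entry, callback) {
  zipfile.readLocalFileHeader(entry, {minimal: true}, function (err, localFileHeader) {
    if (err) return callback(err);
    // With minimal: true, fileDataStart is the only property on the result.
    callback(null, localFileHeader.fileDataStart);
  });
}
```

Passing {minimal: false} or omitting the options would hand the callback a full LocalFileHeader instead, which is the form to use when inspecting the header for debugging.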

openReadStreamLowLevel(fileDataStart, compressedSize, relativeStart, relativeEnd, decompress, uncompressedSize, callback)

This is a low-level function available for advanced use cases. You probably want openReadStream() instead.

The intended use case for this function is calling readEntry() and readLocalFileHeader() with {minimal: true} first, and then opening the read stream at a later time, possibly after closing and reopening the entire zipfile, possibly even in a different process. The parameters are all integers and booleans, which are friendly to serialization.

This low-level function does not read any metadata from the underlying storage before opening the read stream. This is both a performance feature and a safety hazard. None of the integer parameters are bounds checked. None of the validation from openReadStream() with respect to compression and encryption is done here either. Only the bounds checks from validateEntrySizes are done, because that is part of processing the stream data.
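The serialization idea can be sketched roughly as below. The helper names makeStreamParams and openFromParams are hypothetical, and several choices are assumptions rather than anything this PR states: reading the whole entry means relativeStart 0 and relativeEnd equal to compressedSize, compression method 0 means stored and 8 means deflate (per the zip spec), and openReadStreamLowLevel() is called as a method on an open zipfile. Encryption is not checked here, so a real caller would need to do that separately.

```javascript
// Hypothetical helpers, not part of yauzl.
// Collect everything openReadStreamLowLevel() needs into a plain object
// of integers and booleans, which survives JSON.stringify()/JSON.parse().
function makeStreamParams(entry, fileDataStart) {
  let decompress;
  if (entry.compressionMethod === 0) {
    decompress = false; // stored, no compression
  } else if (entry.compressionMethod === 8) {
    decompress = true; // deflate
  } else {
    // The low-level function does not validate this, so do it up front.
    throw new Error("unsupported compression method: " + entry.compressionMethod);
  }
  return {
    fileDataStart: fileDataStart,
    compressedSize: entry.compressedSize,
    relativeStart: 0, // assumption: read the whole entry
    relativeEnd: entry.compressedSize,
    decompress: decompress,
    uncompressedSize: entry.uncompressedSize,
  };
}

// Later, possibly in another process, open the stream from the saved params.
function openFromParams(zipfile, params, callback) {
  zipfile.openReadStreamLowLevel(
    params.fileDataStart,
    params.compressedSize,
    params.relativeStart,
    params.relativeEnd,
    params.decompress,
    params.uncompressedSize,
    callback
  );
}
```

Because none of the integers are bounds checked, the saved parameters should only be reused against the exact same zipfile bytes they were derived from.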

Class: LocalFileHeader

This is a trivial class that has no methods and only the following properties. The constructor is available to call, but it doesn't do anything. See readLocalFileHeader().

See the zipfile spec for what these fields mean.

Note that unlike Class: Entry, the fileName and extraField are completely unprocessed. This notably lacks Unicode and ZIP64 handling as well as any kind of safety validation on the file name. See also parseExtraFields().

If your object is missing some of these fields, see the docs on the minimal option in readLocalFileHeader().
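Since the local file header fields arrive unprocessed, one way to get a decoded view is to combine the helpers described above. This is only a sketch: the function name decodeLocalFileHeader is hypothetical, the yauzl module is passed in as yauzlModule so the example stands alone, and validateFileName() is assumed to return null or an error message string.

```javascript
// Hypothetical helper, not part of yauzl.
// `yauzlModule` is assumed to be the object returned by require("yauzl").
function decodeLocalFileHeader(yauzlModule, localFileHeader, strictFileNames) {
  if (localFileHeader.fileName == null || localFileHeader.extraField == null) {
    // Most likely the header was read with {minimal: true}.
    throw new Error("a full LocalFileHeader is required");
  }
  // May throw if the extra field data is malformed.
  const extraFields = yauzlModule.parseExtraFields(localFileHeader.extraField);
  const fileName = yauzlModule.getFileNameLowLevel(
    localFileHeader.generalPurposeBitFlag,
    localFileHeader.fileName,
    extraFields,
    strictFileNames
  );
  return {
    fileName: fileName,
    extraFields: extraFields,
    // null when the name passes the safety checks, else an error message.
    fileNameError: yauzlModule.validateFileName(fileName),
  };
}
```

The result reports the validation outcome instead of throwing, which suits the debugging use case where a malformed name is itself the thing being investigated.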