weichsel / ZIPFoundation

Effortless ZIP Handling in Swift
MIT License
2.31k stars, 255 forks

Adds method to allow for extracting data at a given offset. #275

Closed: bitwisejb closed this 10 months ago

bitwisejb commented 1 year ago

Fixes #274

Changes proposed in this PR

Tests performed

weichsel commented 1 year ago

Hi bitwisejb, thanks for providing this PR. Can you explain your use case for this addition? It seems like you want to achieve random access into an archive file based on an entry's starting position and some arbitrary offset. The focus of ZIP Foundation is to provide a structured way to access the content of an archive on a per-entry basis, abstracting away the internals (offsets, lengths, compression, ...) of ZIP files. While your addition makes use of some metadata (e.g. the offset at which the entry's data begins), it mainly performs low-level seek/file access that could be achieved without using ZIP Foundation. API users that call into your new extract method would get back a blob of data without any context. Reading chunks of compressed entries that way wouldn't make much sense, since they'd be impossible to decompress at the call site.

Would it help your usage scenario to expose, for example, Entry.dataOffset?

bitwisejb commented 1 year ago

I believe making Entry.dataOffset public would work for my use case. The archive I am working with has map imagery stored in a folder structure designating levels and tile positions. One entry in the archive is an index for locating tile image data for a given xyz. We use xyz to determine the entry and offset for the image to extract. Byte count is known to us based on information in the index entry.

bitwisejb commented 1 year ago

@weichsel It looks like we would need the fileHandle for the archive. Would exposing Archive.archiveFile be an option as well?

bitwisejb commented 10 months ago

@weichsel We need to be able to extract a single entry from a ZIP file with a compression level of 0 without extracting the entire archive. The method that was originally put in place enables this functionality. We appreciate the thought and design that you have put in place to hide the lower-level details. You mentioned above that there may be a way to accomplish this without this change. What would be a good approach for accomplishing this, or is there a change you would recommend that could introduce this functionality?

weichsel commented 10 months ago

@bitwisejb

> We need to be able to extract a single entry from a ZIP file with a compression level of 0 without extracting the entire archive.

You can subscript into an archive via path: https://github.com/weichsel/ZIPFoundation#accessing-individual-entries. This will provide you access to an entry without having to extract the whole archive first.

> You had mentioned above that there may be a way to accomplish this without this change. What would be a good approach for accomplishing this, ...

After retrieving the entry, you can use the closure-based Archive.extract method: https://github.com/weichsel/ZIPFoundation#closure-based-reading-and-writing. This will allow you to perform chunk-wise reads on the contents of your entry. The sample code in the README uses the basic version of this method. Please refer to the docs for more info. For example, there's a bufferSize parameter that allows you to control the size of the data chunks passed into the closure.
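
As a rough illustration of the two README techniques mentioned above, the following sketch looks up a single entry by path and then streams it chunk by chunk, keeping only the bytes inside a wanted range. It assumes a ZIPFoundation release that offers the throwing `Archive(url:accessMode:)` initializer (older releases provide a failable initializer instead). `RangeCollector` and `readRange` are hypothetical names made up for this sketch and are not part of the library.

```swift
import Foundation
#if canImport(ZIPFoundation)
import ZIPFoundation   // guarded so the range logic also compiles on its own
#endif

/// Hypothetical helper: keeps only the bytes of a chunked stream that fall
/// inside `range` (offsets are relative to the start of the entry's data).
struct RangeCollector {
    let range: Range<Int>
    private(set) var position = 0        // bytes of the stream seen so far
    private(set) var collected = Data()  // bytes that fell inside `range`

    init(range: Range<Int>) { self.range = range }

    mutating func consume(_ chunk: Data) {
        let chunkRange = position..<(position + chunk.count)
        let overlap = chunkRange.clamped(to: range)
        if !overlap.isEmpty {
            // Translate stream offsets into indices of this chunk.
            let start = chunk.startIndex + (overlap.lowerBound - position)
            let end = chunk.startIndex + (overlap.upperBound - position)
            collected.append(chunk[start..<end])
        }
        position += chunk.count
    }
}

#if canImport(ZIPFoundation)
/// Hypothetical wrapper: streams the entry at `path` and returns the bytes
/// in `range`, or nil if the archive has no entry at that path.
func readRange(_ range: Range<Int>, ofEntryAt path: String,
               in archiveURL: URL) throws -> Data? {
    // Assumption: the throwing initializer is available in the release used.
    let archive = try Archive(url: archiveURL, accessMode: .read)
    guard let entry = archive[path] else { return nil }

    var collector = RangeCollector(range: range)
    // bufferSize controls the size of the chunks passed to the closure.
    _ = try archive.extract(entry, bufferSize: 64 * 1024) { chunk in
        collector.consume(chunk)
    }
    return collector.collected
}
#endif
```

Note that with this approach every byte that precedes the requested range is still read and passed through the closure before the wanted bytes arrive.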

bitwisejb commented 8 months ago

@weichsel We have investigated this API in the past, but it is inefficient for extracting a known set of bytes from a ZIP file that may be several gigabytes in size. Our use case requires high-volume random access to well-known files (offset and size) within the ZIP file without additional overhead. Perhaps there is another, more performant API that I am not aware of.

We have been using a fork of this repo with the included functions for some time with great success. We wish to contribute this back to this repo and change to using this repo so that we may benefit from any future contributions.

Please advise on what we can do to move this change forward. Otherwise, we will be left working with our fork.
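
For reference, the low-level seek/file access discussed in this thread can be sketched with Foundation alone. This is a minimal sketch, not the method proposed in the PR: `readBytes` is a hypothetical helper, and it assumes the caller already knows the absolute file offset of the wanted bytes. For a stored (uncompressed) entry that would be the entry's data offset plus the offset within the entry, and the thread suggests the data offset is not publicly exposed by the library. It also assumes a platform where the throwing `FileHandle` methods (`seek(toOffset:)`, `read(upToCount:)`, `close()`) are available.

```swift
import Foundation

/// Hypothetical helper: reads up to `count` bytes starting at the absolute
/// file offset `offset`. Only meaningful for stored (uncompressed) data,
/// since a slice of a compressed entry cannot be decompressed on its own.
func readBytes(at offset: UInt64, count: Int, from fileURL: URL) throws -> Data {
    let handle = try FileHandle(forReadingFrom: fileURL)
    defer { try? handle.close() }

    try handle.seek(toOffset: offset)
    // read(upToCount:) may return fewer bytes than requested near the end
    // of the file, and nil at end of file.
    return try handle.read(upToCount: count) ?? Data()
}
```

A caller that issues many such reads would probably keep one `FileHandle` open rather than reopening the file on every call; the helper above reopens it only to stay short.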