gtsystem / python-remotezip

Python module to access single members of a zip archive without downloading the full content from a remote web server.
MIT License

Support "recursive ZIPs" #26

Open JodanJodan opened 8 months ago

JodanJodan commented 8 months ago

I'm trying to extract two small files within a ZipFile within another ZipFile. Could this functionality be supported, or could an example be provided if already supported? ZipFile(RemoteZip(url).open(internal_zip_filename)) results in reading the entire internal ZipFile. The root ZipFile stores the internal one uncompressed, so it theoretically shouldn't need to be extracted.

Example:

Factory Images for Nexus and Pixel Devices
14.0.0 (UQ1A.240205.004, Feb 2024)

husky-uq1a.240205.004/
husky-uq1a.240205.004/radio-husky-g5300i-230927-231102-b-11040898.img
husky-uq1a.240205.004/flash-all.bat
husky-uq1a.240205.004/flash-all.sh
husky-uq1a.240205.004/image-husky-uq1a.240205.004.zip
husky-uq1a.240205.004/bootloader-husky-ripcurrent-14.1-11208047.img
husky-uq1a.240205.004/flash-base.sh

husky-uq1a.240205.004/image-husky-uq1a.240205.004.zip

android-info.txt
boot.img
init_boot.img
...
super_empty.img

I want to download/extract only boot.img and init_boot.img from the internal ZipFile.

gtsystem commented 8 months ago

Hi, the library is optimized to minimize requests, so the moment the zip engine tries to extract a member, a streaming request is issued to the remote backend for the full file size. Since this is a stream of data, seeking is not really supported (and seeking is needed to read the metadata at the end of the inner zip archive). Disabling the streaming option would instead issue a small remote read for every sequence of bytes read by the zip library. The only solution would be to make this module aware that you are trying to work on an inner zip file; however, this may complicate the code base for a not-so-frequent use case. Feel free to work on a PR if you like, and if it's of good quality I can include it.
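The idea of treating an uncompressed (stored) inner archive as a byte window of the outer file can be sketched locally with the standard library alone. This is only an illustration under stated assumptions, not how remotezip works and not a proposed patch: `io.BytesIO` stands in for the remote file, and `WindowedFile`, `stored_member_window`, and `build_demo` are hypothetical names made up for this example. It relies on the ZIP local file header being 30 bytes followed by the file name and extra field, with the stored data right after.

```python
import io
import struct
import zipfile


class WindowedFile(io.RawIOBase):
    """Read-only, seekable view of bytes [start, start + size) of `base`."""

    def __init__(self, base, start, size):
        super().__init__()
        self._base, self._start, self._size = base, start, size
        self._pos = 0
        self.bytes_read = 0  # bookkeeping for the demo only

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        origin = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        # Clamp to the window so callers can never leave the inner archive.
        self._pos = max(0, min(origin + offset, self._size))
        return self._pos

    def readinto(self, buffer):
        count = min(len(buffer), self._size - self._pos)
        if count <= 0:
            return 0
        self._base.seek(self._start + self._pos)
        data = self._base.read(count)
        buffer[:len(data)] = data
        self._pos += len(data)
        self.bytes_read += len(data)
        return len(data)


def stored_member_window(outer_fp, info):
    """Return a WindowedFile over the raw data of a stored (uncompressed) member."""
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError("member is compressed; its bytes are not a usable zip file")
    outer_fp.seek(info.header_offset)
    header = outer_fp.read(30)  # fixed-size part of the local file header
    if header[:4] != b"PK\x03\x04":
        raise ValueError("local file header signature not found")
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    data_start = info.header_offset + 30 + name_len + extra_len
    return WindowedFile(outer_fp, data_start, info.compress_size)


def build_demo():
    """Build an outer zip (in memory) that stores an inner zip uncompressed."""
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w", zipfile.ZIP_STORED) as inner:
        inner.writestr("boot.img", b"boot-data")
        inner.writestr("init_boot.img", b"init-boot-data")
        inner.writestr("super_empty.img", b"\0" * (256 * 1024))  # large, never read
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w", zipfile.ZIP_STORED) as outer:
        outer.writestr("flash-all.sh", b"#!/bin/sh\n")
        outer.writestr("image.zip", inner_buf.getvalue())
    outer_buf.seek(0)
    return outer_buf


outer_fp = build_demo()
with zipfile.ZipFile(outer_fp) as outer:
    info = outer.getinfo("image.zip")
    window = stored_member_window(outer_fp, info)
    with zipfile.ZipFile(window) as inner:
        boot = inner.read("boot.img")
        init_boot = inner.read("init_boot.img")

print(f"read {window.bytes_read} of {info.file_size} inner-archive bytes")
```

In this local demo only the inner central directory and the two requested members are read through the window. Whether the same wrapper could sit on top of the file object remotezip uses internally depends on that object supporting cheap seeks, which is exactly the streaming limitation described above, so treat that part as an open question.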