lmmx closed 3 years ago
The central directory at the end of a zip is marked by the signature (or "magic number") b"PK\x01\x02". This can be retrieved by requesting the final 500 byte range (or 300 bytes to take a chance).
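As a rough sketch of what "requesting the final 500 byte range" could look like over HTTP: a suffix range (`bytes=-N`) asks the server for the last N bytes of the resource. The helper name `tail_request` and the example URL are hypothetical, and this only builds the request. Sending it (for instance with `urllib.request.urlopen`) assumes the server honours range requests.

```python
import urllib.request


def tail_request(url: str, n_bytes: int = 500) -> urllib.request.Request:
    """Build a GET request for only the last `n_bytes` bytes of `url`."""
    req = urllib.request.Request(url)
    # Suffix byte range: "bytes=-500" means "the final 500 bytes"
    req.add_header("Range", f"bytes=-{n_bytes}")
    return req


# Hypothetical URL, for illustration only (nothing is sent here)
req = tail_request("https://example.org/archive.zip", 500)
```

If the tail turns out to be too short to contain the whole central directory, the same helper can be called again with a larger `n_bytes`.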
After this signature (just split on the signature and parse what follows) come the various fields of the central directory struct, of which only the filename length is of interest...?
zipfile._CD_FILENAME_LENGTH = 12, i.e. the 12th entry following the signature

This is better delegated to the RangeStreams library
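The central directory parsing described above can be sketched with the standard library alone. This is a minimal illustration, not the RangeStreams implementation: the struct format mirrors the 46-byte central directory header layout that `zipfile` uses, the helper names `list_filenames` and `demo_tail` are hypothetical, and the "tail" is simulated from an in-memory archive rather than fetched over the network.

```python
import io
import struct
import zipfile

CD_SIGNATURE = b"PK\x01\x02"
CD_STRUCT = "<4s4B4HL2L5H2L"  # fixed-size part of a central directory header
CD_SIZE = struct.calcsize(CD_STRUCT)  # 46 bytes
CD_FILENAME_LENGTH = 12  # index into the unpacked tuple (signature is index 0)
CD_EXTRA_LENGTH = 13
CD_COMMENT_LENGTH = 14


def list_filenames(tail: bytes) -> list[str]:
    """Collect filenames from the central directory headers found in `tail`."""
    names = []
    pos = tail.find(CD_SIGNATURE)
    while pos != -1 and pos + CD_SIZE <= len(tail):
        fields = struct.unpack_from(CD_STRUCT, tail, pos)
        name_len = fields[CD_FILENAME_LENGTH]
        start = pos + CD_SIZE
        names.append(tail[start:start + name_len].decode("utf-8", "replace"))
        # Skip the variable-length filename, extra field and comment
        skip = name_len + fields[CD_EXTRA_LENGTH] + fields[CD_COMMENT_LENGTH]
        pos = tail.find(CD_SIGNATURE, start + skip)
    return names


def demo_tail(n_bytes: int = 500) -> bytes:
    """Build a tiny zip in memory and return its last `n_bytes` bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"hello")
        zf.writestr("b/c.txt", b"world")
    return buf.getvalue()[-n_bytes:]


names = list_filenames(demo_tail())
```

One caveat with splitting on the signature: if the tail also covers compressed file data, the four signature bytes could in principle occur there by chance, so a more careful reader would locate the central directory via the end-of-central-directory record first.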
Refer to notes on the structure of zip files
✅
To speed up the async procedure over many archives, it'll be better to seek on zip/tar.bz2 streams rather than download all the bytes and then process them synchronously.
TODO: make a plain tar (no compression) and see what this looks like (apparently it's ASCII)
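One way to do this TODO without touching the filesystem is to build an uncompressed tar in memory with `tarfile` and look at the first 512-byte header block. This is a sketch under assumptions: the member name `hello.txt`, the helper `make_plain_tar`, and the choice of the ustar format are illustrative, and the field offsets used (name at 0, size at 124, magic at 257) are those of the ustar header layout.

```python
import io
import tarfile


def make_plain_tar() -> bytes:
    """Return the bytes of an uncompressed tar holding one small file."""
    payload = b"hello"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    buf = io.BytesIO()
    # mode "w" means no compression; ustar keeps the header a single block
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


raw = make_plain_tar()
header = raw[:512]  # the first member's header block

name = header[:100].rstrip(b"\0").decode("ascii")  # NUL-padded ASCII name
size = int(header[124:135].decode("ascii"), 8)     # size as ASCII octal digits
magic = header[257:262]                            # b"ustar" for this format
body = raw[512:512 + size]                         # file data follows the header
```

The header fields are indeed ASCII text (with numbers written as octal digit strings) padded with NUL bytes, and the file contents follow uncompressed in 512-byte blocks.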