Open gpshead opened 9 months ago
Amending: This is not always true as implemented. _open_to_write
needs to write the initial inline zip file header, whose size depends on whether zip64 is in use, before a _ZipWriteFile
is created to fill in the data. That _ZipWriteFile later merely seeks back and rewrites the same header to fill in the CRC and sizes.
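To illustrate the write-then-patch pattern described above, here is a minimal sketch. It does not use the real zip local header layout: the two-field HEADER struct and the write_record helper are hypothetical names made up for this example. The point is only that the header's size has to be fixed before any data is streamed, because the later seek-back overwrites it in place.

```python
import io
import struct
import zlib

# Hypothetical fixed-size header (NOT the real zip layout):
# CRC-32 and payload size, both unsigned 32-bit little-endian.
HEADER = struct.Struct("<II")


def write_record(fp, chunks):
    """Write a placeholder header, stream the data, then patch the header."""
    header_offset = fp.tell()
    fp.write(HEADER.pack(0, 0))  # placeholder: CRC and size are unknown so far

    crc = 0
    size = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
        size += len(chunk)
        fp.write(chunk)

    end = fp.tell()
    fp.seek(header_offset)
    # Overwrite in place; this only works because the header size is unchanged.
    fp.write(HEADER.pack(crc, size))
    fp.seek(end)
    return crc, size


buf = io.BytesIO()
crc, size = write_record(buf, [b"hello ", b"world"])
```

If the patched header needed to be larger than the placeholder (as when switching to zip64 after the fact), the data already written after it would have to be moved, which is what option B below amounts to.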
To get out of the heuristic business we either need to:
A. always write a zip64 header
B. handle the rare boundary condition when the compressed data winds up larger than uncompressed specially, by rewriting things in that situation
C. give up on the zip format shenanigans, keep our heuristic, and go shopping.
Investigating which approaches other zip creation tools use would be informative, rather than reinventing the wheel here.
Always writing a zip64 header is inefficient for small files.
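A rough way to see the per-member cost of option A is to write the same tiny member with and without the public force_zip64 argument of ZipFile.open() and compare archive sizes. This is only a sketch: archive_size is a hypothetical helper, and the exact overhead depends on the zipfile implementation in use. Per the zip format, a zip64 extra field in a local header carrying both 8-byte sizes takes 20 bytes.

```python
import io
import zipfile


def archive_size(force_zip64):
    """Return the total size of an in-memory archive holding one tiny member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        with zf.open("tiny.txt", "w", force_zip64=force_zip64) as member:
            member.write(b"hi")
    return len(buf.getvalue())


plain = archive_size(False)
forced = archive_size(True)
overhead = forced - plain  # extra bytes paid for the zip64 local header
print(f"plain={plain} forced={forced} overhead={overhead}")
```

For an archive of many small files this cost is paid once per member, which is why always writing the zip64 header is unattractive.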
Bug report
Proposal:
Today our zipfile module's internal implementation uses a heuristic dance between zipfile.ZipFile._open_to_write() and zipfile._ZipWriteFile.close() to determine when a zip64 header is likely to be required.
This seems rather silly. By the time zipfile._ZipWriteFile.close() is called, we know the real uncompressed and compressed data sizes and can deterministically decide at that time, instead of relying on the existing heuristic of "if the expected input file_size * 1.05 > ZIP64_LIMIT" used within _open_to_write() today.
The only time we should ever raise an exception regarding zip64 being required is if the API user has explicitly forbidden zip64's use.
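The difference between the two decisions can be sketched as follows. The function names are hypothetical and the sizes are made-up illustrative numbers. ZIP64_LIMIT is defined locally on the assumption that it matches the zipfile module's value of (1 << 31) - 1.

```python
# Assumed to mirror zipfile.ZIP64_LIMIT.
ZIP64_LIMIT = (1 << 31) - 1


def heuristic_needs_zip64(expected_file_size):
    """Guess made up front, before any data has been written."""
    return expected_file_size * 1.05 > ZIP64_LIMIT


def exact_needs_zip64(file_size, compress_size):
    """Decision that is possible once both real sizes are known."""
    return file_size > ZIP64_LIMIT or compress_size > ZIP64_LIMIT


# Borderline case: the heuristic asks for zip64 although neither size needs it.
false_positive = (
    heuristic_needs_zip64(2_100_000_000),
    exact_needs_zip64(2_100_000_000, 2_100_000_000),
)

# Opposite case: the input looks small enough, but the compressed data
# grew past the limit, so zip64 turns out to be required after all.
false_negative = (
    heuristic_needs_zip64(2_040_000_000),
    exact_needs_zip64(2_040_000_000, 2_150_000_000),
)
```

The first pair is the unnecessary zip64 header in borderline cases; the second is the situation where the up-front guess was wrong and close() has nothing left to do but raise.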
I wouldn't backport this change to a stable release as it will alter the exact output produced in some circumstances (zip64 headers will no longer be added unnecessarily in borderline cases where they were not needed), but it is fair to consider it as much a bug fix, removing an odd internal implementation wart, as a feature.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response