Open 05eee370-142f-4481-bc8b-dcb176f94e02 opened 7 years ago
The documentation for the zipfile module, https://docs.python.org/3.5/library/zipfile.html, contains inconsistent descriptions of the maximum size of a ZIP file when allowZip64 is False.
The second paragraph in the zipfile module documentation states:
"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size)."
Later on, in the description of the zipfile.ZipFile class, it says:
"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB."
The two sizes (4 GiB and 2 GiB) should be the same. According to https://en.wikipedia.org/wiki/Zip_(file_format)#ZIP64, 4 GiB is the correct value.
There is a similar problem in the 2.7.13 documentation, https://docs.python.org/2.7/library/zipfile.html.
New changeset 4685cd33087b by Berker Peksag in branch '3.5': Issue bpo-29013: Fix allowZip64 documentation https://hg.python.org/cpython/rev/4685cd33087b
New changeset 7c5075a14459 by Berker Peksag in branch '3.6': Issue bpo-29013: Merge from 3.5 https://hg.python.org/cpython/rev/7c5075a14459
New changeset 6ca0f3fcf82f by Berker Peksag in branch 'default': Issue bpo-29013: Merge from 3.6 https://hg.python.org/cpython/rev/6ca0f3fcf82f
Thanks for the report and for the analysis, Monte!
The documentation was correct. The zipfile module supports *reading ZIP files up to 4 GiB without the ZIP64 extension, but it requires allowZip64=True for *writing over 2 GiB files to the ZIP file.
The 2 GiB limit is safer because generated ZIP files can be read by implementations that interpret 32-bit sizes as signed. For example Java don't have unsigned integers. And zipfile and zipimport in old Python versions unpack some fields as signed integers.
Serhiy, thank you for the correction and the additional information. I tried reading a zip file larger than 4 GiB with allowZip64=False, and it worked, so it looks like allowZip64 only applies to writing. I suggest we fix the inconsistency in the documentation as follows:
(1) In the description of the zipfile.ZipFile class, revert the change back to:
"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB."
(2) Change the second paragraph in the zipfile module documentation from:
"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size)."
to:
"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 2 GiB in size)."
Right, that's my fault. Apparently I missed the "[...] zipfile will create [...]" part :) What do you think about Monte's suggested changes Serhiy? Should we just revert 4685cd33087b (1) or do both (1) and (2)?
I think we should just revert this. The zipfile module can handle ZIP files up to 4 GiB without the ZIP64 extensions, but it requires the ZIP64 extensions for creating ZIP files larger than 2 GiB. The ZIP64 extensions is required also for ZIP files with more than 65535 files.
Serhiy, merely reverting the change would not fix the originally reported problem in the documentation. Based on your additional information, and to prevent the need to describe the ZIP64 extensions in more than one place, I suggest two changes:
(1) Change the second paragraph in the zipfile module documentation from:
"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size)."
to:
"It can handle ZIP files that use the ZIP64 extensions."
(2) In the description of the zipfile.ZipFile class, change:
"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 4 GiB."
to:
"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB or contains more than 65535 files."
This would look good too.
To be accurate, zipfile will create ZIP files that use the ZIP64 extensions when:
It contains more than 65535 files.
It is larger than 2 GiB. More accurate, when either the offset or the size of the central directory is larger than 2 GiB, so in theory it is possible to exceed the total size of 2 GiB without using ZIP64.
The original size of any file is larger than 2 GiB.
I'm not sure we should describe the behavior in all details.
I agree it may be better if we don't describe all the details of ZIP64. How about this rewording for the second change, so we don't have to give all the details?
(2) In the description of the zipfile.ZipFile class, change:
"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 4 GiB."
to:
"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when necessary, for example, when the zipfile is larger than 2 GiB."
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['3.8', 'type-bug', '3.7', 'docs']
title = 'zipfile: inconsistent doc for ZIP64 file size'
updated_at =
user = 'https://github.com/mndavidoff'
```
bugs.python.org fields:
```python
activity =
actor = 'mndavidoff'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation']
creation =
creator = 'mndavidoff'
dependencies = []
files = []
hgrepos = []
issue_num = 29013
keywords = []
message_count = 10.0
messages = ['283597', '284448', '284449', '284469', '284475', '284519', '325728', '325963', '325974', '325980']
nosy_count = 5.0
nosy_names = ['docs@python', 'python-dev', 'berker.peksag', 'serhiy.storchaka', 'mndavidoff']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue29013'
versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']
```