python / cpython

The Python programming language
https://www.python.org
Other
63.34k stars 30.32k forks source link

zipfile: inconsistent doc for ZIP64 file size #73199

Open 05eee370-142f-4481-bc8b-dcb176f94e02 opened 7 years ago

05eee370-142f-4481-bc8b-dcb176f94e02 commented 7 years ago
BPO 29013
Nosy @berkerpeksag, @serhiy-storchaka, @mndavidoff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.8', 'type-bug', '3.7', 'docs'] title = 'zipfile: inconsistent doc for ZIP64 file size' updated_at = user = 'https://github.com/mndavidoff' ``` bugs.python.org fields: ```python activity = actor = 'mndavidoff' assignee = 'docs@python' closed = False closed_date = None closer = None components = ['Documentation'] creation = creator = 'mndavidoff' dependencies = [] files = [] hgrepos = [] issue_num = 29013 keywords = [] message_count = 10.0 messages = ['283597', '284448', '284449', '284469', '284475', '284519', '325728', '325963', '325974', '325980'] nosy_count = 5.0 nosy_names = ['docs@python', 'python-dev', 'berker.peksag', 'serhiy.storchaka', 'mndavidoff'] pr_nums = [] priority = 'normal' resolution = None stage = 'needs patch' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue29013' versions = ['Python 3.6', 'Python 3.7', 'Python 3.8'] ```

05eee370-142f-4481-bc8b-dcb176f94e02 commented 7 years ago

The documentation for the zipfile module, https://docs.python.org/3.5/library/zipfile.html, contains inconsistent descriptions of the maximum size of a ZIP file when allowZip64 is False.

The second paragraph in the zipfile module documentation states:

"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size)."

Later on, in the description of the zipfile.ZipFile class, it says:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB."

The two sizes (4 GiB and 2 GiB) should be the same. According to https://en.wikipedia.org/wiki/Zip_(file_format)#ZIP64, 4 GiB is the correct value.

There is a similar problem in the 2.7.13 documentation, https://docs.python.org/2.7/library/zipfile.html.

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 7 years ago

New changeset 4685cd33087b by Berker Peksag in branch '3.5': Issue bpo-29013: Fix allowZip64 documentation https://hg.python.org/cpython/rev/4685cd33087b

New changeset 7c5075a14459 by Berker Peksag in branch '3.6': Issue bpo-29013: Merge from 3.5 https://hg.python.org/cpython/rev/7c5075a14459

New changeset 6ca0f3fcf82f by Berker Peksag in branch 'default': Issue bpo-29013: Merge from 3.6 https://hg.python.org/cpython/rev/6ca0f3fcf82f

berkerpeksag commented 7 years ago

Thanks for the report and for the analysis, Monte!

serhiy-storchaka commented 7 years ago

The documentation was correct. The zipfile module supports *reading ZIP files up to 4 GiB without the ZIP64 extension, but it requires allowZip64=True for *writing over 2 GiB files to the ZIP file.

The 2 GiB limit is safer because generated ZIP files can be read by implementations that interpret 32-bit sizes as signed. For example Java don't have unsigned integers. And zipfile and zipimport in old Python versions unpack some fields as signed integers.

05eee370-142f-4481-bc8b-dcb176f94e02 commented 7 years ago

Serhiy, thank you for the correction and the additional information. I tried reading a zip file larger than 4 GiB with allowZip64=False, and it worked, so it looks like allowZip64 only applies to writing. I suggest we fix the inconsistency in the documentation as follows:

(1) In the description of the zipfile.ZipFile class, revert the change back to:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB."

(2) Change the second paragraph in the zipfile module documentation from:

"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size)."

to:

"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 2 GiB in size)."

berkerpeksag commented 7 years ago

Right, that's my fault. Apparently I missed the "[...] zipfile will create [...]" part :) What do you think about Monte's suggested changes Serhiy? Should we just revert 4685cd33087b (1) or do both (1) and (2)?

serhiy-storchaka commented 6 years ago

I think we should just revert this. The zipfile module can handle ZIP files up to 4 GiB without the ZIP64 extensions, but it requires the ZIP64 extensions for creating ZIP files larger than 2 GiB. The ZIP64 extensions is required also for ZIP files with more than 65535 files.

05eee370-142f-4481-bc8b-dcb176f94e02 commented 6 years ago

Serhiy, merely reverting the change would not fix the originally reported problem in the documentation. Based on your additional information, and to prevent the need to describe the ZIP64 extensions in more than one place, I suggest two changes:

(1) Change the second paragraph in the zipfile module documentation from:

"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size)."

to:

"It can handle ZIP files that use the ZIP64 extensions."

(2) In the description of the zipfile.ZipFile class, change:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 4 GiB."

to:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB or contains more than 65535 files."

serhiy-storchaka commented 6 years ago

This would look good too.

To be accurate, zipfile will create ZIP files that use the ZIP64 extensions when:

I'm not sure we should describe the behavior in all details.

05eee370-142f-4481-bc8b-dcb176f94e02 commented 6 years ago

I agree it may be better if we don't describe all the details of ZIP64. How about this rewording for the second change, so we don't have to give all the details?

(2) In the description of the zipfile.ZipFile class, change:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 4 GiB."

to:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when necessary, for example, when the zipfile is larger than 2 GiB."