python / cpython

The Python programming language
https://www.python.org
Other
61.81k stars 29.71k forks source link

Wrong documentation for GzipFile.peek #72631

Open 179a554f-eaff-458f-ab05-8166251cacbd opened 7 years ago

179a554f-eaff-458f-ab05-8166251cacbd commented 7 years ago
BPO 28445
Nosy @vadmium, @zhangyangyu, @180909
PRs
  • python/cpython#29820
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['easy', 'type-bug', '3.9', '3.10', '3.11', 'docs'] title = 'Wrong documentation for GzipFile.peek' updated_at = user = 'https://bugs.python.org/abacabadabacaba' ``` bugs.python.org fields: ```python activity = actor = 'wangjiahua' assignee = 'docs@python' closed = False closed_date = None closer = None components = ['Documentation'] creation = creator = 'abacabadabacaba' dependencies = [] files = [] hgrepos = [] issue_num = 28445 keywords = ['patch', 'easy', 'newcomer friendly'] message_count = 3.0 messages = ['278656', '278660', '278671'] nosy_count = 5.0 nosy_names = ['abacabadabacaba', 'docs@python', 'martin.panter', 'xiang.zhang', 'wangjiahua'] pr_nums = ['29820'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue28445' versions = ['Python 3.9', 'Python 3.10', 'Python 3.11'] ```

    179a554f-eaff-458f-ab05-8166251cacbd commented 7 years ago

    From the documentation for GzipFile.peek():

    At most one single read on the compressed stream is done to satisfy the call.

    If "compressed stream" means the underlying file object, then this is not true. The method tries to return at least one byte, unless the stream is at EOF. It is possible to create arbitrarily long compressed stream that would decompress to nothing, and the implementation would read the entire stream in this case. Because the length of the stream is not known in advance, several reads may be required for this.

    Perhaps the documentation for GzipFile.peek() should be made the same as that for BZ2File.peek() and LZMAFile.peek().

    zhangyangyu commented 7 years ago

    The "compressed stream" is not the underlying file object but _GzipReader. And actually the "at most one single reader" is the characteristic of io.BufferedReader.peek, you can see it in the doc. Maybe it needs multiple reads on the file object in a single peek, but they are all encapsulated in the _GzipReader.read. So at the point of GzipFile.peek, it's still a single read.

    vadmium commented 7 years ago

    The peek() method was originally added by bpo-9962, where Antoine was trying to imitate the BufferedReader.peek() API. However because “the number of bytes returned may be more or less than requested”, I never understood what this methods were good for; see also bpo-5811.

    I think we could at least remove the claim about “at most one single read”. That is just describing an internal detail.

    The documentation for bzip and LZMA is slightly more useful IMO because it says “at least one byte of data will be returned, unless EOF has been reached”. This guarantee is actually missing from the underlying BufferedReader.peek() documentation, though I think both io and _pyio implement it.