python / cpython

The Python programming language
https://www.python.org
Other
62.17k stars 29.88k forks source link

is_zipfile does not set the position to the file header #122356

Open 0xMattina opened 1 month ago

0xMattina commented 1 month ago

https://github.com/python/cpython/blob/4e7550934941050f54c86338cd5e40cd565ceaf2/Lib/zipfile/__init__.py#L243-L244

When filename is BytesIO, is_zipfile don't set the position to the file header. This will affect other codes.

And it's my first issues,i don't known how to report the question,sorry

Linked PRs

terryjreedy commented 1 month ago

Can you add test code that demonstrates the claimed bug? That prints a result that you claim is buggy (different from correct)?

0xMattina commented 1 month ago

CPython versions tested on:

3.12

Operating systems tested on:

Windows

Test code:

import zipfile
from io import BytesIO

if __name__ == '__main__':
    with open(r"d:\test.zip", "rb") as fp_file:
        fp_io = BytesIO(fp_file.read())
    print("before pos:", fp_io.tell())
    print("is_zipfile:", zipfile.is_zipfile(fp_io))
    print("after pos:", fp_io.tell())

Output:

before pos: 0
is_zipfile: True
after pos: 122
ronaldoussoren commented 1 month ago

The documentation for is_zipfile does not promise anything about the file position after the call. That makes this a feature request.

What would be the advantage of ensuring that the file position is at a particular place in the file after calling this function?

0xMattina commented 1 month ago

Test code:

import tarfile
import zipfile
from io import BytesIO

if __name__ == '__main__':
    with open(r"d:\1.zip", "rb") as fp_file:
        fp_io = BytesIO(fp_file.read())

    print("before <is_tarfile> pos:", fp_io.tell())
    print("<is_tarfile> result:", tarfile.is_tarfile(fp_io))
    print("after <is_tarfile> pos:", fp_io.tell())
    print()
    print("before <is_zipfile> pos:", fp_io.tell())
    print("<is_zipfile> result:", zipfile.is_zipfile(fp_io))
    print("after <is_zipfile> pos:", fp_io.tell())

Output:

before <is_tarfile> pos: 0
<is_tarfile> result: False
after <is_tarfile> pos: 0

before <is_zipfile> pos: 0
<is_zipfile> result: True
after <is_zipfile> pos: 122

I tried some other libraries, and tarfile is a std library, it will set pos to the value before call. I think if zipfile does the same, it will improve the logic and consistency of the code.

picnixz commented 1 month ago

For tarfile, it was fixed in https://github.com/python/cpython/pull/26488 so I think we should also fix it for zipfile (but I'll mark it as a feature because the tarfile one was not backported to 3.9 or 3.10, so I think it should be the same for this one).

0xMattina commented 1 month ago

Yes,that's what i want.Thanks!

picnixz commented 1 month ago

By the way, we should have a look at https://github.com/python/cpython/issues/72680 and https://github.com/python/cpython/pull/5053 as well since this seems related (perhaps we should first consider fixing the false positives before fixing the position).