miurahr / py7zr

7zip in python3 with ZStandard, PPMd, LZMA2, LZMA1, Delta, BCJ, BZip2, and Deflate compressions, and AES encryption.
https://pypi.org/project/py7zr/
GNU Lesser General Public License v2.1
463 stars 74 forks source link

File extraction fails when give filename str as a `targets` argument in v0.21.0 #576

Closed huettenhain closed 8 months ago

huettenhain commented 9 months ago

Since updating to 0.21, the following happens:

Python 3.12.1 (tags/v3.12.1:2305ca5, Dec  7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import base64
>>> from py7zr import SevenZipFile
>>> from io import BytesIO
>>> data = base64.b64decode(
...     'N3q8ryccAAT9xtacpQAAAAAAAAAiAAAAAAAAAEerl+XBRlkrcJwwoqgijCyuEh0S'
...     'qLfjamv2F2vNJGFGyHDfpAAAgTMHrg/QDrA8nzkQnJ+m1TPasi6xAvSHzZaZrISL'
...     'D+EsvFULZ44Kf7Ewy47PApbKruXCaOSUsjzeqpG8VBcx66h2cV/lnGDfUjtVsyGB'
...     'HmmmTaSI/atXtuwiN5mGrqyFZTC/V2VEohWua1Yk1K+jXy+32hBwnK2clyr3rN5L'
...     'Abv5g2wXBiABCYCFAAcLAQABIwMBAQVdABAAAAyAlgoBouB4BAAA')
>>> arc = SevenZipFile(BytesIO(data), password='boom')
>>> arc.read('bar.txt') 
{}
>>> arc.read()
{'bar.txt': <_io.BytesIO object at 0x000001ECFC3A2020>, 'foo.txt': <_io.BytesIO object at 0x000001ECFEA97BA0>}
>>> arc.reset()
>>> arc.read('bar.txt') 
{}
>>> arc.read()['bar.txt'].read()
b'refinery'

The expected behaviour would be that a call to arc.read('bar.txt') returns the same reader to the contents of bar.txt that is returned by arc.read().

miurahr commented 9 months ago

Does previous version work as intended?

miurahr commented 9 months ago

There is a wrong argument for read method.

arc.read('bar.txt')

read accept list of str as argument, but you give str.

    def read(self, targets: Optional[List[str]] = None) -> Optional[Dict[str, IO[Any]]]:

It should be arc.read(['bar.txt'])

miurahr commented 9 months ago

It seems a common mistake to give str instead of List[str] for argument. A behavior is undefined when giving str for argument. Is it better to raise an exception when accepting a wrong argument?

miurahr commented 9 months ago

Here is an idea to check type and raise error when given other than list

    def read(self, targets: Optional[Collection[str]] = None) -> Optional[Dict[str, IO[Any]]]:
        if not (targets is None or isinstance(targets, list) or isinstance(targets, set)):
            raise TypeError("Wrong argument type given.")

It pass the test case as follows:

    with py7zr.SevenZipFile(BytesIO(data), password="boom") as arc:
        result = arc.read(["bar.txt"])
        assert result.get("bar.txt").read() == b"refinery"
    with py7zr.SevenZipFile(BytesIO(data), password="boom") as arc:
        result = arc.read({"bar.txt"})
        assert result.get("bar.txt").read() == b"refinery"
    with pytest.raises(TypeError):
        with py7zr.SevenZipFile(BytesIO(data), password="boom") as arc:
            arc.read("bar.txt")
    with pytest.raises(TypeError):
        with py7zr.SevenZipFile(BytesIO(data), password="boom") as arc:
            arc.read("bar.txt")

There should be consistent among methods like read and extract

huettenhain commented 9 months ago

Hey! Yes, passing str did indeed work in previous versions, that code has not changed.

But I must apologize, the type hint is actually quite clear. I think a TypeError is a good solution, that would have likely made me realize my mistake earlier. I'll note that your current check would match list | set | None rather than Collection[str] | None. For the latter, I would suggest the following type check:

def is_collection_of_str(t):
    if t is None:
        return True
    if isinstance(t, str):
        return False
    try:
        len(t)
        return all(isinstsance(x, str) for x in t)
    except Exception:
        return False

I will simply fix my code by passing a list, though, so you can close this issue whenever you want.

miurahr commented 8 months ago

But I must apologize, the type hint is actually quite clear. I think a TypeError is a good solution, that would have likely made me realize my mistake earlier. I'll note that your current check would match list | set | None rather than Collection[str] | None. For the latter, I would suggest the following type check:

I think a type check should be minimum and a check for items of collection is overkill in a common sense of python culture, and we have already provide a type hint. I think it is enough to check a common mistake, give str object as argument and it is set or list.

huettenhain commented 8 months ago

Your code, your rules. I got everything I needed, thanks a lot! 😃

miurahr commented 8 months ago

I've merged #577. Now we can close here.