lucianopaz / compress_pickle

Standard python pickle, thinly wrapped with standard compression libraries
MIT License
41 stars 10 forks source link

Support `with open` context manager #33

Closed jungerm2 closed 2 years ago

jungerm2 commented 3 years ago

Currently, the following snippet does not infer the compression scheme properly and errors:

import compress_pickle as cpickle

with open('test.lzma', 'wb') as f:    # Same issue for other extensions
    cpickle.dump([1, 2, 3], f)

Which results in the following error:

env/lib/python3.8/site-packages/compress_pickle/utils.py in instantiate_compresser(compression, path, mode, set_default_extension, **kwargs)
    110         _path = _stringyfy_path(path)
    111     if compression == "infer":
--> 112         compression = _infer_compression_from_path(_path)
    113     compresser_class = get_compresser(compression)
    114     if set_default_extension and isinstance(path, PATH_TYPES):

UnboundLocalError: local variable '_path' referenced before assignment

While the docstring says "a file-like object (io.BaseIO instances) [...] will be passed to the BaseCompresser class", this does not happen. I'd suggest grabbing the filename like so:

    if isinstance(path, PATH_TYPES):
        _path = _stringyfy_path(path)
    elif isinstance(path, io.IOBase):
        _path = path.name                          # this would set _path to 'test.lzma' in the above example
    else:
        raise RuntimeError("Unrecognized path")

    if compression == "infer":
        compression = _infer_compression_from_path(_path)

The error would simply ensure that the variable _path is defined. Something more descriptive could be added.

Any thoughts on this? I don't have time to submit a PR this week, but I'd be happy to at a later time. Thanks for this excellent library!

jungerm2 commented 3 years ago

Well, the above-suggested code doesn't quite cut it because there are some IOBase derived classes that don't have a name attribute. Actually, the tests use BytesIO extensively so the above snippet produces a bunch of AttributeError: '_io.BytesIO' object has no attribute 'name' errors.

Still, the example above with the context manager should work.

lucianopaz commented 2 years ago

Hi @jungerm2. Sorry for the extremely long time it took me to come back to compress_pickle. What was happening was that the code was failing to raise a TypeError saying that it is not possible to infer the compression protocol from a file-like object. You must always explicitly pass in the compression protocol that should be used with file-like objects and it says so in the docstring already:

If "infer", the compression method is inferred from the pathextension (only possible ifpathis aPathType``).

I'll fix the control flow to raise an appropriate TypeError and not run into the NameError anymore, and I'll also add a section Raises to the docstring making this explicit.