lucianopaz / compress_pickle

Standard python pickle, thinly wrapped with standard compression libraries
MIT License

Add support for "XZ" extension #25

Closed: dom-insytesys closed this issue 3 years ago.

dom-insytesys commented 3 years ago

The .xz extension seems to be the standard one for LZMA-compressed files. Could you add support for it?

If I dump a variable to a pickle file and then compress it with the command-line xz utility, I can read the compressed file back with the lzma library. I create the uncompressed pickle file as follows:

with open("test.pkl", "wb") as f:
    pickle.dump(test, f)

Then, from the command line, I run:

$ xz -9v test.pkl

This converts 'test.pkl' to 'test.pkl.xz'.

I can load the compressed file with the lzma library: test = pickle.load(lzma.open("test.pkl.xz", "rb"))
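For reference, the same round trip can be done entirely in Python with the standard-library lzma module, whose default container for writing (lzma.FORMAT_XZ) is the .xz format that the xz utility produces. This is a minimal sketch, assuming a throwaway file named test.pkl.xz in the current directory and a made-up `test` object; preset=9 is used to mirror the -9 flag passed to xz.

```python
import lzma
import pickle

test = {"a": [1, 2, 3], "b": "hello"}  # stand-in for the reporter's variable

# Write the pickle straight into an .xz container; FORMAT_XZ is the default
# format for lzma.open in write mode, and preset=9 matches "xz -9".
with lzma.open("test.pkl.xz", "wb", preset=9) as f:
    pickle.dump(test, f)

# Every .xz file starts with the 6-byte magic header FD 37 7A 58 5A 00.
with open("test.pkl.xz", "rb") as raw:
    header = raw.read(6)

# In read mode lzma.open defaults to FORMAT_AUTO, which detects .xz (and
# legacy .lzma) data from the file contents, not from the file name.
with lzma.open("test.pkl.xz", "rb") as f:
    loaded = pickle.load(f)

print(header == b"\xfd7zXZ\x00", loaded == test)
```

Because the format is detected from the bytes, the file name's extension is irrelevant to the lzma module itself; only a library that infers the protocol from the name cares about it.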

But if I try to load the xz-compressed pickle with compress_pickle, it fails. Calling test = compress_pickle.load(r"C:\Users\domla\tmp\test_pkl.pkl.xz") raises the following exception:

ValueError: Cannot infer compression protocol from filename test_pkl.pkl.xz with extension .xz

If I set the compression type explicitly instead, test = compress_pickle.load(r"C:\Users\domla\tmp\test_pkl.pkl.xz", compression='lzma') raises:

FileNotFoundError: [Errno 2] No such file or directory: 'test_pkl.pkl.xz.lzma'

If I rename the file so that it has the '.lzma' extension, compress_pickle loads it fine:

$ mv test.pkl.xz test.pkl.lzma

test = compress_pickle.load(r"C:\Users\domla\tmp\test.pkl.lzma", compression='lzma')

In regard to the FileNotFoundError, the behavior of compress_pickle is slightly unexpected. If I supply a filename and a compression protocol, I would expect compress_pickle to try to load the filename as-is first, before trying any funny business like sticking an extension on it. This seems obvious enough. It only gets murky if both 'test.pkl' and 'test.pkl.lzma' exist, although even then my personal expectation is that compress_pickle should default to the filename exactly as supplied.
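The lookup order proposed in this comment could be sketched roughly as follows: use the path exactly as given if it exists, and only otherwise fall back to the name with the protocol's default extension appended. This is an illustration of the proposal, not compress_pickle's actual code; `resolve_load_path` and `DEFAULT_EXTENSIONS` are hypothetical names, and only the "lzma" to ".lzma" mapping is taken from the thread (it is what produced 'test_pkl.pkl.xz.lzma').

```python
import os
import tempfile

# Hypothetical default-extension table; only "lzma" -> ".lzma" comes from the thread.
DEFAULT_EXTENSIONS = {"lzma": ".lzma"}


def resolve_load_path(path: str, compression: str) -> str:
    """Hypothetical helper: prefer the file name exactly as supplied and only
    fall back to appending the protocol's default extension."""
    ext = DEFAULT_EXTENSIONS[compression]
    if os.path.exists(path):
        return path  # the as-is name always wins, even if path + ext also exists
    if not path.endswith(ext) and os.path.exists(path + ext):
        return path + ext
    raise FileNotFoundError(f"No such file: {path!r}")


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: the .xz file from the report, loaded under its own name.
    as_is = os.path.join(tmp, "test_pkl.pkl.xz")
    open(as_is, "wb").close()  # empty placeholder file
    found_as_is = resolve_load_path(as_is, "lzma")

    # Case 2: only the extended name exists, so the fallback is used.
    bare = os.path.join(tmp, "data.pkl")
    open(bare + ".lzma", "wb").close()
    found_fallback = resolve_load_path(bare, "lzma")

    # Case 3 (the "murky" one): both names exist; the as-is name is chosen.
    both = os.path.join(tmp, "test.pkl")
    open(both, "wb").close()
    open(both + ".lzma", "wb").close()
    found_both = resolve_load_path(both, "lzma")

print(os.path.basename(found_as_is), os.path.basename(found_fallback), os.path.basename(found_both))
# test_pkl.pkl.xz data.pkl.lzma test.pkl
```

The trade-off is one extra filesystem check per load in exchange for never silently rewriting a name the caller typed out in full.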

lucianopaz commented 3 years ago

Thanks for reporting, @dom-insytesys! I'm currently refactoring the package to make it easy to add extra compression protocols, filename extensions, and other pickling methods. I'll be sure to add the xz extension for lzma there.
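One way such an addition could look, as a sketch only: an extension-to-protocol table in which both '.lzma' and '.xz' map to the lzma protocol, plus an inference helper whose error message mimics the ValueError quoted in the report. `EXTENSION_TO_COMPRESSION` and `infer_compression` are hypothetical names, not compress_pickle's real internals, and other protocols are deliberately left out.

```python
import os

# Hypothetical extension table: several extensions may map to one protocol.
# Other protocols (gzip, bz2, ...) are omitted from this sketch.
EXTENSION_TO_COMPRESSION = {".lzma": "lzma", ".xz": "lzma"}


def infer_compression(path: str) -> str:
    """Hypothetical helper: infer the compression protocol from the last extension."""
    _, ext = os.path.splitext(path)  # "test_pkl.pkl.xz" -> ".xz"
    try:
        return EXTENSION_TO_COMPRESSION[ext.lower()]
    except KeyError:
        raise ValueError(
            f"Cannot infer compression protocol from filename "
            f"{os.path.basename(path)} with extension {ext}"
        ) from None


print(infer_compression("test_pkl.pkl.xz"))  # lzma
```

Keying the table by extension rather than by protocol is what lets one protocol own several extensions.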

Regarding your comment about the default extension: load and dump have a keyword argument named set_default_extension, which defaults to True. If you set it to False, your example that passes compression='lzma' will work fine, because the filename is then used as-is.
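The effect of set_default_extension, as described here and as visible in the FileNotFoundError above, can be modelled roughly like this. It is a hypothetical sketch, not compress_pickle's source: `apply_default_extension` and `DEFAULT_EXTENSIONS` are illustrative names. The "skip if the name already ends with the extension" check is inferred from the thread, since 'test.pkl.lzma' loaded fine rather than being turned into 'test.pkl.lzma.lzma'.

```python
# Hypothetical default-extension table; only "lzma" -> ".lzma" comes from the thread.
DEFAULT_EXTENSIONS = {"lzma": ".lzma"}


def apply_default_extension(path: str, compression: str,
                            set_default_extension: bool = True) -> str:
    """Hypothetical model: append the protocol's default extension unless
    disabled or already present."""
    ext = DEFAULT_EXTENSIONS[compression]
    if set_default_extension and not path.endswith(ext):
        return path + ext
    return path


print(apply_default_extension("test_pkl.pkl.xz", "lzma"))         # test_pkl.pkl.xz.lzma
print(apply_default_extension("test_pkl.pkl.xz", "lzma", False))  # test_pkl.pkl.xz
print(apply_default_extension("test.pkl.lzma", "lzma"))           # test.pkl.lzma
```

With the flag left at True, the .xz name gains a '.lzma' suffix, which reproduces the missing-file error from the report; with it set to False, the path is passed through untouched.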

lucianopaz commented 3 years ago

Closed by #26