Closed PaulFlanaganGenscape closed 3 years ago
Thanks for reporting this @PaulFlanaganGenscape! From the PR that you linked, it looks like protocol 5 is breaking something. Could you check whether compress_pickle.dump(..., protocol=4) works?
When I find some time, I'll port over the solution that the pandas team used in their PR here.
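A minimal sketch of one possible workaround, not necessarily the approach taken in the pandas PR: pickle the object to an in-memory bytes object first, so that the compressed stream's write() only ever receives plain bytes. The helper names dump_via_bytes and load_compressed are hypothetical, and the sketch uses the standard-library bz2 module directly rather than compress_pickle. It trades extra memory (a full in-memory copy of the pickle) for compatibility.

```python
import bz2
import pickle


def dump_via_bytes(obj, path, protocol=pickle.HIGHEST_PROTOCOL):
    """Hypothetical helper: pickle in memory, then write plain bytes."""
    # pickle.dumps returns a bytes object, so the compressor's write()
    # never receives a pickle.PickleBuffer.
    payload = pickle.dumps(obj, protocol=protocol)
    with bz2.open(path, "wb") as stream:
        stream.write(payload)


def load_compressed(path):
    """Hypothetical helper: read the pickled object back."""
    with bz2.open(path, "rb") as stream:
        return pickle.load(stream)
```

Loading needs no special handling, since the bytes on disk are an ordinary pickle stream regardless of how they were written.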
Yes, you're right. It works with protocol=4:
In [61]: lb, ub = -1, 1
...: x = np.random.uniform(low=lb,high=ub,size=(1,100000000))
In [62]: humanize.naturalsize( x.nbytes )
Out[62]: '800.0 MB'
In [63]: dump(x, "x.pkl.bz", compression="bz2", protocol=4)
In [64]: dump(x, "x.pkl.bz", compression="bz2")
TypeError Traceback (most recent call last)
<ipython-input-87-5d854cdb6283> in <module>
----> 1 dump(x, "x.pkl.bz", compression="bz2")
.venv/lib/python3.9/site-packages/compress_pickle/compress_pickle.py in dump(obj, path, compression, mode, protocol, fix_imports, buffer_callback, unhandled_extensions, set_default_extension, optimize, **kwargs)
149 io_stream.write(buff)
150 else:
--> 151 pickle.dump(obj, io_stream, protocol=protocol, fix_imports=fix_imports)
152 finally:
153 io_stream.flush()
~/.pyenv/versions/3.9.0/lib/python3.9/bz2.py in write(self, data)
234 compressed = self._compressor.compress(data)
235 self._fp.write(compressed)
--> 236 self._pos += len(data)
237 return len(data)
238
TypeError: object of type 'pickle.PickleBuffer' has no len()
In [65]:
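The last frame of the traceback fails on len(data). A small sketch of why, using only the standard library: pickle.PickleBuffer supports the buffer protocol but does not define a length, so a writer that needs the size has to go through memoryview instead. The variable names here are illustrative only.

```python
import pickle

data = bytearray(b"example payload")
buf = pickle.PickleBuffer(data)

# PickleBuffer exposes the buffer protocol but not len().
try:
    len(buf)
    len_supported = True
except TypeError:
    len_supported = False

# A memoryview reports the size in bytes without copying the data.
size_in_bytes = memoryview(buf).nbytes
```

A write() method that uses memoryview(data).nbytes in place of len(data) would accept bytes, bytearray, and PickleBuffer alike.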
I'm having the same problem pickling a Pandas DataFrame. Switching to protocol=4 makes it work.
Closed by #26
It's a bug similar to https://bugs.python.org/issue44439. I will create an issue in the Python issue tracker about this later.
I get an
object of type 'pickle.PickleBuffer' has no len()
error for any compression other than gzip if the data contains a large numpy array. It works for small numpy arrays.
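A rough way to probe the small-versus-large difference, assuming CPython's default C pickler: record the type of every object passed to write() while pickling payloads of different sizes with protocol 5. RecordingWriter and types_written are hypothetical names, and pickling a PickleBuffer directly is used here as a stand-in for what a numpy array hands to the pickler. My understanding is that large payloads are passed to write() as-is rather than being copied into a frame first, which would explain why only large arrays trigger the error; the exact size threshold is an implementation detail.

```python
import pickle


class RecordingWriter:
    """Hypothetical file-like object that records the type of each write."""

    def __init__(self):
        self.seen_types = set()

    def write(self, data):
        self.seen_types.add(type(data).__name__)
        # memoryview works for any bytes-like object, unlike len().
        return memoryview(data).nbytes


def types_written(num_bytes, protocol=5):
    """Pickle an in-band PickleBuffer and return the type names written."""
    writer = RecordingWriter()
    payload = pickle.PickleBuffer(bytearray(num_bytes))
    pickle.dump(payload, writer, protocol=protocol)
    return writer.seen_types


small = types_written(1_000)
large = types_written(10_000_000)
```

Comparing small and large on a given interpreter shows whether a PickleBuffer ever reaches write() directly for that payload size.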
I'm pretty sure it's the same issue as https://github.com/pandas-dev/pandas/pull/39376
small numpy array