sergey-dryabzhinsky / python-zstd

Simple Python bindings to Yann Collet's ZSTD compression library
BSD 2-Clause "Simplified" License

Accept the same iterable types supported by zlib, lzma, bz2 modules #79

Open tasket opened 2 years ago

tasket commented 2 years ago

Problem

The lack of compatibility with certain buffer types makes this module harder to use and less efficient; it cannot serve as a simple drop-in replacement for the built-in compression modules.

Data held in a memoryview() or mmap() must be copied in memory into a bytes() object before zstd can be used on it; this adds code complexity (a special case just for zstd) and dramatically increases memory consumption.

>>> import zlib, zstd, mmap
>>>
>>> f   = open("my_file","r+b")
>>> mmf = mmap.mmap(f.fileno(), 0)
>>> mvf = memoryview(mmf)
>>>
>>> c = zlib.compress(mmf)
>>> c = zlib.compress(mvf)
>>>
>>> c = zstd.compress(mmf,3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument 1 must be read-only bytes-like object, not mmap.mmap
>>> c = zstd.compress(mvf,3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument 1 must be read-only bytes-like object, not memoryview

(Note that converting a memoryview to read-only has no effect here.)
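
The conversion workaround described above can be sketched roughly as follows. This is only an illustration, not part of python-zstd: `to_bytes` is a hypothetical helper name, and an anonymous `mmap` stands in for the mapped file so the snippet runs without `my_file`.

```python
import mmap

def to_bytes(data):
    """Return `data` as bytes, copying only when it is not already bytes.

    Hypothetical helper: accepts any object that exports the buffer
    protocol (memoryview, mmap, bytearray, ...).
    """
    if isinstance(data, bytes):
        return data  # already acceptable, no copy needed
    # tobytes() allocates a new bytes object as large as the whole buffer,
    # so a mapped file ends up duplicated in memory.
    return memoryview(data).tobytes()

# Anonymous mapping used in place of a real file (assumption for the demo).
mmf = mmap.mmap(-1, 4096)
mmf.write(b"example data " * 100)

payload = to_bytes(mmf)          # full in-memory copy of the mapping
# c = zstd.compress(payload, 3)  # the failing call from the session, now given bytes
```

The extra copy in `to_bytes` is exactly the memory overhead that direct buffer support would avoid.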

Solution

Support the same buffer types that the Python built-in compression libraries accept.
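
As a rough illustration of what "the same buffer types" means in practice: the built-in modules accept objects that export the buffer protocol, which can be probed from Python with `memoryview()`. The sketch below uses only the standard library; `supports_buffer` is a hypothetical name chosen for this example, and the anonymous `mmap` is a stand-in for a mapped file.

```python
import mmap
import zlib

def supports_buffer(obj):
    """Return True if `obj` exports the buffer protocol (hypothetical helper)."""
    try:
        with memoryview(obj):
            return True
    except TypeError:
        return False

samples = {
    "bytes": b"abc" * 100,
    "bytearray": bytearray(b"abc" * 100),
    "memoryview": memoryview(b"abc" * 100),
    "mmap": mmap.mmap(-1, 300),  # anonymous mapping, zero-filled
}

# Round-trip each sample through zlib without converting it to bytes first.
results = {}
for name, obj in samples.items():
    original = memoryview(obj).tobytes()  # copy made only to verify the result
    compressed = zlib.compress(obj)
    results[name] = supports_buffer(obj) and zlib.decompress(compressed) == original
```

Every entry in `samples` is passed to `zlib.compress` unchanged; the request is for `zstd.compress` to accept the same set of objects.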

tasket commented 2 years ago

data: string|bytes - input data block, length limited by 2Gb by Python API

The size limitation noted in the python-zstd README appears to be a side effect of not supporting the Python buffer protocol. The simple .compress() functions of the built-in modules do not have the 2 GB limitation. As a result, a developer may have to add file-size guard conditions that exist only for python-zstd.
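
A guard of the kind mentioned above might look like the following sketch. The exact threshold is an assumption: the README only says "2Gb", which is read here as the largest signed 32-bit value. `ASSUMED_LIMIT` and `guard_size` are hypothetical names, not part of python-zstd.

```python
# Assumption: the README's "2Gb" is interpreted as the signed 32-bit maximum.
ASSUMED_LIMIT = 2**31 - 1

def guard_size(data, limit=ASSUMED_LIMIT):
    """Return the byte length of `data`, raising if it exceeds `limit`.

    Hypothetical guard that a caller would need only for python-zstd.
    """
    with memoryview(data) as view:
        nbytes = view.nbytes  # total size in bytes, without copying the data
    if nbytes > limit:
        raise ValueError(
            f"input is {nbytes} bytes; limit for this compressor is {limit}"
        )
    return nbytes

size = guard_size(b"small input")
```

Code that calls only the built-in modules would not need this check, which is why it counts as a zstd-specific special case.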