sstadick / gzp

Multi-threaded Compression
The Unlicense
154 stars 14 forks source link

[Idea] Make gzp usable as C library and/or python module #17

Open ghuls opened 3 years ago

ghuls commented 3 years ago

Make gzp usable as C library and/or python module

For example, there is a python wrapper around ISA-Lwhich uses theigzip` code to provide fast gzip decompression/compression (bad compression ratios) which mimics the standard gzip module to provide faster gzip (de)compression speed.

python-isal

Faster zlib and gzip compatible compression and decompression by providing Python bindings for the ISA-L library.

This package provides Python bindings for the ISA-L library. The Intel(R) Intelligent Storage Acceleration Library (ISA-L) implements several key algorithms in assembly language. This includes a variety of functions to provide zlib/gzip-compatible compression.

python-isal provides the bindings by offering three modules:

    isal_zlib: A drop-in replacement for the zlib module that uses ISA-L to accelerate its performance.
    igzip: A drop-in replacement for the gzip module that uses isal_zlib instead of zlib to perform its compression and checksum tasks, which improves performance.
    igzip_lib: Provides compression functions which have full access to the API of ISA-L's compression functions.

isal_zlib and igzip are almost fully compatible with zlib and gzip from the Python standard library. There are some minor differences see: differences-with-zlib-and-gzip-modules.

https://github.com/pycompression/python-isal

It would be great if gzp could be exposed in a similar way, so more bioinformatics (python) programs can read/write gzip/bgzf files faster.

sstadick commented 3 years ago

I would very much like for both of those things to happen! If someone else starts on that before I do I'm more than happy to accommodate that / help make it happen. Especially the python library.

If I, or anyone else starts on this it'd be great to post a comment in this ticket.