sstadick / gzp

Multi-threaded Compression
The Unlicense

Add Intel ISA-L #25

Open sstadick opened 2 years ago

sstadick commented 2 years ago

@ghuls have you ever done any benchmarking with ISA-L vs zlib-ng or libdeflate?

I'm looking at wrapping GKL and thinking it may be a way to get ISA-L into gzp/crabz, but I can't find any reliable-looking benchmarks indicating whether that is even worth it.

See https://www.reddit.com/r/rust/comments/qhaaju/help_wanted_opensource_genomics_project_in_rust/?utm_source=share&utm_medium=ios_app&utm_name=iossmf for the why on GKL

ghuls commented 2 years ago

@sstadick I did some benchmarking here (decompression of gzip files with gzip, pigz with zlib-ng, and ISA-L): https://github.com/zlib-ng/zlib-ng/issues/986

ISA-L was 3 times faster than zlib-ng and 6 times faster than gzip at decompression. I didn't test compression with ISA-L (as it only has 3 compression levels and the compression ratio is quite low).

libdeflate is probably faster than ISA-L, but ISA-L supports streaming, while libdeflate doesn't. So for big files ISA-L is quite useful.

I use it here: https://github.com/aertslab/single_cell_toolkit/blob/master/extract_hydrop_atac_barcode_from_R2_fastq.sh
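
To illustrate the streaming vs. whole-buffer distinction mentioned above, here is a minimal sketch using Python's standard `zlib` module as a stand-in. It does not use ISA-L or libdeflate, and the helper name `stream_decompress` is hypothetical. The one-shot call needs the entire compressed input in memory, while the streaming version reads it in fixed-size chunks, which is what matters for big files.

```python
import gzip
import io
import zlib


def stream_decompress(src, chunk_size=64 * 1024):
    """Yield decompressed chunks from a single-member gzip stream.

    Hypothetical helper for illustration; only `chunk_size` bytes of
    compressed input are read at a time.
    """
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=31)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = decomp.decompress(chunk)
        if out:
            yield out
    # Emit anything still buffered inside the decompressor.
    tail = decomp.flush()
    if tail:
        yield tail


# Synthetic test data (an assumption for the example, not real benchmark input).
payload = b"ACGT" * 250_000
compressed = gzip.compress(payload)

# Whole-buffer style: the entire compressed input must be in memory.
one_shot = zlib.decompress(compressed, wbits=31)

# Streaming style: input is consumed in small chunks from a file-like object.
streamed = b"".join(stream_decompress(io.BytesIO(compressed), chunk_size=4096))
```

Both paths produce the same bytes; the difference is only in how much input has to be held at once. The sketch handles a single gzip member, so concatenated multi-member files would need extra handling.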

sstadick commented 2 years ago

Ah! I see you linked to that in hck at some point! Sorry for the repeat.

The decompression speed would be very nice to have, thanks for sharing the concrete benchmarks.

ghuls commented 1 year ago

@sstadick Any plans to still implement ISA-L? There is python-isal that wraps the ISA-L code, which might help as inspiration as I seem to remember that the API is quite different from standard zlib: https://github.com/pycompression/python-isal/blob/develop/src/isal/isal_zlibmodule.c

Some other benchmarks (from a python point of view) between different deflate libraries: https://github.com/pycompression/xopen/issues/117#issuecomment-1398510959

sstadick commented 1 year ago

I'm not currently planning on adding it, only due to personal time limitations. I do want to leave this open though, because it would be great to have.