samtools / htslib

C library for high-throughput sequencing data formats

Linking against Intel ISAL for faster deflate operations #1780

Open pettyalex opened 1 month ago

pettyalex commented 1 month ago

Hello,

I see that bgzf compression can optionally link against libdeflate for faster performance, but I was wondering if you've ever evaluated or considered linking against Intel ISA-L? It is BSD-licensed and offers the highest-performance deflate implementation that I'm aware of, especially at the lower compression levels.

Perhaps it could be optionally linked, just like libdeflate is right now? Could it be preferred to libdeflate if both are available?

https://github.com/intel/isa-l

Benchmarks: https://github.com/zlib-ng/zlib-ng/issues/1486 https://github.com/powturbo/TurboBench/issues/43
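The level/speed tradeoff that makes a faster deflate attractive here can be seen even without ISA-L (which has no stdlib binding). A minimal sketch using Python's `zlib` to time raw deflate at levels 1 and 6 on a bgzf-sized 64 KiB block; the absolute numbers are illustrative only, since libdeflate and ISA-L are both much faster than zlib:

```python
import time
import zlib

# bgzf compresses independent blocks of at most 64 KiB, so benchmark
# deflate on that block size rather than on one long stream.
BLOCK = 64 * 1024
data = b"ACGTTGCA" * (BLOCK // 8)  # exactly 64 KiB of repetitive input

def time_level(level, reps=200):
    start = time.perf_counter()
    for _ in range(reps):
        # raw deflate (wbits=-15), the format used inside gzip/bgzf blocks
        c = zlib.compressobj(level, zlib.DEFLATED, -15)
        out = c.compress(data) + c.flush()
    elapsed = time.perf_counter() - start
    return len(out), reps * BLOCK / elapsed / 1e6  # (bytes/block, MB/s)

for level in (1, 6):
    size, mbps = time_level(level)
    print(f"level {level}: {size} bytes/block, {mbps:.0f} MB/s")
```

Level 1 typically compresses noticeably worse but several times faster than level 6, which is why `-1` output is exactly where an ISA-L-class encoder would pay off.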

jkbonfield commented 1 month ago

Looking at those benchmarks it seems like Intel have improved their performance. I did evaluate this in the past and it was simply not a convincing win.

http://www.htslib.org/benchmarks/zlib.html

That was Intel's zlib rather than igzip specifically, but you would think it's the same technology in both? Maybe not.

Profiling a `samtools view -1 -o /tmp/tmp.bam in.bam` command to see where the CPU time is spent, I see this:

  51.73%  samtools  libdeflate.so.0     [.] deflate_compress_fastest
  13.63%  samtools  libdeflate.so.0     [.] deflate_decompress_default
  10.43%  samtools  libdeflate.so.0     [.] deflate_flush_block
   9.95%  samtools  libdeflate.so.0     [.] crc32_x86_pclmul_avx
   2.36%  samtools  [kernel]            [k] 0xffffffff99800190
   2.06%  samtools  libc-2.27.so        [.] __memmove_sse2_unaligned_erms
   1.62%  samtools  libdeflate.so.0     [.] deflate_make_huffman_code
   1.21%  samtools  samtools            [.] bgzf_read
   1.02%  samtools  samtools            [.] bgzf_write
   0.95%  samtools  samtools            [.] bam_read1

Libdeflate is already fast for decoding, but according to those benchmarks it may be around half the speed of ISA-L for encoding. Since encoding dominates the profile above, outputting level-1 BAMs with ISA-L may give up to ~50% higher throughput. We can't tell whether that holds on the small block sizes bgzf uses, though, without trying it out.
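To make the small-block point concrete: each bgzf block is an independent gzip member of at most 64 KiB whose `BC` extra subfield records the block size minus one, so the compressor is restarted for every block and gets no cross-block history. A rough sketch of writing one such block, using stdlib `zlib` in place of libdeflate/ISA-L (the real writer in htslib is C and also handles buffering and the EOF marker):

```python
import struct
import zlib

def bgzf_block(data: bytes, level: int = 1) -> bytes:
    """Wrap one <=64 KiB chunk as a BGZF block: a gzip member whose
    BC extra subfield stores the total block length minus one."""
    assert len(data) <= 65536
    c = zlib.compressobj(level, zlib.DEFLATED, -15)  # raw deflate
    cdata = c.compress(data) + c.flush()
    # 12-byte gzip header+XLEN, 6-byte BC subfield, payload, 8-byte footer
    bsize = 12 + 6 + len(cdata) + 8
    header = struct.pack(
        "<4BIBBH2B2H",
        0x1F, 0x8B, 8, 4,      # magic, CM=deflate, FLG=FEXTRA
        0, 0, 255,             # MTIME, XFL, OS=unknown
        6,                     # XLEN
        ord("B"), ord("C"), 2, # SI1, SI2, SLEN
        bsize - 1)             # BSIZE (total block size minus 1)
    footer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + cdata + footer

block = bgzf_block(b"example\n")
# A BGZF block is still a valid gzip member, so plain gzip can read it.
assert zlib.decompress(block, 15 + 32) == b"example\n"
```

Per 64 KiB block the encoder pays header, Huffman-table, and flush overhead that a long-stream benchmark never sees, which is why the large-buffer numbers in the linked benchmarks may not transfer directly.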

pettyalex commented 1 month ago

I bring this up because my group spends a lot of money running various bcftools tasks on cloud platforms, and those jobs spend most of their CPU time compressing output. I'm going to see if I can find some time to do this and bring you actual benchmarks on those small block sizes.

> Although that was Intel's zlib rather than igzip specifically, but you would think it's the same technology in both? Maybe not.

Their newest zlib is part of Intel IPP, not ISA-L. Intel has too many competing technologies in this space, and things are ripe for confusion: https://www.intel.com/content/www/us/en/developer/articles/guide/data-compression-tuning-guide-on-xeon-systems.html

I just saw that the Intel zlib you linked may not even be the same one as the IPP version, but yet another separate thing? It's kind of a nightmare:

[image]

Intel's zlib is also not open source, which can cause some pain:

[image]

pettyalex commented 1 month ago

Oh, it looks like you already tested igzip and the answer was "nope", so this may not be worth the effort.

I may spend the time anyway just to see it with my own eyes, but you have nice documentation about all this.