nanoporetech / vbz_compression

VBZ compression plugin for nanopore signal data
https://nanoporetech.com/
Mozilla Public License 2.0

Benchmark: Nanopore signal data compression #3

Closed: powturbo closed this issue 4 years ago

powturbo commented 5 years ago

While testing TurboPFor, I ran some experiments with nanopore signal data. Input file: all 16-bit signal data extracted from multi_fast5_zip.fast5.

         ./icapp sig.u16 -Elzturbo,39 -Fs -e83,77,11,39,51,30,76
file: max bits histogram:
04: 0.000% 
06: 0.000% 
07: 0.002% 
08:###### 6.0% 
09:######################################################################################### 89% 
10:##### 5.5% 
11: 0.000% 
12: 0.000% 
16: 0.001% 

file: delta max bits histogram:
00:## 2.4%
01:## 2.4%
02:##### 4.7%
03:######### 9.2%
04:################# 17%
05:########################### 27%
06:######################## 24%
07:######### 9.0%
08:#### 3.7%
09:# 0.8%
10: 0.012%
11: 0.002%
12: 0.000%
13: 0.000%
14: 0.000%
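The two histograms suggest why delta coding pays off here: raw samples mostly need 9 bits, while zigzag-encoded deltas mostly need 4 to 7. The sketch below shows one way such histograms could be computed. We do not know exactly how icapp buckets its output; this sketch assumes each entry is the bit width of the largest value in a fixed block (the block size of 128 and the helper names are hypothetical).

```python
from collections import Counter
from typing import Dict, List

BLOCK = 128  # hypothetical block size; icapp's real bucketing is an assumption


def zigzag(d: int) -> int:
    """Map signed deltas to unsigned values: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return d << 1 if d >= 0 else ((-d) << 1) - 1


def deltas(samples: List[int]) -> List[int]:
    """Differences between consecutive samples; the first is taken against 0."""
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out


def max_bits_histogram(values: List[int], block: int = BLOCK) -> Counter:
    """Count blocks by the bit width of their largest (non-negative) value."""
    hist: Counter = Counter()
    for i in range(0, len(values), block):
        hist[max(values[i:i + block]).bit_length()] += 1
    return hist


def percent(hist: Counter) -> Dict[int, float]:
    """Convert block counts to percentages, as in the icapp output."""
    total = sum(hist.values())
    return {bits: 100.0 * n / total for bits, n in sorted(hist.items())}


signal = [500, 502, 499, 505, 510, 507]  # made-up samples, not real data
print(percent(max_bits_histogram(signal, block=3)))  # {9: 100.0}
print(percent(max_bits_histogram([zigzag(d) for d in deltas(signal)], block=3)))  # {4: 50.0, 10: 50.0}
```

In this toy input the first delta block is inflated by the initial sample (taken against 0); on a long signal that one block is negligible.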

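Filesize: 3,097,862 bytes    CPU: Skylake i7-6700  3.4GHz
(E MB/s = compression speed, D MB/s = decompression speed, ratio = compressed size / original file size)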

  E MB/s     size     ratio   D MB/s  function integer size=16 bits (lz=lzturbo,39) 
   78.59    1284821  41.47%  1364.10 LztpzByte        Transpose+zzag+turboanx
    1.13    1289797  41.64%  1912.47 LztpzByte        Transpose+zzag+lzturbo,39      
    3.06    1292571  41.72%  1627.03 LztpzByte        Transpose+zzag+zstd,22        
  120.73    1293228  41.75%  2010.29 lzv8zenc         TurboByte+zzag+turboanx     
    7.32    1296883  41.86%  1946.86 lzv8zenc         TurboByte+zzag+lzturbo,39     
    7.48    1310074  42.29%  1690.97 lzv8zenc         TurboByte+zzag+zstd,22    
    6.44    1333780  43.05%  1103.62 vbz              vbz_compression 
  614.78    1432663  46.25%  4419.20 p4nzenc128v16    TurboPForV   zigzag     
  547.91    1523046  49.16%   752.46 lzv8zenc         TurboByte+zzag+fse        
    1.55    1577260  50.91%  5779.59 LztpzByte        Transpose+zzag+lzturbo,19      
   11.68    1577173  50.91%  1789.64 LztpzByte        Transpose+zzag+lz4,12     
   10.80    1583766  51.12%  6387.34 lzv8zenc         TurboByte+zzag+lzturbo,19      
   21.06    1587076  51.23%  4863.21 lzv8zenc         TurboByte+zzag+lz4,12     
  367.44    1632853  52.71%   573.68 LztpzByte        Transpose+zzag+fse        
 6749.15    1676704  54.12%  8775.81 v8nzenc128v16    TByte+TPackV zigzag     
 6992.92    1676746  54.13%  8927.56 bitnzpack128v16  TurboPackV   zigzag     
   68.73    1705144  55.04%   959.98 lzv8enc          TurboByte+turboanx            
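The Transpose+zzag rows (LztpzByte) combine zigzag delta coding with a byte transposition before the back-end coder. The sketch below illustrates that preprocessing idea for unsigned 16-bit samples under our own assumptions: deltas wrap to 16 bits, the transposition simply splits low bytes from high bytes, and Python's zlib stands in for lzturbo, zstd or turboanx, none of which are in the standard library. It is not TurboPFor's actual code or format.

```python
import struct
import zlib
from typing import List


def encode(samples: List[int], level: int = 9) -> bytes:
    """Zigzag-delta + byte-transpose unsigned 16-bit samples, then compress (zlib as stand-in)."""
    prev, zz = 0, []
    for s in samples:
        d = (s - prev) & 0xFFFF          # wrap the delta to 16 bits
        if d >= 0x8000:
            d -= 0x10000                 # reinterpret as signed int16
        zz.append(((d << 1) ^ (d >> 15)) & 0xFFFF)  # zigzag into uint16
        prev = s
    raw = struct.pack(f"<{len(zz)}H", *zz)
    transposed = raw[0::2] + raw[1::2]   # all low bytes, then all high bytes
    return zlib.compress(transposed, level)


def decode(blob: bytes) -> List[int]:
    """Inverse of encode: decompress, un-transpose, undo zigzag and delta."""
    transposed = zlib.decompress(blob)
    n = len(transposed) // 2
    raw = bytearray(2 * n)
    raw[0::2] = transposed[:n]           # restore interleaved little-endian layout
    raw[1::2] = transposed[n:]
    out, prev = [], 0
    for z in struct.unpack(f"<{n}H", bytes(raw)):
        d = (z >> 1) ^ -(z & 1)          # undo zigzag
        prev = (prev + d) & 0xFFFF       # undo delta with 16-bit wraparound
        out.append(prev)
    return out
```

The delta histogram suggests most zigzag deltas fit in 8 bits or fewer, so after transposition the high-byte half is mostly zeros, which an LZ or entropy back end compresses well.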

iiSeymour commented 5 years ago

Hey @powturbo, this is great, thanks for taking an interest.

So it looks like LztpzByte (Transpose+zzag+turboanx) gives better compression, and its decoding performance is very good. Is it possible to compile VBZ into the benchmark for a direct comparison?

powturbo commented 5 years ago

I've now included the vbz result (vbz is now built into icapp). Its overhead compared with the native 16-bit 'lzv8zenc' is 1.79%.
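As far as we know, vbz is built on the StreamVByte integer coder and zstd, which makes it structurally similar to the lzv8zenc rows (zigzag deltas, a byte-oriented integer coder, then a back end). The sketch below illustrates only the variable-byte step, simplified to 16-bit values: one control bit per value selects one or two data bytes, and control bits are stored apart from the data bytes. This layout and the function names are hypothetical; it is not the format of vbz, StreamVByte (which uses 2-bit keys for 1 to 4 byte 32-bit integers) or TurboByte.

```python
from typing import List, Tuple


def vbyte16_encode(values: List[int]) -> Tuple[bytes, bytes]:
    """Hypothetical layout: control bit 0 -> 1 data byte, 1 -> 2 data bytes."""
    control = bytearray((len(values) + 7) // 8)  # one bit per value, LSB first
    data = bytearray()
    for i, v in enumerate(values):
        if not 0 <= v <= 0xFFFF:
            raise ValueError("values must be unsigned 16-bit integers")
        if v < 0x100:
            data.append(v)                       # small value: single byte
        else:
            control[i // 8] |= 1 << (i % 8)      # flag a two-byte value
            data += v.to_bytes(2, "little")
    return bytes(control), bytes(data)


def vbyte16_decode(control: bytes, data: bytes, count: int) -> List[int]:
    """Inverse of vbyte16_encode; the value count is stored out of band."""
    out, pos = [], 0
    for i in range(count):
        if (control[i // 8] >> (i % 8)) & 1:
            out.append(int.from_bytes(data[pos:pos + 2], "little"))
            pos += 2
        else:
            out.append(data[pos])
            pos += 1
    return out
```

In a vbz-like pipeline the input would be zigzag deltas, which here mostly stay below 256, and the control and data streams would then be passed to zstd.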

0x55555555 commented 5 years ago

Hi @powturbo, I am trying to integrate some further benchmarks into the test suite and am struggling to find the source for TurboByte+zzag+turboanx (specifically turboanx). Is it publicly available?

The linked page (https://sites.google.com/site/powturbo/entropy-coder) doesn't seem to list any source code.

powturbo commented 5 years ago

Very interesting! TurboByte is included in the TurboPFor Integer Compression library. TurboANX (a SIMD Asymmetric Numeral Systems coder), which is similar to FSE but more efficient and 3x faster, is closed source, and there are currently no plans to release the source code. You can see an updated version in this 16-bit benchmark.
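Since TurboANX is closed source, the sketch below only illustrates the general ANS idea it shares with FSE. It is a textbook range-variant ANS (rANS) coder, not TurboANX and not FSE (which is a table-based variant). To stay short it keeps the whole state in a Python arbitrary-precision integer with no renormalization, whereas production coders hold the state in a fixed-width register and stream bits out. The function names and the static frequency table are assumptions for illustration.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def build_tables(freqs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Cumulative start of each symbol's slot range, plus the total M."""
    symbols = sorted(freqs)
    starts = [0] + list(accumulate(freqs[s] for s in symbols))[:-1]
    return dict(zip(symbols, starts)), sum(freqs.values())


def rans_encode(msg: str, freqs: Dict[str, int]) -> int:
    """Encode in reverse so decoding yields symbols in forward order."""
    cum, total = build_tables(freqs)
    x = 1  # initial state
    for s in reversed(msg):
        f = freqs[s]
        x = (x // f) * total + cum[s] + (x % f)  # state grows by about total/f
    return x


def rans_decode(x: int, count: int, freqs: Dict[str, int]) -> List[str]:
    """Recover count symbols by inverting the encoder's state transition."""
    cum, total = build_tables(freqs)
    out = []
    for _ in range(count):
        slot = x % total
        s = next(sym for sym, c in cum.items() if c <= slot < c + freqs[sym])
        out.append(s)
        x = freqs[s] * (x // total) + slot - cum[s]
    return out
```

Frequent symbols grow the state by only a factor of about M/f, so the final state's bit length approaches the message entropy; this is the property FSE and, presumably, TurboANX exploit with much faster table-driven and SIMD implementations.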