Turbopfor 256 performs worse than Turbopfor 128

powturbo / TurboPFor-Integer-Compression

Fastest Integer Compression

GNU General Public License v2.0

770 stars 112 forks source link

Turbopfor 256 performs worse than Turbopfor 128 #101

Closed anson304 closed 1 year ago

anson304 commented 1 year ago

Hello,

I am using Turbopfor 256 and 128 for compression of posting lists, however, I'm finding that 256 performs worse than 128 when skip lists are enabled for the posting list of your index. Do you have any ideas why this might be the case?

powturbo commented 1 year ago

This can be the case depending on the distribution of the data. It's because of the block length 256 vs 128 integers and 128 is compressing allmost always better. However, the difference is normally negligible, 256 with avx2 is faster. See the gov2 inverted index benchmark in the README. You'll have better compression when the DocIds are sorted.

anson304 commented 1 year ago

Thanks for explanation, but I don't really get what you mean by "128 is compressing allmost always better" and how this relates to the block length. I see that the default block length for your index is 128 integers. Would you be able to explain in more detail?

I noticed in idxqry.c that you use p4d1dec128v32 as the default SIMD compressor instead of p4d1dec256v32, is this because p4d1dec128v32 is faster?

powturbo commented 1 year ago

Well, the TurboPFor SSE v128 functions are using a block length of 128 integers, whereas we have 256 integers/block in the AVX2 v256 functions.

.... is this because p4d1dec128v32 is faster? No, avx2 is always faster. The idx??? programs was written at the time when the avx2 functions were not available in TurboPFor. You can adapt the idx??? programs to use avx2 v256 instead of the sse v128 functions.

anson304 commented 1 year ago

I thought that v128 compresses 128 bits which is 32 integers (4 bytes each) at a time, and v256 compresses 64 integers at a time. Doesn't that mean that v256 can be used on blocks that contain 128 integers?

powturbo commented 1 year ago

Doesn't that mean that v256 can be used on blocks that contain 128 integers? Yes.

There are low level bitpacking (bitpackv128/bitpack256) functions compressing 32 (32 bits) integers at a time. Blocks with <32 integers must be compressed with the scalar functions.

The high level bitpacking (bitnpack128/bitnpack256) functions are compressing blocks with 128/256 integers using sse/avx2. You can pass arbitrary number of integers to these functions. The bit size for each 128/256 integers is automatically determined at encoding and stored in the output buffer.

The sse v128 functions are not compatible with the v256 functions. v128 encoding <-> v128 decoding v256 encoding <-> v256 decoding