Packing 2 bits values into a bigger

lemire / simdcomp

A simple C library for compressing lists of integers using binary packing

BSD 3-Clause "New" or "Revised" License

488 stars 53 forks source link

Packing 2 bits values into a bigger #24

Closed Darshvino closed 2 years ago

Darshvino commented 2 years ago

Hi @lemire,

Thank you for all your work and all the resources you had provided.

Is there any reference to efficiently pack the 2 bits into a larger data type(basically int64)?

A = 0000.........10(64 bit), B = 0000........11(64 bit), C = 0000........01, and so on, these are int64 values containing 2-bit values.

So, I want to pack them in an int64 datatype(32 2-bit values) Z = 101101...........(int64 value, containing packed values of A, B, C, D, .............. )

I want to perform this efficiently. So It would be helpful if you can help me with some references.

Thanks in advance!

lemire commented 2 years ago

Please refer to our documentation and examples… You can pack 2-bit values using the simdcomp library as follows: simdpackwithoutmask(datain, buffer, 2).

I believe that we provide references, see our README.

If you encounter issues using simdcomp, please start by describing the work you have and be specific on the issue.

Darshvino commented 2 years ago

Hi @lemire,

Thanks a lot for your reply.

I think this will be the main reference for 2 bits packing: https://github.com/lemire/simdcomp/blob/631f8a7a27cb3398be275f0144ab9133d0a2ea4c/src/simdbitpacking.c#L113

1.) Whether there is an implementation with an AVX2 instruction set(like using 256-bit registers)?

2.) And the input is the unpacked value(A = 0000.........10(64 bit), B = 0000........11(64 bit), C = 0000........01, and so on) and the output is the packed value(Z = 101101...........(int64 value, containing packed values of A, B, C, D, .............. ), is this right with the above implementation?

Thanks

lemire commented 2 years ago

Please refer to the papers, e.g.,

Daniel Lemire and Leonid Boytsov, Decoding billions of integers per second through vectorization, Software Practice & Experience 45 (1), 2015. http://arxiv.org/abs/1209.2137 http://onlinelibrary.wiley.com/doi/10.1002/spe.2203/abstract

Darshvino commented 2 years ago

Thank you a lot @lemire. I will refer to the above references

Darshvino commented 2 years ago

Hi @lemire,

I interpreted the implementation of 2-bit packing using simdpackwithoutmask(datain, buffer, 2).

What I found wrt packing is:

If I have an Input array something like this: (3,2,1,3,1,3,2,0,1,1,2,3,2,3,3,2,0,2,0,0.....) each is of type int32 -> Then I load 4 values each into a 128 bit registers: (3,2,1,3), (1,3,2,0), (1,1,2,3), (2,3,3,2),(0,2,0,0).....................

-> And then the packing will not happen serially, instead it happens something like this: (.....0,2,1,1,3,........2,3,1,3,2,.........0,3,2,2,1,.........0,2,3,0,3......) --> Packed 2 bits data

What I mean to say is that the values in the packed data are not serial like the original.

It would be great if you can add few comments on this and give clarification on the way the data is being packed.

Thanks,