Closed cisaacson closed 4 years ago
The library only supports compressing blocks of exactly 128 integers (assuming you use BitPacker4x). You need to build your own codec on top of it.
You also need to decide how you will store the number of bits used to compress each block.
For instance,
[1 byte: store num bits per int for block #1]
[block #1: integers 0..128]
[1 byte: store num bits per int for block #2]
[block #2: integers 128..256]
...
[1 byte: store num bits per int for block #n]
[block #n: integers (n-1)*128..n*128]
[last incomplete block (< 128 integers) for the remainder, using variable byte encoding]
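With this layout the per-block compressed length does not need to be stored, because the header byte determines it. Here is a minimal sketch of walking such a buffer. It assumes the number of full blocks is known from elsewhere (for example a total integer count stored next to the buffer, which the layout above does not specify). `split_blocks` and `BLOCK_LEN` are illustrative names, not part of the bitpacking crate.

```rust
/// Integers per full block (BitPacker4x works on 128 integers at a time).
const BLOCK_LEN: usize = 128;

/// Splits a buffer laid out as [num_bits byte][packed block]... into
/// (num_bits, packed bytes) pairs, and returns the trailing bytes that
/// hold the incomplete last block.
/// Returns None if a header is invalid or the buffer is too short.
fn split_blocks(data: &[u8], num_full_blocks: usize) -> Option<(Vec<(u8, &[u8])>, &[u8])> {
    let mut blocks = Vec::with_capacity(num_full_blocks);
    let mut rest = data;
    for _ in 0..num_full_blocks {
        // The first byte of each frame is the bit width of the block.
        let (&num_bits, after_header) = rest.split_first()?;
        if num_bits > 32 {
            return None; // a u32 never needs more than 32 bits
        }
        // 128 integers * num_bits bits / 8 bits per byte = 16 * num_bits bytes.
        let block_len = BLOCK_LEN * num_bits as usize / 8;
        if after_header.len() < block_len {
            return None;
        }
        let (block, tail) = after_header.split_at(block_len);
        blocks.push((num_bits, block));
        rest = tail;
    }
    Some((blocks, rest))
}
```

Each returned pair carries what a block decompressor needs (the packed bytes and the bit width), and the leftover slice is the variable-byte-encoded remainder.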
The length of a bitpacked block packed using b bits per int is (16 x b) bytes.
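As a quick check of that formula: 128 integers at b bits each occupy 128 x b bits, which is 16 x b bytes. The sketch below computes this, plus the total size of the full-block part of the stream when each block is preceded by its 1-byte header. Both function names are made up for illustration and are not crate APIs.

```rust
/// Size in bytes of one 128-integer block packed at `num_bits` bits per integer.
fn packed_block_len(num_bits: u8) -> usize {
    128 * num_bits as usize / 8
}

/// Total size in bytes of a sequence of full blocks, each stored as
/// a 1-byte bit-width header followed by the packed block.
fn framed_len(num_bits_per_block: &[u8]) -> usize {
    num_bits_per_block
        .iter()
        .map(|&b| 1 + packed_block_len(b))
        .sum()
}
```

For example, a block that needs 5 bits per int takes 80 bytes (81 with its header), compared with 512 bytes for 128 raw u32 values.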
The following benchmark shows an implementation of such a codec, but it is missing the "remaining variable int encoded block".
https://github.com/tantivy-search/bitpacking/blob/master/src/bitpacking_bench.rs#L95-L122
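For the piece the benchmark leaves out, here is a minimal sketch of one common variable byte scheme: 7 data bits per byte, least significant group first, with the high bit set on every byte except the last of each integer. This particular byte format is an assumption; the thread does not prescribe one, and the function names are illustrative. Since the incomplete block comes last in the layout, the decoder simply reads until the input ends.

```rust
/// Appends `value` to `out`, 7 bits at a time, lowest bits first.
/// The high bit marks "more bytes follow".
fn vbyte_encode(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push(((value & 0x7f) as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decodes every integer in `input`.
/// Returns None if an integer is longer than 5 bytes or the input
/// ends in the middle of an integer.
fn vbyte_decode_all(input: &[u8]) -> Option<Vec<u32>> {
    let mut values = Vec::new();
    let mut value: u32 = 0;
    let mut shift: u32 = 0;
    for &byte in input {
        if shift > 28 {
            return None; // a u32 fits in at most 5 groups of 7 bits
        }
        value |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            // Last byte of this integer: emit it and reset the state.
            values.push(value);
            value = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    // A pending continuation means the input was truncated.
    if shift == 0 { Some(values) } else { None }
}
```

Small values cost a single byte this way, which suits a remainder of fewer than 128 integers where a fixed block header would be wasted.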
Thanks Paul for the quick response, that makes sense. We will implement it this way.
Cory
-- Cory Isaacson http://www.coryisaacson.com (sent by email in reply to Paul Masurel's comment of Jun 13, 2020)
Paul, your advice helped a lot. We have things working the way we want, with excellent compression. Great work on the library.
I have been able to compress and decompress using BitPacker4x, and the compression looks quite good on sorted u32 values. I have a large number of u32 values to compress (up to a few thousand), and to compress them I loop through chunks of 128 u32 values. I need to store the final compressed result, and I did this by appending each compressed chunk to a result Vec<u8>, which I can then store. This works fine.
Now I need to take the result Vec<u8> and decompress each chunk. At this point I no longer have the compressed_len for each chunk of bytes. Is there a way to do this so that I can decompress each compressed chunk and rebuild the original Vec<u32>? The original C++ library from Daniel Lemire supports any arbitrary number of int values, which would be really nice here too. But if you can guide me on how to use this library properly, that would be very helpful.