kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License
1.36k stars 96 forks source link

Integer Bitpacking extensions #2083

Open benjaminwinger opened 1 year ago

benjaminwinger commented 1 year ago

From the extra TODOs in #2004, ordered roughly from simplest to most complicated.

Notes

InternalIDs

Stored as offset_t (uint64_t), but would need an extra layer to handle widening the results into the internal_id_t struct. This could cause significant slowdowns when decompressing since we wouldn't be able to use the fast mass decompression functions directly. Since the current random access implementation is not very efficient (unpacks 32 values to get one) it would probably be more performant to unpack to a temporary buffer and then copy the values into the value vector

Filling the extra space at the end of each page

This would be trivial to implement when direct reads and writes of bitpacked data is implemented.

SIMD

Mostly complicated due to handling compiler support.

Benchmarks

It would also be helpful to set up some small benchmarks (maybe using google benchmark, essentially what I did when comparing implementations at the start) so that we can do some fast comparisons of the compression/decompression speed without having to deal with the overhead of the rest of the system (not that such integrated benchmarks aren't also useful).

benjaminwinger commented 1 year ago

std::experimental::simd is available with GCC 11 and newer (and possibly clang?, searches found references in their repos, e.g., but no docs).

According to this it's planned to be merged into the standard in c++26. It might be a reasonably easy way of doing custom SIMD, e.g. for the signed extension and frame of reference calculations, with the caveat that it would only be available in certain environments (though it's likely to increase in availability as c++26 gets closer).