spiraldb / vortex

An extensible, state-of-the-art columnar file format
https://vortex.dev
Apache License 2.0

Vortex should support zero-copy roaring int and roaring bool array construction #1075

Open danking opened 1 month ago

danking commented 1 month ago

Both RoaringIntArray and RoaringBoolArray use Bitmap::deserialize, which copies the source bytes. BitmapView::deserialize exists, but it is unsafe. We should either verify the correctness of the source bytes and then use the unsafe method, or find another way to deserialize a roaring array without copying.

In my experience working with the PBI datasets (CMSprovider, for example), copying roaring arrays can account for ~10% of total runtime when decompressing.

robert3005 commented 1 month ago

There should be an array validator that you could implement for an array, so that validation happens once on construction. We minimally try to deserialize metadata, but that doesn't happen all the time and doesn't cover the case here. Once the bytes are validated, we could use the View types.
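
To make the "validate once on construction, then use a view" idea concrete, here is a minimal sketch. It is not the croaring or Vortex API: SortedU32View, ViewError, and the byte layout (a little-endian u32 count followed by strictly increasing little-endian u32 values) are hypothetical stand-ins for a serialized roaring bitmap. The point is only the shape of the pattern: a fallible constructor checks the borrowed bytes once, and every later read goes straight to the original buffer without copying.

```rust
/// Hypothetical stand-in for a serialized bitmap (NOT the roaring format):
/// a little-endian u32 count, then that many strictly increasing
/// little-endian u32 values.
#[derive(Debug, Clone, Copy)]
pub struct SortedU32View<'a> {
    /// Borrowed value bytes; the view never copies them.
    payload: &'a [u8],
}

/// Hypothetical validation errors for the layout above.
#[derive(Debug, PartialEq, Eq)]
pub enum ViewError {
    Truncated,
    LengthMismatch,
    NotSorted,
}

impl<'a> SortedU32View<'a> {
    /// Validate the bytes once. After this succeeds, reads rely on the
    /// checked invariants instead of re-validating or copying.
    pub fn try_new(bytes: &'a [u8]) -> Result<Self, ViewError> {
        if bytes.len() < 4 {
            return Err(ViewError::Truncated);
        }
        let (header, payload) = bytes.split_at(4);
        let count = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        if count.checked_mul(4) != Some(payload.len()) {
            return Err(ViewError::LengthMismatch);
        }
        let view = Self { payload };
        let mut prev: Option<u32> = None;
        for value in view.iter() {
            if let Some(p) = prev {
                if value <= p {
                    return Err(ViewError::NotSorted);
                }
            }
            prev = Some(value);
        }
        Ok(view)
    }

    /// Number of values in the view.
    pub fn len(&self) -> usize {
        self.payload.len() / 4
    }

    /// The borrowed value bytes, exposed so callers can confirm no copy was made.
    pub fn payload(&self) -> &'a [u8] {
        self.payload
    }

    /// Decode values lazily from the borrowed bytes.
    pub fn iter(&self) -> impl Iterator<Item = u32> + '_ {
        self.payload
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
    }

    fn get(&self, index: usize) -> u32 {
        let c = &self.payload[index * 4..index * 4 + 4];
        u32::from_le_bytes([c[0], c[1], c[2], c[3]])
    }

    /// Binary search; correct only because try_new checked the ordering.
    pub fn contains(&self, needle: u32) -> bool {
        let (mut lo, mut hi) = (0usize, self.len());
        while lo < hi {
            let mid = lo + (hi - lo) / 2;
            let value = self.get(mid);
            if value == needle {
                return true;
            }
            if value < needle {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        false
    }
}
```

Calling SortedU32View::try_new(&bytes) on a well-formed buffer yields a view whose payload() points into bytes itself, while malformed input is rejected up front. For the real roaring case, the equivalent validator would have to check whatever invariants BitmapView::deserialize requires of its input before the unsafe call is made; what those invariants are is not spelled out in this thread.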