spiraldb / vortex

A toolkit for working with compressed Arrow in-memory, on-disk, and over-the-wire
Apache License 2.0

Boolean <-> Integer duality #441

Open gatesn opened 1 month ago

gatesn commented 1 month ago

We should support converting between strictly sorted integers and boolean masks. Perhaps we need an array type that can convert in both directions?

This could allow us to remove the RoaringUInt array.
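
To make the proposed duality concrete, here is a minimal sketch of the two conversions using only the Rust standard library. The function names `indices_to_mask` and `mask_to_indices` are hypothetical and are not part of the vortex API; plain `Vec<bool>` and `usize` slices stand in for whatever array types the project would actually use.

```rust
/// Convert strictly sorted indices into a boolean mask of length `len`.
/// Returns `None` if the indices are not strictly increasing or if any
/// index is out of bounds. (Hypothetical helper, not a vortex function.)
fn indices_to_mask(indices: &[usize], len: usize) -> Option<Vec<bool>> {
    let mut mask = vec![false; len];
    let mut prev: Option<usize> = None;
    for &i in indices {
        // Reject out-of-bounds, duplicate, or unsorted indices.
        if i >= len || prev.map_or(false, |p| i <= p) {
            return None;
        }
        mask[i] = true;
        prev = Some(i);
    }
    Some(mask)
}

/// Convert a boolean mask back into the positions of its `true` values.
/// The result is strictly sorted by construction.
/// (Hypothetical helper, not a vortex function.)
fn mask_to_indices(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &set)| set.then_some(i))
        .collect()
}
```

Under these assumptions the two functions are inverses whenever the input indices are strictly sorted and in bounds, which is the property that would let a single representation serve both roles.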

a10y commented 1 month ago

Capturing from Slack:

Currently our TableProvider's pushdown is bottlenecked by take(varbin).


DataFusion defers to arrow's filter_bytes function to turn the predicate mask into a new ArrayRef:

https://github.com/apache/arrow-rs/blob/920a94470db04722c74b599a227f930946d0da80/arrow-select/src/filter.rs#L660-L689

We want our own boolean builder to construct these masks and calculate run lengths, then use those run lengths to alternate between slicing and indexing in our implementation of take().
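
A rough sketch of what that could look like, assuming a simplified variable-length binary layout (an offsets vector plus one contiguous byte buffer). Everything here is hypothetical: the `VarBin` struct, `true_runs`, `filter_varbin`, and the `MIN_SLICE_RUN` threshold are illustrative names and values, not the vortex or arrow-rs implementation.

```rust
/// Hypothetical simplified layout: element `i` occupies
/// `data[offsets[i]..offsets[i + 1]]`, so `offsets.len() == n + 1`.
struct VarBin {
    offsets: Vec<usize>,
    data: Vec<u8>,
}

/// Half-open ranges `[start, end)` of consecutive `true` values in `mask`.
fn true_runs(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &set) in mask.iter().enumerate() {
        match (set, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the mask.
    if let Some(s) = start {
        runs.push((s, mask.len()));
    }
    runs
}

/// Assumed threshold (not tuned): runs at least this long are bulk-copied.
const MIN_SLICE_RUN: usize = 4;

/// Keep the elements selected by `mask`, choosing a strategy per run.
fn filter_varbin(input: &VarBin, mask: &[bool]) -> VarBin {
    assert_eq!(mask.len() + 1, input.offsets.len());
    let mut out = VarBin { offsets: vec![0], data: Vec::new() };
    for (start, end) in true_runs(mask) {
        if end - start >= MIN_SLICE_RUN {
            // Slicing path: copy the run's bytes in one go, then rebase offsets.
            let (lo, hi) = (input.offsets[start], input.offsets[end]);
            let base = out.data.len();
            out.data.extend_from_slice(&input.data[lo..hi]);
            out.offsets
                .extend(input.offsets[start + 1..=end].iter().map(|o| base + o - lo));
        } else {
            // Indexing path: copy each selected element individually.
            for i in start..end {
                out.data
                    .extend_from_slice(&input.data[input.offsets[i]..input.offsets[i + 1]]);
                out.offsets.push(out.data.len());
            }
        }
    }
    out
}
```

Both paths produce the same output; the run lengths only decide how the bytes are copied. A real builder would presumably record the runs while the mask is being constructed rather than rescanning it afterwards as `true_runs` does here.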