spiraldb / vortex

An extensible, state-of-the-art columnar file format
https://vortex.dev
Apache License 2.0

Boolean <-> Integer duality #441

Open gatesn opened 4 months ago

gatesn commented 4 months ago

We should support converting between strictly sorted integers and boolean masks. We may need an array type that can go in both directions?

This could allow us to remove the RoaringUInt array.
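
A minimal sketch of the duality, to make the idea concrete. It uses plain `Vec<bool>` and `u64` indices rather than any Vortex array type, and the function names `indices_to_mask` and `mask_to_indices` are hypothetical, not existing APIs:

```rust
/// Build a boolean mask of length `len` from strictly sorted indices.
/// Returns `None` if the indices are not strictly increasing or are out of bounds.
/// (Hypothetical helper for illustration; not a Vortex API.)
pub fn indices_to_mask(indices: &[u64], len: usize) -> Option<Vec<bool>> {
    let mut mask = vec![false; len];
    let mut prev: Option<u64> = None;
    for &idx in indices {
        // Strictly sorted means every index is greater than the previous one.
        if prev.map_or(false, |p| idx <= p) {
            return None;
        }
        if idx >= len as u64 {
            return None;
        }
        mask[idx as usize] = true;
        prev = Some(idx);
    }
    Some(mask)
}

/// Recover the strictly sorted indices of the set positions in a mask.
/// (Hypothetical helper for illustration; not a Vortex API.)
pub fn mask_to_indices(mask: &[bool]) -> Vec<u64> {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &set)| set.then_some(i as u64))
        .collect()
}
```

The two functions are inverses of each other for a fixed length, which is what would let a single array type present either view.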

a10y commented 4 months ago

Capturing from Slack:

Currently our TableProvider's pushdown is bottlenecked by take(varbin).

DataFusion defers to arrow's filter_bytes function to turn the predicate mask into a new ArrayRef:

https://github.com/apache/arrow-rs/blob/920a94470db04722c74b599a227f930946d0da80/arrow-select/src/filter.rs#L660-L689

We want our own boolean builder that constructs these masks and calculates run lengths, which we can then use to alternate between slicing and indexing in our implementation of take().
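
A rough sketch of what that could look like, under some assumptions: the mask is a plain `&[bool]`, long runs of selected rows are copied as one slice, and short runs are batched into an index list. The names `mask_to_runs`, `Step`, `plan_take`, `apply_plan` and the `min_slice_len` threshold are all hypothetical; none of this is the Vortex or arrow-rs API.

```rust
/// Maximal runs of consecutive `true` values, as half-open ranges `[start, end)`.
pub fn mask_to_runs(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &set) in mask.iter().enumerate() {
        match (set, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the mask.
    if let Some(s) = start {
        runs.push((s, mask.len()));
    }
    runs
}

/// One step of a take: either a contiguous slice or a batch of scattered indices.
#[derive(Debug, PartialEq)]
pub enum Step {
    Slice { start: usize, end: usize },
    Indices(Vec<usize>),
}

/// Runs of at least `min_slice_len` rows become slices; shorter runs are
/// merged into the preceding index batch, if there is one.
pub fn plan_take(mask: &[bool], min_slice_len: usize) -> Vec<Step> {
    let mut plan: Vec<Step> = Vec::new();
    for (start, end) in mask_to_runs(mask) {
        if end - start >= min_slice_len {
            plan.push(Step::Slice { start, end });
        } else if let Some(Step::Indices(pending)) = plan.last_mut() {
            pending.extend(start..end);
        } else {
            plan.push(Step::Indices((start..end).collect()));
        }
    }
    plan
}

/// Execute a plan against a flat slice of values.
pub fn apply_plan<T: Clone>(values: &[T], plan: &[Step]) -> Vec<T> {
    let mut out = Vec::new();
    for step in plan {
        match step {
            // Contiguous run: one bulk copy.
            Step::Slice { start, end } => out.extend_from_slice(&values[*start..*end]),
            // Scattered rows: gather one element at a time.
            Step::Indices(idx) => out.extend(idx.iter().map(|&i| values[i].clone())),
        }
    }
    out
}
```

For a varbin-style array the slice step would copy a contiguous byte range and rebase the offsets, while the index step would gather individual values; where to put the threshold is something we would have to measure.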