Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.
We don't need a HashMap anyway. Since we're mapping from row address to partition id, we can use a Vec<Vec<...>> where each lookup is map[fragment_id][row_offset]. From some experimentation, this is ~6x faster.
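A rough sketch of what that nested-Vec lookup could look like. Everything here is illustrative rather than the actual code in the PR: the PartitionLookup type, the UNASSIGNED sentinel, the u32 partition id (the element type is elided above), and the assumption that a row address is a u64 with the fragment id in the upper 32 bits and the row offset in the lower 32 bits.

```rust
/// Hypothetical dense lookup table from row address to partition id.
/// The outer Vec is indexed by fragment id, the inner Vec by row offset.
struct PartitionLookup {
    map: Vec<Vec<u32>>,
}

/// Sentinel for rows that have not been assigned a partition (an assumption
/// made for this sketch).
const UNASSIGNED: u32 = u32::MAX;

impl PartitionLookup {
    /// Allocate one inner Vec per fragment, sized to that fragment's row count.
    fn new(fragment_sizes: &[usize]) -> Self {
        let map = fragment_sizes
            .iter()
            .map(|&rows| vec![UNASSIGNED; rows])
            .collect();
        Self { map }
    }

    /// Split a row address into (fragment_id, row_offset).
    /// Assumed layout: upper 32 bits = fragment id, lower 32 bits = row offset.
    fn split(row_addr: u64) -> (usize, usize) {
        ((row_addr >> 32) as usize, (row_addr & 0xFFFF_FFFF) as usize)
    }

    /// Record the partition id for a row. Panics if the address is out of range.
    fn set(&mut self, row_addr: u64, partition_id: u32) {
        let (fragment_id, row_offset) = Self::split(row_addr);
        self.map[fragment_id][row_offset] = partition_id;
    }

    /// Look up the partition id for a row: two index operations, no hashing.
    /// Returns None for out-of-range addresses and unassigned rows.
    fn get(&self, row_addr: u64) -> Option<u32> {
        let (fragment_id, row_offset) = Self::split(row_addr);
        self.map
            .get(fragment_id)?
            .get(row_offset)
            .copied()
            .filter(|&p| p != UNASSIGNED)
    }
}
```

The trade-off is memory: the table holds one slot per row in every fragment, so it suits the dense case where most rows get a partition id. A sparse mapping, or fragment ids with large gaps, would waste space compared with a HashMap.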