Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.
Tracing the I/O of a scalar index search shows that the search loads the dataset metadata (apparently twice), even though that data should already be cached. Loading it can be costly enough to defeat the purpose of an indexed search in the first place. The problem gets worse when a dataset has many fragments, because the manifest then becomes quite large.
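A minimal sketch of the expected behavior, assuming that the manifest for a given dataset version never changes and can therefore be cached by (URI, version). Every name here (`Manifest`, `ManifestCache`, `get_or_load`, `load_manifest`, `scalar_index_search`) is hypothetical and is not Lance's actual API. The search consults the manifest twice, mirroring the double load in the trace, but only the first lookup for a version reaches storage.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a dataset manifest: one entry per fragment.
#[derive(Debug)]
pub struct Manifest {
    pub version: u64,
    pub fragment_ids: Vec<u64>,
}

/// Hypothetical cache keyed by (dataset URI, version). Assumes a version's
/// manifest is immutable, so one loaded copy can be shared via Arc.
pub struct ManifestCache {
    entries: Mutex<HashMap<(String, u64), Arc<Manifest>>>,
    loads: AtomicUsize, // how many times we actually went to storage
}

impl ManifestCache {
    pub fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()), loads: AtomicUsize::new(0) }
    }

    /// Returns the cached manifest, calling `load` only on a cache miss.
    pub fn get_or_load<F>(&self, uri: &str, version: u64, load: F) -> Arc<Manifest>
    where
        F: FnOnce() -> Manifest,
    {
        let key = (uri.to_string(), version);
        let mut entries = self.entries.lock().unwrap();
        if let Some(m) = entries.get(&key) {
            return Arc::clone(m); // hit: no I/O
        }
        self.loads.fetch_add(1, Ordering::SeqCst);
        let m = Arc::new(load()); // miss: the expensive read happens once
        entries.insert(key, Arc::clone(&m));
        m
    }

    pub fn loads(&self) -> usize {
        self.loads.load(Ordering::SeqCst)
    }
}

/// Hypothetical expensive read; its size grows with the fragment count.
fn load_manifest(version: u64) -> Manifest {
    Manifest { version, fragment_ids: (0..10_000).collect() }
}

/// Hypothetical search path that needs the manifest twice (e.g. planning and
/// execution). With the cache, both lookups return the same shared copy.
fn scalar_index_search(cache: &ManifestCache, uri: &str, version: u64) -> usize {
    let planning = cache.get_or_load(uri, version, || load_manifest(version));
    let execution = cache.get_or_load(uri, version, || load_manifest(version));
    assert!(Arc::ptr_eq(&planning, &execution));
    assert_eq!(planning.version, version);
    execution.fragment_ids.len()
}

fn main() {
    let cache = ManifestCache::new();
    for _ in 0..3 {
        let n = scalar_index_search(&cache, "file:///tmp/my_dataset.lance", 7);
        println!("search saw {n} fragments");
    }
    println!("manifest reads from storage: {}", cache.loads());
}
```

Running three searches against the same version prints a single storage read. The sketch holds the lock while loading, which serializes concurrent misses; a production cache might prefer a per-key "load once" primitive, but that is a design choice beyond what the issue describes.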