lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0

Add repetition index to 2.1 format #3106

Open westonpace opened 1 week ago

westonpace commented 1 week ago

The repetition index is a general purpose structure that is used in the following situations:

The repetition index is not read during full scans, but it is read during a partial scan of a page. This reintroduces "indirect I/O" into the 2.1 format ( :melting_face: )
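To make the "indirect I/O" point concrete, here is a minimal Python sketch. It is not Lance's actual layout or API: it assumes a toy page of concatenated variable-length rows, with the repetition index modeled as a plain array of little-endian 64-bit row offsets. Every name in it (`build_page`, `CountingReader`, `partial_scan`, `full_scan`) is hypothetical. The only thing it is meant to show is that a partial scan needs two dependent reads (index first, then data), while a full scan never touches the index.

```python
import struct

OFFSET_SIZE = 8  # assumption: each index entry is a little-endian u64


def build_page(rows):
    """Concatenate variable-length rows and build a toy offset index."""
    data = b"".join(rows)
    offsets = [0]
    for row in rows:
        offsets.append(offsets[-1] + len(row))
    index = struct.pack(f"<{len(offsets)}Q", *offsets)
    return data, index


class CountingReader:
    """Stand-in for a storage backend that counts read requests."""

    def __init__(self, buf):
        self._buf = buf
        self.reads = 0

    def read_range(self, start, end):
        self.reads += 1
        return self._buf[start:end]


def full_scan(data_reader, data_len):
    """A full scan reads the whole data buffer and never touches the index."""
    return data_reader.read_range(0, data_len)


def partial_scan(data_reader, index_reader, first_row, end_row):
    """Read rows [first_row, end_row) using two dependent reads."""
    # Read 1: the index entries that bound the requested rows.
    raw = index_reader.read_range(
        first_row * OFFSET_SIZE, (end_row + 1) * OFFSET_SIZE
    )
    count = end_row - first_row + 1
    offsets = struct.unpack(f"<{count}Q", raw)

    # Read 2: its byte range is only known after read 1 completes,
    # which is what makes the access "indirect".
    payload = data_reader.read_range(offsets[0], offsets[-1])
    base = offsets[0]
    return [payload[s - base:e - base] for s, e in zip(offsets, offsets[1:])]


rows = [b"a", b"bcd", b"", b"ef", b"ghij"]
data, index = build_page(rows)

data_reader = CountingReader(data)
index_reader = CountingReader(index)
selected = partial_scan(data_reader, index_reader, 1, 4)
```

In this toy model the partial scan issues one read against the index and one against the data, and the second cannot be issued until the first returns. That dependency is the cost the comment above is pointing at.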