lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.98k stars, 227 forks

Vector Index V3 (0.3?) #2373

Open chebbyChefNEQ opened 6 months ago

chebbyChefNEQ commented 6 months ago

This issue tracks a new iteration of the vector index in lance. A few learnings we want to address in this iteration/refactor:

This refactor will take place in a few stages:

Stage 1 -- clean up index build

Milestone 1 -- IVF Flat

Goal

Stage 2 -- Simplify and implement query path for new index version

TBD

Stage 3 -- Move existing index to the new builder

TBD

Stage 4 -- Delete old indexing code, old query path remains.

TBD

Stage 5 -- Fully deprecate old index format (V1 and V2)

TBD

wjones127 commented 6 months ago

for each new index type we add, we need to write a brand new write_*_index function.

More generally, I would like to see the methods we need to implement per index type moved into a main trait. Another example of functions we have to implement for each index is the optimize_* family. It might be nice to write a draft of what those traits could be. Many of these probably belong at a higher level (a general index trait covering scalar and FTS indices too) rather than being specific to vector indices.
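
As a starting point for that draft, here is a minimal sketch of what such a trait could look like. Everything in it is hypothetical and for illustration only: the names `IndexType`, `FlatIndex`, and `optimize_and_write` are not from the lance codebase, and the "index" is just a sorted list of row ids. The point is only that the per-type write and optimize logic lives behind one trait, so the shared code path is written once.

```rust
use std::io::{self, Write};

/// Hypothetical trait (not lance's API): the operations each index type
/// would implement once, instead of separate free functions per type.
trait IndexType {
    /// Short name identifying the index type.
    fn name(&self) -> &'static str;
    /// Serialize the index. Stands in for the per-type write functions.
    fn write_index(&self, out: &mut dyn Write) -> io::Result<()>;
    /// Fold newly added rows into the index. Stands in for the per-type
    /// optimize functions.
    fn optimize(&mut self, new_rows: &[u64]);
}

/// Toy index used only to exercise the trait: a sorted, deduplicated
/// list of row ids.
struct FlatIndex {
    row_ids: Vec<u64>,
}

impl IndexType for FlatIndex {
    fn name(&self) -> &'static str {
        "flat"
    }

    fn write_index(&self, out: &mut dyn Write) -> io::Result<()> {
        // Write each row id as 8 little-endian bytes.
        for id in &self.row_ids {
            out.write_all(&id.to_le_bytes())?;
        }
        Ok(())
    }

    fn optimize(&mut self, new_rows: &[u64]) {
        // Merge the new rows, then restore the sorted, unique invariant.
        self.row_ids.extend_from_slice(new_rows);
        self.row_ids.sort_unstable();
        self.row_ids.dedup();
    }
}

/// Shared code path: written once, works for any type implementing the trait.
fn optimize_and_write(index: &mut dyn IndexType, new_rows: &[u64]) -> io::Result<Vec<u8>> {
    index.optimize(new_rows);
    let mut buf = Vec::new();
    index.write_index(&mut buf)?;
    Ok(buf)
}
```

With this shape, adding a new index type means adding one `impl IndexType for ...` block, and callers such as `optimize_and_write` do not change. Whether the real trait should be split into a general index trait plus a vector-specific one is left open here.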