Open chebbyChefNEQ opened 6 months ago
For each new index type we add, we need to write a brand new `write_*_index` function.
More generally, I would like to see the methods we need to implement per index type moved into a main trait. Another example is the `optimize_` functions, which we also have to implement for each index. It might be nice to write a draft of what those traits could be. Many of these probably belong at the higher level (general index, including scalar and FTS) rather than being specific to vector indices.
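As a starting point for such a draft, here is a minimal sketch of what a shared trait could look like. Everything in it is hypothetical: `IndexLifecycle`, `FlatIndex`, and the generic `write_index` helper are illustration-only names, not existing Lance APIs, and the toy index only stores row ids.

```rust
/// Hypothetical trait (not an existing Lance API): the per-index-type
/// operations live behind one interface instead of free functions.
pub trait IndexLifecycle {
    /// Tag identifying the index type in the serialized output.
    fn index_type(&self) -> &'static str;
    /// Serialize the index; stands in for the `write_*_index` family.
    fn write(&self) -> Vec<u8>;
    /// Fold newly added rows into the index; stands in for the `optimize_` family.
    fn optimize(&mut self, new_row_ids: &[u64]);
}

/// Toy index that stores only row ids, used to exercise the trait.
pub struct FlatIndex {
    row_ids: Vec<u64>,
}

impl FlatIndex {
    pub fn new(row_ids: Vec<u64>) -> Self {
        Self { row_ids }
    }
}

impl IndexLifecycle for FlatIndex {
    fn index_type(&self) -> &'static str {
        "FLAT"
    }

    fn write(&self) -> Vec<u8> {
        // Each row id becomes 8 little-endian bytes.
        self.row_ids.iter().flat_map(|id| id.to_le_bytes()).collect()
    }

    fn optimize(&mut self, new_row_ids: &[u64]) {
        self.row_ids.extend_from_slice(new_row_ids);
    }
}

/// Single generic entry point: adding an index type means adding an
/// impl of the trait, not another standalone write function.
pub fn write_index(index: &dyn IndexLifecycle) -> (String, Vec<u8>) {
    (index.index_type().to_string(), index.write())
}
```

With this shape, scalar and FTS indices could implement the same trait, which is why it probably belongs above the vector-index layer.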
This issue tracks a new iteration of the vector index in Lance. A few learnings we want to address in this iteration/refactor:
- For each new index type we add, we need to write a brand new `write_*_index` function.
- `IVF_<sub_index>_<quantization>`
This refactor will take place in a few stages:
Stage 1 -- clean up index build
Milestone 1 -- IVF Flat
Goal
- `QuantizationBuilder` trait
- `IVFShuffler` trait
- `num_partition=1`: there should be a special optimization where the centroid is omitted, unless the subindex type requires residualization
- `SubIndexBuilder` trait
- `Struct`, in which we have `{index_data: [...], metadata: [...]}`
- `StructArray` and saved to a `lance_file::v2` file
- `metadata() -> Vec<u8>` before building. This metadata will be stored as part of the schema metadata. (NOTE: this must be known before opening the file writer, dynamically determined metadata from post-build can not be stored here)
- `lance_file::v2` file format
- `optimize_index`
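To make the goal list more concrete, here is a rough sketch of how the three traits might fit together for IVF Flat. Only the trait names, `metadata() -> Vec<u8>`, and the single-partition special case come from this issue. Every method signature, the `FlatQuantizer`, `CentroidShuffler`, and `FlatSubIndex` types, and the `build_partitions` helper are assumptions for illustration. The sketch returns plain byte buffers and does not touch `lance_file::v2`.

```rust
/// Trait name from the issue; the `quantize` method is an assumed shape.
pub trait QuantizationBuilder {
    fn quantize(&self, vector: &[f32]) -> Vec<f32>;
}

/// Trait name from the issue; both methods are assumed shapes.
pub trait IVFShuffler {
    fn num_partitions(&self) -> usize;
    fn partition_of(&self, vector: &[f32]) -> usize;
}

/// Trait name and `metadata() -> Vec<u8>` from the issue; `build` is assumed.
pub trait SubIndexBuilder {
    /// Available before building, so it can go into schema metadata.
    fn metadata(&self) -> Vec<u8>;
    fn build(&self, partition: &[Vec<f32>]) -> Vec<u8>;
}

/// Flat "quantization": vectors are stored unchanged.
pub struct FlatQuantizer;
impl QuantizationBuilder for FlatQuantizer {
    fn quantize(&self, vector: &[f32]) -> Vec<f32> {
        vector.to_vec()
    }
}

/// Hypothetical shuffler assigning each vector to its nearest centroid.
pub struct CentroidShuffler {
    centroids: Vec<Vec<f32>>,
}
impl CentroidShuffler {
    pub fn new(centroids: Vec<Vec<f32>>) -> Self {
        Self { centroids }
    }
    /// Single partition: no centroid is stored at all. A sub-index that
    /// needs residualization would still need one; not modeled here.
    pub fn single_partition() -> Self {
        Self { centroids: Vec::new() }
    }
}
impl IVFShuffler for CentroidShuffler {
    fn num_partitions(&self) -> usize {
        self.centroids.len().max(1)
    }
    fn partition_of(&self, vector: &[f32]) -> usize {
        let mut best = 0;
        let mut best_dist = f32::INFINITY;
        // With no centroids the loop is skipped and partition 0 is returned.
        for (i, c) in self.centroids.iter().enumerate() {
            // Squared L2 distance to centroid i.
            let d: f32 = c.iter().zip(vector).map(|(a, b)| (a - b) * (a - b)).sum();
            if d < best_dist {
                best = i;
                best_dist = d;
            }
        }
        best
    }
}

/// Flat sub-index: a partition is its vectors as little-endian f32 bytes.
pub struct FlatSubIndex;
impl SubIndexBuilder for FlatSubIndex {
    fn metadata(&self) -> Vec<u8> {
        b"FLAT".to_vec()
    }
    fn build(&self, partition: &[Vec<f32>]) -> Vec<u8> {
        partition.iter().flatten().flat_map(|x| x.to_le_bytes()).collect()
    }
}

/// Shuffle vectors into partitions, quantize them, build one sub-index each.
pub fn build_partitions(
    vectors: &[Vec<f32>],
    shuffler: &dyn IVFShuffler,
    quantizer: &dyn QuantizationBuilder,
    sub_index: &dyn SubIndexBuilder,
) -> Vec<Vec<u8>> {
    let mut parts: Vec<Vec<Vec<f32>>> = vec![Vec::new(); shuffler.num_partitions()];
    for v in vectors {
        parts[shuffler.partition_of(v)].push(quantizer.quantize(v));
    }
    parts.iter().map(|p| sub_index.build(p)).collect()
}
```

The point of the split is that a new `IVF_<sub_index>_<quantization>` combination would only need new trait impls, while `build_partitions` stays unchanged.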
Stage 2 -- Simplify and implement query path for new index version
TBD
Stage 3 -- Move existing index to the new builder
TBD
Stage 4 -- Delete old indexing code, old query path remains.
TBD
Stage 5 -- Fully deprecate old index format (V1 and V2)
TBD