lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming..
https://lancedb.github.io/lance/
Apache License 2.0
3.96k stars 221 forks source link

Confusion about diskANN build process in src/index/vector/diskann/builder.rs#L276 #1212

Open AveryQi115 opened 1 year ago

AveryQi115 commented 1 year ago

In the index_once function insider builder.rs, the greedy search is using vector of row i in graph as following. However, if it follows the shuffled order, shouldn't it use graph.data.row(id)? I saw following robustPrune are also use id. So confused about this point, could anyone explain a bit? Thanks...

    let mut ids = (0..graph.len()).collect::<Vec<_>>();
    ids.shuffle(&mut rng);

    for (i, &id) in ids.iter().enumerate() {
        let vector = graph.data.row(i).ok_or_else(|| Error::Index {
            message: format!("Cannot find vector with id {}", id),
        })?;

        let state = greedy_search(graph, medoid, vector, 1, l).await?;
        ...
AveryQi115 commented 1 year ago

Link of the code: https://github.com/lancedb/lance/blob/main/rust/src/index/vector/diskann/builder.rs#L276