Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch, with more integrations coming.
In the index_once function inside builder.rs, the greedy search uses the vector at row i of the graph, as in the snippet below.
However, since the loop follows the shuffled order, shouldn't it use graph.data.row(id) instead? I see that the robustPrune step that follows also uses id, so I'm confused about this point. Could anyone explain? Thanks.
let mut ids = (0..graph.len()).collect::<Vec<_>>();
ids.shuffle(&mut rng);
for (i, &id) in ids.iter().enumerate() {
    let vector = graph.data.row(i).ok_or_else(|| Error::Index {
        message: format!("Cannot find vector with id {}", id),
    })?;
    let state = greedy_search(graph, medoid, vector, 1, l).await?;
    ...
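To make the concern concrete, here is a minimal sketch that is not taken from builder.rs. The `Data` struct and the `visit` helper are hypothetical stand-ins for the graph's vector storage, and a fixed permutation replaces the random shuffle so the result is reproducible. It only shows that, inside an `enumerate()` loop over shuffled ids, `row(i)` and `row(id)` generally return different vectors.

```rust
// Hypothetical stand-in for the graph's vector storage (not the real Lance type).
struct Data {
    rows: Vec<Vec<f32>>,
}

impl Data {
    // Returns the vector stored at position `i`, or None if out of range.
    fn row(&self, i: usize) -> Option<&[f32]> {
        self.rows.get(i).map(|r| r.as_slice())
    }
}

// For every loop step, record (id, first component of row(i), first component of row(id)).
// `ids` plays the role of the shuffled id list from the snippet above.
fn visit(data: &Data, ids: &[usize]) -> Vec<(usize, f32, f32)> {
    ids.iter()
        .enumerate()
        .filter_map(|(i, &id)| {
            // `i` is the loop position, `id` is the shuffled node id.
            let by_position = *data.row(i)?.first()?;
            let by_id = *data.row(id)?.first()?;
            Some((id, by_position, by_id))
        })
        .collect()
}
```

With rows `[0.0]`, `[10.0]`, `[20.0]` and ids `[2, 0, 1]`, the first step has id 2, yet `row(i)` yields the vector of node 0 while `row(id)` yields the vector of node 2. Under these assumptions, a search driven by `row(i)` and a prune step keyed on `id` would refer to different nodes whenever the permutation is not the identity.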