Closed: GraphicalDot closed this issue 1 month ago
It seems that capacity information is not stored in the file, so after loading, the capacity always equals the element count. As a result, an additional reserve is required before insertion, which is the issue I'm encountering. In my scenario, this introduces extra performance overhead. If there is a solution, please let me know.
I solved this issue by reserving the capacity again after loading the index.
```rust
use std::fs;

use log::info;
use usearch::{new_index, Index, IndexOptions, MetricKind, ScalarKind};

fn load_or_create_index(session_id: &str) -> Index {
    let options = IndexOptions {
        dimensions: 384,               // must match the dimensionality of the embeddings
        metric: MetricKind::Cos,       // or ::L2sq, ::IP, ...
        quantization: ScalarKind::F32, // or ::F16, ::I8, ::B1x8, ...
        connectivity: 0,               // 0 = library default
        expansion_add: 0,              // 0 = library default
        expansion_search: 0,           // 0 = library default
        multi: false,
    };
    let index: Index = new_index(&options).unwrap();

    // Ensure the ~/.pyano/indexes directory exists.
    let home_directory = dirs::home_dir().unwrap();
    let root_pyano_dir = home_directory.join(".pyano");
    let pyano_data_dir = root_pyano_dir.join("indexes");
    if !pyano_data_dir.exists() {
        fs::create_dir_all(&pyano_data_dir).unwrap();
    }

    let index_name = format!("{}.usearch", session_id);
    let index_path = pyano_data_dir.join(index_name);
    let index_path_str = index_path.display().to_string();

    match index.load(&index_path_str) {
        Ok(_) => {
            info!("Loaded existing index for session: {}", session_id);
        }
        Err(err) => {
            info!("Index load failed for session: {} with error {}", session_id, err);
        }
    };

    // The saved file does not preserve reserved capacity, so after loading the
    // capacity equals the element count. Reserve again before any insertions.
    index.reserve(10_000_000).unwrap();
    index
}
```
The index.reserve(...) call near the end reserves the capacity again after loading the index. You were right, @Q3g. Thanks a ton!
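As a follow-up note on the hard-coded reserve(10_000_000): a reservation derived from the current index size should also avoid the error without over-allocating. The sketch below is only illustrative; ensure_capacity and expected_additions are names I made up for this example, and it assumes the crate's size(), capacity(), and reserve() methods.

```rust
// Illustrative sketch (not part of the fix above): grow the capacity only to the
// current element count plus the batch that is about to be inserted.
fn ensure_capacity(index: &usearch::Index, expected_additions: usize) {
    let needed = index.size() + expected_additions;
    if index.capacity() < needed {
        index
            .reserve(needed)
            .expect("failed to reserve capacity before insertion");
    }
}
```

Calling something like ensure_capacity(&index, chunks.len()) right before the insertion loop keeps memory proportional to the session instead of a fixed ten-million-slot reservation.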
Describe the bug
While using the Rust crate for USearch, I created an index, added multiple vectors to it, and saved it. When the user requested adding more code chunks, I loaded the index from disk and attempted to add more vectors, but encountered the error quoted below under Steps to reproduce.
Steps to reproduce
The embeddings are generated by the all-MiniLM-L6-v2 model and are 384-dimensional. After generating the embeddings, I create an index and save it using the function above: add_to_index(session_id, chunks_with_compressed_data);
When I call this function again with the same session_id, it loads the saved index and tries to add embeddings to it, which is when I get this error:
"Reserve capacity ahead of insertions!"
Expected behavior
Adding embeddings to the saved index should have worked like a charm.
USearch version
2.15.3
Operating System
macOS
Hardware architecture
Arm
Which interface are you using?
Other bindings
Contact Details
houzier.saurav@gmail.com
Are you open to being tagged as a contributor?
I am open to being mentioned in the project .git history as a contributor
Is there an existing issue for this?
Code of Conduct