unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0
1.92k stars 109 forks source link

Bug: Segfault when dimensions of added vector don't add up (Rust) #412

Open jbrummack opened 1 month ago

jbrummack commented 1 month ago

Describe the bug

Hello! I am running into zsh: segmentation fault cargo run --release combined_vectors.csv, which is unexpected when using only safe Rust. Apparently my CSV had some messed up columns that had malformed data, yielding an input vector that is too short, causing a segfault. Is it a good idea to check if the dimensions add up in the rust library, outputting an error as value? Did I miss something?

Thank You!

Steps to reproduce

...
let options = IndexOptions {
    dimensions: 192,     
    metric: MetricKind::Cos,       
    quantization: ScalarKind::F32, 
    connectivity: 0,
    expansion_add: 0, 
    expansion_search: 0,
    multi: false, 
};
let row = 1;
let index: Index = new_index(&options).unwrap();
index.add(row as u64, &[3, 5]); //2 is not 192 and causes a segfault
...

Expected behavior

usearch/rust/lib.rs This is the code that is called when the crash occurs:

637 fn add(index: &Index, key: Key, vector: &[Self]) -> Result<(), cxx::Exception> {
638     index.inner.add_f32(key, vector)
639 }

1075 pub fn add<T: VectorType>(self: &Index, key: Key, vector: &[T]) -> Result<(), cxx::Exception> {
1076    T::add(self, key, vector)
1077 }

to handle the error gracefully add in another condition:

if vector.len() == self.dimensions { //avoid segfault
    T::add(self, key, vector)
} else { //error as value
    Err("dimensions dont match!")
}

USearch version

usearch = { version = "2.12.0" }

Operating System

MacOS Sonoma 14.4.1

Hardware architecture

Arm

Which interface are you using?

Other bindings

Contact Details

awnings.truces0n@icloud.com

Are you open to being tagged as a contributor?

Is there an existing issue for this?

Code of Conduct

ashvardanian commented 1 month ago

Nice! Any chance you could open a PR?

jbrummack commented 1 month ago

I will look into it tomorrow, maybe this error happens with other functions too...

jbrummack commented 1 month ago

Another reason for segfaults is trying to add to the index when its capacity is 0 (not reserving). This could also be checked in the add function.

ashvardanian commented 1 month ago

This one is harder to justify as it's not a zero-cost change. The size is an atomic variable, and checking it from many threads would cause contention.

MarkReedZ commented 1 month ago

I'm taking a look at this. Currently the rust tests are failing on add.

test tests::test_add_remove_vector ... thread 'tests::test_add_remove_vector' panicked at rust/lib.rs:1434:9:
assertion failed: index.add(id2, &second).is_ok()

I added a test for the seg fault example however this passes for me.

index.add(row as u64, &[3, 5]); //2 is not 192 and causes a segfault

I've added tests for other permutations of add and have hit a hang.

ashvardanian commented 1 month ago

@MarkReedZ, check #413 🤗

MarkReedZ commented 1 month ago

I'm looking at an unrelated error and cleaning up the tests to be more rust idiomatic.

However, looking at the PR - are you suggesting fixing this in the base library by adding a size_t length argument to index::add then checking to ensure it matches options.dimensions? Or only fixing this for rust by updating rust/lib.cpp?

ashvardanian commented 1 month ago

I meant changing only the rust/lib.cpp 🤗