ritchie46 / lsh-rs

Locality Sensitive Hashing in Rust with Python bindings
MIT License

Can't obtain results using Rust implementation #13

Open paulbricman opened 3 years ago

paulbricman commented 3 years ago

I'm roughly using the following code:

let query_emb: Vec<f32>;
let doc_emb: Vec<Vec<f32>>; // contains 3 document embeddings

...

// Arguments: n_projections = 10, n_hash_tables = 30, dim = 512
let mut lsh = LshMem::new(10, 30, 512).srp().unwrap();
let _x = lsh.store_vecs(&doc_emb[..]);
let result = lsh.query_bucket(&query_emb).unwrap();
println!("lsh-rs: {:?}", result);

Unfortunately, the result is empty. When I test the same query and documents with ngt-rs, I do get results (I'm looking for an alternative to ngt-rs that runs on Windows). Is this a matter of choosing better parameters?

paulbricman commented 3 years ago

It seems so: tweaking n_projections and n_hash_tables sometimes makes it return results. Do you know of effective heuristics for choosing values for the two? I plan on working with 100-10000 candidate vectors of dimension 512, but was just testing with 3 of them.

ritchie46 commented 3 years ago

Here is a presentation I have on the subject: LSH.pdf

And a notebook with some theory: notebook

The most important thing is to understand gap amplification; see the last plot in the notebook. By choosing K and L you tune the collision probability for a given similarity value.
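To make the effect of K and L concrete, here is a minimal sketch of the textbook collision probability for sign random projections. It is not code from this crate. It assumes that K corresponds to n_projections and L to n_hash_tables, and that the hash bits are independent; the function names are made up for illustration. For two vectors with cosine similarity s, one bit collides with probability p = 1 - acos(s)/pi, and the vectors share at least one bucket with probability 1 - (1 - p^K)^L.

```rust
use std::f64::consts::PI;

/// Probability that two vectors with cosine similarity `sim` agree on a
/// single sign-random-projection bit: 1 - theta / pi.
fn srp_bit_collision(sim: f64) -> f64 {
    let theta = sim.clamp(-1.0, 1.0).acos();
    1.0 - theta / PI
}

/// Probability that the two vectors land in the same bucket in at least one
/// of `l` hash tables, each keyed on `k` bits (assumes independent bits).
fn bucket_collision(sim: f64, k: u32, l: u32) -> f64 {
    let p = srp_bit_collision(sim);
    1.0 - (1.0 - p.powi(k as i32)).powi(l as i32)
}

/// Smallest number of tables for which a pair with similarity `sim` collides
/// with probability >= `target`, given `k` bits per table.
/// Returns None when no finite number of tables can reach the target.
fn tables_needed(sim: f64, k: u32, target: f64) -> Option<u32> {
    if !(target > 0.0 && target < 1.0) {
        return None;
    }
    let p_k = srp_bit_collision(sim).powi(k as i32);
    if p_k <= 0.0 {
        return None;
    }
    if p_k >= 1.0 {
        return Some(1);
    }
    Some(((1.0 - target).ln() / (1.0 - p_k).ln()).ceil() as u32)
}

fn main() {
    // The parameters from the question: K = 10, L = 30.
    for &sim in &[0.3, 0.5, 0.7, 0.9] {
        println!(
            "sim = {:.1}: P(collision) = {:.3}",
            sim,
            bucket_collision(sim, 10, 30)
        );
    }
    println!(
        "tables for 90% recall at sim 0.5, K = 10: {:?}",
        tables_needed(0.5, 10, 0.9)
    );
}
```

Under these assumptions, increasing K makes each bucket more selective (fewer false positives, but more misses), while increasing L gives every pair more chances to collide. With only 3 stored vectors, an empty result is plausible whenever none of them is very similar to the query.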

P.S. You can play around with the Python version of this crate in the notebook:

https://pypi.org/project/floky/