paulbricman opened 3 years ago
It seems like it: tweaking `n_projections` and `n_hash_tables` sometimes makes it return results. Do you know of effective heuristics for choosing values for the two? I plan on working with 100 to 10,000 candidate vectors of dimension 512, but so far I have only been testing with 3 of them.
Here is a presentation I have on the subject: LSH.pdf
And a notebook with some of the theory: notebook
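The most important thing is to understand gap amplification, shown in the last plot of the notebook. By choosing K (`n_projections`, the number of hashes concatenated into each table's key) and L (`n_hash_tables`, the number of independent tables), you can tune the collision probability for a given similarity value.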
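A minimal sketch of the amplification curve, assuming a sign-random-projection (SRP) hash family for cosine similarity, where a single hash collides with probability 1 - θ/π for two vectors at angle θ. With K hashes per table and L tables, a pair shares a bucket in at least one table with probability 1 - (1 - p^K)^L. The function names and the (K, L) pairs below are illustrative choices, not values taken from the crate; other hash families (e.g. for L2 distance) have a different single-hash probability.

```python
import math


def srp_collision_prob(cos_sim: float) -> float:
    """Single-hash collision probability for sign random projections
    (assumed hash family): 1 - theta / pi, theta = angle between vectors."""
    cos_sim = max(-1.0, min(1.0, cos_sim))  # clamp to guard against rounding error
    return 1.0 - math.acos(cos_sim) / math.pi


def amplified_collision_prob(p: float, k: int, l: int) -> float:
    """Probability of sharing a bucket in at least one of l tables, when each
    table key concatenates k independent hashes that each collide with prob p."""
    return 1.0 - (1.0 - p ** k) ** l


if __name__ == "__main__":
    sims = [0.2, 0.5, 0.8, 0.9, 0.95, 0.99]
    # Illustrative (K, L) pairs: larger K pushes the curve toward high similarity
    # and steepens it; larger L pulls it back toward lower similarity.
    for k, l in [(5, 5), (10, 20), (20, 50)]:
        probs = [amplified_collision_prob(srp_collision_prob(s), k, l) for s in sims]
        cells = ", ".join(f"{s:.2f}->{q:.3f}" for s, q in zip(sims, probs))
        print(f"K={k:2d} L={l:2d}: {cells}")
```

Each printed row shows, for those settings, how likely a neighbour at a given similarity is to be returned at all. If that probability is small at the similarity of the matches you expect, an empty result is plausible even when the index works as intended.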
P.S. You can play around with the Python version of this crate in the notebook.
I'm roughly using the following code:
Unfortunately, the result is empty. I'm testing the same query and documents with ngt-rs and get results there (I'm looking for an alternative to ngt-rs that runs on Windows). Is this a matter of choosing better parameters?
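One way to turn the amplification formula 1 - (1 - p^K)^L into concrete values is a small parameter search. This is a sketch under stated assumptions: you pick a "near" similarity you want to recall and a "far" similarity you want to reject, convert each to a single-hash collision probability for your hash family, and then, for every K, take the smallest L that reaches the recall target, keeping the (K, L) with the lowest far-collision probability. `choose_params`, `min_tables`, the recall target, and the K and L caps are hypothetical names and choices for illustration, not part of this crate; the resulting K and L would correspond to `n_projections` and `n_hash_tables`.

```python
import math


def amplified(p: float, k: int, l: int) -> float:
    """1 - (1 - p^k)^l: chance of colliding in at least one of l tables."""
    return 1.0 - (1.0 - p ** k) ** l


def min_tables(p_near: float, k: int, recall: float) -> int:
    """Smallest L such that amplified(p_near, k, L) >= recall (hypothetical helper)."""
    if not 0.0 < recall < 1.0:
        raise ValueError("recall must be in (0, 1)")
    pk = p_near ** k
    if pk >= 1.0:
        return 1
    if pk <= 0.0:
        raise ValueError("p_near ** k is 0; use a smaller k")
    # Solve 1 - (1 - pk)^L >= recall  =>  L >= log(1 - recall) / log(1 - pk)
    return max(1, math.ceil(math.log(1.0 - recall) / math.log1p(-pk)))


def choose_params(p_near, p_far, recall=0.9, max_k=32, max_tables=50):
    """Return (k, l, far_collision_prob) minimising collisions for 'far' pairs
    while keeping the recall target for 'near' pairs, or None if no k fits."""
    best = None
    for k in range(1, max_k + 1):
        l = min_tables(p_near, k, recall)
        if l > max_tables:
            break  # the required L never decreases as k grows
        far = amplified(p_far, k, l)
        if best is None or far < best[2]:
            best = (k, l, far)
    return best


if __name__ == "__main__":
    # Assumed SRP hash family: single-hash collision probability 1 - acos(sim) / pi.
    def srp(sim):
        return 1.0 - math.acos(sim) / math.pi

    print(choose_params(srp(0.9), srp(0.5), recall=0.9))
```

The search trades recall against candidate-set size: a larger K filters far points more aggressively but needs more tables (more memory and query time) to keep the recall target, so the `max_tables` cap effectively encodes your resource budget.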