ritchie46 / lsh-rs

Locality Sensitive Hashing in Rust with Python bindings
MIT License
109 stars 20 forks source link

How to use for vanilla K,L LSH with precomputed hashes #1

Closed paul-sud closed 4 years ago

paul-sud commented 4 years ago

Hello,

I'm curious to see if I can get this to work with plain K, L parametrized LSH. The setup is that I already have L Vecs each with K (u32) hashes of the input sample, which I'd like to be able to feed directly into the LSH. The LSH hasher in this case then only queries for exact matches in each of the L hash tables and returns the union of all matches across all tables.

Is such a setup possible using the current API?

Thanks,

Paul

ritchie46 commented 4 years ago

Hmm.. The LSH struct does need to know about your hashing function to be able to query. If you have a hashing function, you can implement the VecHash trait for that hashing function. The .hash_vec_query and .hash_vec_put functions are probably the same hashing function (they only differ for asymptotic LSH). Then you can instantiate a new LSH struct with your CustomHasher:

let mut lsh: LSH<MemoryTable, CustomHasher> = LshMem::new(n_projections, n_hash_tables, dim)

Does that help your case?

paul-sud commented 4 years ago

Thanks, that's definitely a good starting point, although my inputs are &[u32], not &[f32] per the trait. I guess I can just cast though.

ritchie46 commented 4 years ago

Yeah.. for now, I don't feel the extra complexity of generic inputs weighs up against an easy cast. :)

paul-sud commented 4 years ago

I wholeheartedly concur, I tried futzing with the source to make the trait generic and it got complicated very quickly. Thanks again for the help.

ritchie46 commented 4 years ago

In case you are still interested. After some sweat, the crate is now generic for most numeric types. https://github.com/ritchie46/lsh-rs/pull/3

paul-sud commented 4 years ago

In case you are still interested. After some sweat, the crate is now generic for most numeric types. #3

That's great news! I actually realized at some point that the particular hashing method I was using is incompatible with the LSH data structure in this crate, since the hashers can't execute independently. I do have other ideas where the generics would help though, thanks for the update.