pixelogik / NearPy

Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive hashes.
MIT License

multiple hash tables #47

Closed: armintabari closed this issue 8 years ago

armintabari commented 8 years ago

In LSH, two factors affect accuracy: the number of bits in the hash code (i.e. the number of projections) and the number of hash tables. When you create an engine like this:

rbp = RandomBinaryProjections('rbp', 10)
engine = Engine(dimension, lshashes=[rbp])

Does it hash each vector only once, with a single set of random projections? Can we repeat the hashing multiple times to increase accuracy?
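To make the first factor concrete, here is a minimal pure-Python sketch of a single random-binary-projection hash. It assumes the usual sign-of-random-hyperplane construction with Gaussian normals; NearPy's own implementation is NumPy-based and may differ in detail, and `make_projections` and `hash_vector` are hypothetical helper names. Each projection contributes one bit of the bucket key, so `RandomBinaryProjections('rbp', 10)` corresponds to one hash table with 10-bit keys.

```python
import random


def make_projections(dim, n_bits, seed=None):
    """Draw n_bits random hyperplane normals (Gaussian entries) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(v, projections):
    """Bucket key: one bit per hyperplane, '1' if v lies on its positive side."""
    bits = []
    for normal in projections:
        dot = sum(a * b for a, b in zip(normal, v))
        bits.append("1" if dot > 0 else "0")
    return "".join(bits)


# One table with 10 projections, analogous to RandomBinaryProjections('rbp', 10).
projections = make_projections(dim=100, n_bits=10, seed=42)
rng = random.Random(0)
v = [rng.gauss(0.0, 1.0) for _ in range(100)]
key = hash_vector(v, projections)
print(key, len(key))
```

More bits make each bucket smaller and more selective, but also make it more likely that a true neighbour falls on the wrong side of at least one hyperplane. That is why the number of tables matters as a second, independent knob.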

pixelogik commented 8 years ago

Yes, you are right. That is why you can provide as many hashes as you want in the lshashes parameter.

engine = Engine(dimension, lshashes=[rbp1, rbp2, rbp3, rbp4, rbp5])

would use five hashes. Is that what you meant?
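As far as I understand NearPy's `Engine`, passing several hashes means each vector is stored in one bucket per hash, and a query gathers the candidates from the matching bucket of every hash before ranking them by distance. The sketch below imitates that idea in plain Python. `MultiTableLSH` is a hypothetical class, not NearPy's code, and each table here simply draws its own independent hyperplanes, which is what five separately constructed hashes such as `rbp1` to `rbp5` would amount to.

```python
import math
import random


class MultiTableLSH:
    """Hypothetical multi-table index: n_tables independent random-binary-projection
    hashes, each with its own bucket dictionary (illustration only, not NearPy)."""

    def __init__(self, dim, n_bits, n_tables, seed=0):
        rng = random.Random(seed)
        self.tables = []
        for _ in range(n_tables):
            # Every table draws its own hyperplanes, like separately built hashes.
            planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            self.tables.append((planes, {}))  # (projections, {bucket_key: [(vector, data)]})

    @staticmethod
    def _key(v, planes):
        # One bit per hyperplane: which side of it the vector falls on.
        return "".join("1" if sum(a * b for a, b in zip(p, v)) > 0 else "0" for p in planes)

    def store_vector(self, v, data):
        # The vector is written into one bucket in every table.
        for planes, buckets in self.tables:
            buckets.setdefault(self._key(v, planes), []).append((v, data))

    def neighbours(self, q, n=5):
        # Union of the query's bucket across all tables, de-duplicated by data label.
        candidates = {}
        for planes, buckets in self.tables:
            for v, data in buckets.get(self._key(q, planes), []):
                candidates[data] = v
        q_norm = math.sqrt(sum(x * x for x in q))

        def cosine_distance(v):
            v_norm = math.sqrt(sum(x * x for x in v))
            return 1.0 - sum(a * b for a, b in zip(v, q)) / (v_norm * q_norm)

        # Rank the merged candidates by exact distance and keep the n closest.
        return sorted((cosine_distance(v), data) for data, v in candidates.items())[:n]


rng = random.Random(1)
points = {f"p{i}": [rng.gauss(0.0, 1.0) for _ in range(20)] for i in range(200)}
index = MultiTableLSH(dim=20, n_bits=10, n_tables=5)
for name, vec in points.items():
    index.store_vector(vec, name)
print(index.neighbours(points["p7"], n=3))
```

The tables are not compared against each other; their buckets are merged into one candidate set, and only that (much smaller) set is scored with the exact distance.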

armintabari commented 8 years ago

So, by passing 5 hashes to the engine, it would hash each vector 5 times and compare the results to increase accuracy? If so, then that is exactly what I mean.
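Why more tables help can be quantified under two assumptions: hyperplanes through the origin with random (Gaussian) normals, for which one hyperplane separates two vectors at angle theta with probability theta/pi, and tables drawn independently. A neighbour then shares the query's bucket in one k-bit table with probability (1 - theta/pi)^k, and becomes a candidate in at least one of L tables with probability 1 - (1 - (1 - theta/pi)^k)^L. The hypothetical helper below just evaluates that expression.

```python
import math


def candidate_probability(theta, n_bits, n_tables):
    """Probability that a vector at angle theta (radians) from the query lands in the
    query's bucket in at least one of n_tables independent random-projection tables."""
    p_bit = 1.0 - theta / math.pi               # one hyperplane does not separate them
    p_table = p_bit ** n_bits                   # all n_bits agree: same bucket in one table
    return 1.0 - (1.0 - p_table) ** n_tables    # collides in at least one table


for n_tables in (1, 5):
    p = candidate_probability(math.pi / 8, n_bits=10, n_tables=n_tables)
    print(n_tables, round(p, 3))
```

For a neighbour at 22.5 degrees and 10-bit keys, this gives roughly 0.26 with one table and roughly 0.78 with five, which is the recall gain that extra hashes in `lshashes` buy at the cost of more storage and more candidates to score.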