Closed deshanadesai closed 8 years ago
@deshanadesai Can you provide an example code?
@amorgun - Here is an example code:
dimension = 4096
rbp = RandomBinaryProjections('rbp', 50)
engine = Engine(dimension, lshashes=[rbp])
for index in range(0,20):
# fetch a numpy array 'arr' of shape (4096,1)
engine.store_vector(arr, 'data_%d' % index)
# Passing a query vector to search for the nearest neighbour:
N = engine.neighbours(query)
print N
Any help will be appreciated. Thanks!
@deshanadesai It looks like the length of hash is too high. Effectively you split your space into 2^50 parts and at most 20 of them contains some points. If you choose a query randomly chance to find some neighbours is at most 20 / 2^50 = 1.7763568e-14.
Some code for experiments:
import numpy as np
from nearpy import Engine
from nearpy.hashes import RandomBinaryProjections
np.random.seed(123)
dimension = 4096
rbp = RandomBinaryProjections('rbp', 5, rand_seed=20)
engine = Engine(dimension, lshashes=[rbp])
vectors = [np.random.randn(dimension) for _ in range(20)]
for index, vector in enumerate(vectors):
engine.store_vector(vector, 'data_%d' % index)
for vector in vectors:
N = len(engine.neighbours(vector))
assert N > 0
N_random = [len(engine.neighbours(np.random.randn(dimension)))
for _ in range(1000)]
print("Nonzero: %d out of 1000" % sum(i > 0 for i in N_random))
I understood this. Thank you very much @amorgun . It would be nice to have a method that selects the optimum number of binary projections. I would be willing to further read up on this and contribute to the code if you think it is worthwhile.
Hi,
I am using the example usage code snippet to hash vectors into buckets and find the nearest neighbour. I have stored 20 vectors of 4096 dimensions each in the engine. Querying the engine for the nearest neighbour of a test vector occasionally returns an empty sequence.
I tried increasing the number of random projections to 4000 with the same results. Can you help me to understand why this is happening?
Thanks! Deshana