pixelogik / NearPy

Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive hashes.
MIT License
759 stars 152 forks

nearest neighbour list is empty #77

Closed rajekta closed 4 years ago

rajekta commented 5 years ago

On running the example code, the list of nearest neighbours is empty:

N = engine.neighbours(query)
len(N)  # 0

Can you explain why that is?

analyticsbot commented 5 years ago

I meant to ask whether anyone has found out why the issue @rajekta reported occurs.

I have a theory. When n_buckets (see below) is less than num_near_neighbors, I always get an empty list when querying for neighbors, as demonstrated above. However, when n_buckets is at least num_near_neighbors, results are returned. This is only an observation and might not have anything to do with the actual maths behind the scenes. That is what I was asking in the first line: does anyone have an idea (ideally a mathematical explanation) of why this happens?

I ended up using another library, from Spotify (if anyone is interested).


```
# Create a random binary hash with 10 bits
n_buckets = 10
num_near_neighbors = 20
rbp = RandomBinaryProjections('rbp', n_buckets)

# Create engine with pipeline configuration
engine = Engine(dimension, lshashes=[rbp], distance=EuclideanDistance(),
                vector_filters=[NearestFilter(num_near_neighbors)], storage=MemoryStorage())
```

The code above will output an empty list when queried for nearest neighbors.

```
# Create a random binary hash with 20 bits
n_buckets = 20
num_near_neighbors = 20
rbp = RandomBinaryProjections('rbp', n_buckets)

# Create engine with pipeline configuration
engine = Engine(dimension, lshashes=[rbp], distance=EuclideanDistance(),
                vector_filters=[NearestFilter(num_near_neighbors)], storage=MemoryStorage())
```

The code above will NOT output an empty list when queried for nearest neighbors.

pixelogik commented 5 years ago

> On running the example code, the list of nearest neighbours is empty:
>
> N = engine.neighbours(query)
> len(N)  # 0
>
> Can you explain why that is?

@rajekta Could you please specify which example code you are running?

pixelogik commented 5 years ago

> anyone has any answer to this? I believe this might be due to the number of buckets
>
> rbp = RandomBinaryProjections('rbp', 10)
>
> I noticed that when initialized, the number of bits when greater than the nearestFilter throws this issue but when less, it does not.
>
> ```
> rbp = RandomBinaryProjections('rbp', 15)
>
> # Create engine with pipeline configuration
> engine = Engine(dimension, lshashes=[rbp], distance=EuclideanDistance(),
>                 vector_filters=[NearestFilter(20)], storage=MemoryStorage())
> ```

Could you please explain again what you are observing? The English is hard to understand, sorry.

idigitopia commented 4 years ago

> I meant to ask whether anyone has found out why the issue @rajekta reported occurs.
>
> I have a theory. When n_buckets (see below) is less than num_near_neighbors, I always get an empty list when querying for neighbors, as demonstrated above. However, when n_buckets is at least num_near_neighbors, results are returned. This is only an observation and might not have anything to do with the actual maths behind the scenes. That is what I was asking in the first line: does anyone have an idea (ideally a mathematical explanation) of why this happens?
>
> I ended up using another library, from Spotify (if anyone is interested).
>
> ```
> # Create a random binary hash with 10 bits
> n_buckets = 10
> num_near_neighbors = 20
> rbp = RandomBinaryProjections('rbp', n_buckets)
>
> # Create engine with pipeline configuration
> engine = Engine(dimension, lshashes=[rbp], distance=EuclideanDistance(),
>                 vector_filters=[NearestFilter(num_near_neighbors)], storage=MemoryStorage())
> ```
>
> The code above will output an empty list when queried for nearest neighbors.
>
> ```
> # Create a random binary hash with 20 bits
> n_buckets = 20
> num_near_neighbors = 20
> rbp = RandomBinaryProjections('rbp', n_buckets)
>
> # Create engine with pipeline configuration
> engine = Engine(dimension, lshashes=[rbp], distance=EuclideanDistance(),
>                 vector_filters=[NearestFilter(num_near_neighbors)], storage=MemoryStorage())
> ```
>
> The code above will NOT output an empty list when queried for nearest neighbors.

Could you point me to the Spotify library?

idigitopia commented 4 years ago

> > On running the example code, the list of nearest neighbours is empty:
> >
> > N = engine.neighbours(query)
> > len(N)  # 0
> >
> > Can you explain why that is?
>
> @rajekta Could you please specify which example code you are running?

If I understand it correctly, you are using a high dimensionality and generating random vectors to populate the bins. Because of the curse of dimensionality, it may happen that the candidate bins are all empty, hence the empty list. Hopefully, it won't be that big of a problem with real-world datasets.
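
To make the empty-bin explanation a bit more concrete, here is a minimal pure-Python sketch. It is not NearPy's code: `make_hasher` and `empty_query_rate` are hypothetical helpers that imitate a random binary projection hash (one random hyperplane per bit), and the sketch assumes that a query only looks at the single bin its own hash key points to. It estimates how often a random query lands in a bin that holds no stored vector.

```python
import random


def make_hasher(dim, n_bits, rng):
    """Return a function that maps a vector to a bit-string bin key.

    Illustrative stand-in for a random binary projection hash:
    each bit is the sign of the dot product with one random normal vector.
    """
    normals = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def bin_key(vec):
        return ''.join(
            '1' if sum(n_i * v_i for n_i, v_i in zip(normal, vec)) > 0 else '0'
            for normal in normals
        )

    return bin_key


def empty_query_rate(dim, n_bits, n_stored, n_queries, seed=0):
    """Fraction of random queries whose bin contains no stored vector."""
    rng = random.Random(seed)
    bin_key = make_hasher(dim, n_bits, rng)

    def random_vector():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Keys of all bins that received at least one stored vector.
    occupied = {bin_key(random_vector()) for _ in range(n_stored)}

    # A query "misses" when its own bin was never populated.
    misses = sum(1 for _ in range(n_queries)
                 if bin_key(random_vector()) not in occupied)
    return misses / n_queries


if __name__ == "__main__":
    for n_bits in (2, 10, 20):
        rate = empty_query_rate(dim=50, n_bits=n_bits, n_stored=200, n_queries=200)
        print(n_bits, "bits -> fraction of queries hitting an empty bin:", rate)
```

In this toy setting with 200 stored random vectors, 2 bits give only 4 bins, which all end up occupied, while 20 bits give about a million possible keys, so almost every query falls into an empty bin. Under these assumptions, storing more vectors or using fewer bits makes empty results less likely.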