nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0

Why is the self point not in the knn_query output list? #526

Open yazhinia opened 7 months ago

yazhinia commented 7 months ago

Hello, I want to get the nearest neighbours of each point in the dataset. In the results of knn_query, the neighbour list for some points does not contain the point's own index, while most points have their own index as the first nearest neighbour. My code is given below:

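import hnswlib

# Index creation was not shown in the original post; the L2 space and
# taking dim from the data are assumptions.
num_elements, dim = input_2d_vector.shape
p = hnswlib.Index(space='l2', dim=dim)
p.init_index(max_elements=num_elements, ef_construction=200, M=16)
p.set_ef(10)
p.set_num_threads(12)
p.add_items(input_2d_vector)
nn = 15
labels, distance = p.knn_query(input_2d_vector, k=nn)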
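
To see how many points are affected, a check along these lines could be run on the returned labels. This is only a sketch: `missing_self` is a hypothetical helper, not part of hnswlib, and it assumes the items were added without explicit ids, so that label i refers to row i of the input and row i of the result belongs to query i.

```python
def missing_self(labels):
    """Return the query indices whose own index is absent from their neighbour row.

    Assumes row i of `labels` is the result for query point i, and that
    label i refers to row i of the input data.
    """
    missing = []
    for i, row in enumerate(labels):
        # int() also handles NumPy integer types such as uint64
        if i not in {int(x) for x in row}:
            missing.append(i)
    return missing


# Tiny hand-made example (not real hnswlib output): row 2 lacks its own index.
example_labels = [[0, 1, 2], [1, 0, 2], [0, 1, 3], [3, 2, 1]]
print(missing_self(example_labels))  # prints [2]
```

Calling `missing_self(labels)` on the array returned by knn_query above would list the affected points, such as 159954.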

Example: the result of querying with point 159954:

p.knn_query(input_2d_vector[159954], k=nn)
(array([[100278,  98287,  56307,  91682, 106717, 108968,  35750, 116215,
         133216, 108053,  50169, 138988,  23028,  23627, 127306]],
       dtype=uint64),
 array([[624.35284, 628.1655 , 643.80225, 646.7606 , 649.9992 , 658.15686,
         659.7333 , 659.9536 , 660.69086, 662.9651 , 667.3906 , 670.5628 ,
         684.00586, 686.38666, 688.42053]], dtype=float32))

Ideally, the first index should be 159954 instead of 100278, given that the distance from 159954 to itself is zero. Is this expected behavior, or am I missing something?
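
For comparison, this is the behaviour an exact search gives. The sketch below is a plain brute-force k-NN, with squared Euclidean distance assumed as the metric (the metric used for the index is not shown above). `exact_knn` is a hypothetical helper and the data is made up. Since hnswlib is an approximate index, its results are not guaranteed to match this.

```python
def exact_knn(data, query, k):
    """Brute-force k nearest neighbours by squared Euclidean distance."""
    dists = []
    for idx, point in enumerate(data):
        d = sum((a - b) ** 2 for a, b in zip(point, query))
        dists.append((d, idx))
    dists.sort()  # ascending distance; ties broken by lower index
    top = dists[:k]
    return [idx for _, idx in top], [d for d, _ in top]


# Toy data, made up for illustration only.
data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
labels, distances = exact_knn(data, data[2], k=3)
print(labels, distances)  # prints [2, 0, 1] [0.0, 4.0, 5.0]
```

Here the query point itself comes back first with distance zero, which is what I expected from knn_query as well.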

Any help is much appreciated.

Thank you.