jelmerk / hnswlib

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Apache License 2.0
260 stars, 56 forks

excludeSelf parameter in hnswlib class not working #47

Closed: rush4ratio closed this issue 2 years ago

rush4ratio commented 2 years ago

When I set excludeSelf to true, the results still include each row's own ID. Below is a sample (converted to a pandas DataFrame) from the data I'm experimenting with:

[image: sample of the results as a pandas DataFrame]

In the sample above, the IDs are on the left and the right-hand column contains a list of tuples (ID and distance, as returned by hnswlib). You'll notice that each row's own ID appears in its results.

jelmerk commented 2 years ago

Can you double-check whether the id and the query id have the same type?

I am pretty sure this scenario is covered by a unit test, and you can also see it in action in this Google Colab notebook.
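To illustrate why the type question matters, here is a minimal pure-Python sketch. It is not hnswlib's actual implementation: the function `exclude_self` and the sample data are hypothetical, and the sketch assumes that self-exclusion boils down to an equality check between the query ID and each neighbor ID. Under that assumption, a string query ID never equals an integer neighbor ID, so the row's own ID stays in the results.

```python
def exclude_self(query_id, neighbors):
    """Hypothetical helper: drop any (id, distance) pair whose id equals query_id."""
    return [(nid, dist) for nid, dist in neighbors if nid != query_id]


# Made-up neighbor list in the same (id, distance) shape described in the issue.
neighbors = [(1, 0.0), (7, 0.12), (9, 0.35)]

# Query ID and neighbor IDs are both ints: the self match is removed.
same_type = exclude_self(1, neighbors)

# Query ID is a str while neighbor IDs are ints: nothing compares equal,
# so the self match is silently kept.
mixed_type = exclude_self("1", neighbors)
```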

rush4ratio commented 2 years ago

Apologies for not getting back to you sooner. It appears that if I set queryIdentifierCol to the ID column of interest, the self ID is no longer included in the results. Was this an additional purpose of queryIdentifierCol?

jelmerk commented 2 years ago

Yes, you need to set the query identifier column (queryIdentifierCol) or it won't work. I should probably raise an error if you use excludeSelf without also providing it, since that would always be a mistake.
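A minimal sketch of the kind of safeguard described here, written in plain Python. The function name `validate_exclude_self`, its arguments, and the error message are hypothetical and are not taken from the library; only the parameter names excludeSelf and queryIdentifierCol come from this thread.

```python
from typing import Optional


def validate_exclude_self(exclude_self: bool, query_identifier_col: Optional[str]) -> None:
    """Hypothetical check: fail fast when excludeSelf is set without a query identifier column."""
    if exclude_self and not query_identifier_col:
        raise ValueError(
            "excludeSelf=True requires queryIdentifierCol to be set; "
            "without it the query row cannot be matched against its own ID"
        )


# Valid combinations pass silently.
validate_exclude_self(exclude_self=True, query_identifier_col="id")
validate_exclude_self(exclude_self=False, query_identifier_col=None)
```

Failing at configuration time rather than returning results that still contain the self ID would make the requirement visible immediately.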

rush4ratio commented 2 years ago

I agree that it's not obvious that the query column should be set for excludeSelf to work. If this requirement is not fulfilled, raising an error would help.

jelmerk commented 2 years ago

I'll see if I can add that as a safeguard tonight and then close this issue.