nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0

Exception when number of filtered items in index is less than requested K #444

Open wskish opened 1 year ago

wskish commented 1 year ago

When using filters and requesting a K larger than the number of filtered items in the index, hnswlib will raise the following exception: "RuntimeError: Cannot return the results in a contigious 2D array. Probably ef or M is too small"

Would it be possible to just return however many items are available in this case, instead of raising the exception?

yurymalkov commented 1 year ago

Hi @wskish, I guess it is possible to change the behavior, but we should clearly define what to do in each situation and how to alert the user that something is wrong (usually that is a sign of bad hyperparameters, like ef or M, or a broken index).

wskish commented 1 year ago

In the case I am seeing, ef, M, and the index are all fine; it is just that there are actually fewer than K items in the index that meet the filter criteria. So the base assumption that K nearest neighbors actually exist (after filtering) is the issue.
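
Until the library handles this case, one possible caller-side workaround is to count how many labels pass the filter and cap K at that number before searching. The sketch below is only an illustration: `clamp_k` is a hypothetical helper, not part of hnswlib, and it assumes the caller keeps its own list of the labels stored in the index. It also cannot rule out the exception entirely, since an approximate search with a small ef may still find fewer than K matches.

```python
from typing import Callable, Iterable


def clamp_k(labels: Iterable[int], keep: Callable[[int], bool], k: int) -> int:
    """Return k capped at the number of labels accepted by the filter.

    `labels` is assumed to be the caller's own record of what is in the index;
    `keep` is the same predicate that would be passed to the search as a filter.
    """
    matching = sum(1 for label in labels if keep(label))
    return min(k, matching)


# Example: 10 stored labels, the filter keeps only even ones, and K = 8.
all_labels = range(10)
is_even = lambda label: label % 2 == 0
safe_k = clamp_k(all_labels, is_even, k=8)  # only 5 labels are even

# The search would then be issued with the capped value (and skipped when
# safe_k == 0), for example:
#     labels, distances = index.knn_query(query, k=safe_k, filter=is_even)
```

Counting the matches in Python costs a pass over all labels, so this is only reasonable for small indexes or for filters whose match count is already known.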

yurymalkov commented 1 year ago

Oh. Got it. Yeah, that needs to be fixed. One issue is that the batch search returns a numpy matrix. If there are routinely not enough candidates, the matrix would not be completely filled. Maybe we can add something like -1 to mark a missing result and keep the API intact. Will think about that.
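
To make the -1 idea concrete, here is a minimal sketch of how ragged per-query results could be padded into fixed-width rows. It is not the hnswlib implementation: `pad_results` and `MISSING_LABEL` are hypothetical names, plain lists stand in for the numpy matrix, and filling the missing distances with infinity is an assumption that the thread does not settle.

```python
import math
from typing import List, Sequence, Tuple

MISSING_LABEL = -1  # sentinel suggested in the thread for "no result"


def pad_results(
    per_query: Sequence[Sequence[Tuple[int, float]]], k: int
) -> Tuple[List[List[int]], List[List[float]]]:
    """Pad each query's (label, distance) hits to exactly k columns.

    Missing labels become MISSING_LABEL; missing distances become math.inf
    (an assumed convention, chosen so padded slots sort last).
    """
    labels: List[List[int]] = []
    distances: List[List[float]] = []
    for hits in per_query:
        hits = list(hits)[:k]          # never return more than k hits
        n_missing = k - len(hits)      # how many slots need the sentinel
        labels.append([lab for lab, _ in hits] + [MISSING_LABEL] * n_missing)
        distances.append([d for _, d in hits] + [math.inf] * n_missing)
    return labels, distances


# Two queries with K = 3: the first found two matches, the second only one.
labels, distances = pad_results([[(3, 0.1), (7, 0.4)], [(5, 0.2)]], k=3)
```

With this shape the result stays rectangular, and callers can detect short rows by checking for the sentinel label instead of catching an exception.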

jeffchuber commented 1 year ago

@yurymalkov +1! See, e.g., https://github.com/chroma-core/chroma/issues/225

yurymalkov commented 1 year ago

Yeah, we are gonna work on it.

danishshaikh556 commented 3 months ago

Circling back to see whether this issue has been fixed.