spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.
https://spotify.github.io/voyager/
Apache License 2.0

Fewer than expected results were retrieved during querying the index #38

Open sametdumankaya opened 9 months ago

sametdumankaya commented 9 months ago

Hi,

I'm trying to use the voyager library instead of annoy, but I ran into the following problem. Even though there are 25130 elements in the Voyager index (see the num_elements attribute of the index below), I'm unable to query it because the query somehow can't find all of the items.

image

markkohdev commented 8 months ago

@sametdumankaya sorry about the delayed response here! Can you provide some more information on your use case? Are you attempting to query for N neighbors where N is the number of elements in the index?

Also can you check to ensure that there are no NaN's in your item set?
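
One way to run the NaN check suggested above is to scan the vectors before adding them to the index. This is a minimal sketch in plain Python, not something from the voyager API: `find_non_finite_rows` is a hypothetical helper name, and it assumes the items are available as a sequence of float sequences (a 2-D NumPy array iterated row by row should also work). It flags infinities as well as NaNs, since both would corrupt distance computations.

```python
import math
from typing import Iterable, List, Sequence


def find_non_finite_rows(vectors: Iterable[Sequence[float]]) -> List[int]:
    """Return the positions of vectors that contain NaN or +/-inf."""
    bad_rows = []
    for row_number, vector in enumerate(vectors):
        # math.isfinite is False for NaN, +inf and -inf.
        if any(not math.isfinite(value) for value in vector):
            bad_rows.append(row_number)
    return bad_rows


if __name__ == "__main__":
    # Illustrative data only: rows 1 and 2 are deliberately broken.
    items = [[0.1, 0.2], [float("nan"), 0.5], [1.0, float("inf")]]
    print(find_non_finite_rows(items))
```

An empty list means every vector is finite, which rules out this particular cause.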

cvillela commented 5 months ago

Hello @markkohdev! I am facing the exact same issue. Some queries for N neighbors in an index of length N result in this error. My objective is to find the furthest neighbor in an index from a specific vector. There are no NaNs in the set.

print(f"Len Index {len(cluster_index)}")
neighbors, _ = cluster_index.query(
    vectors=any_vector,
    k=len(cluster_index),
)

outputs

Len Index 828
RuntimeError: Fewer than expected results were retrieved; only found 825 of 828 requested neighbors.

Is this a parameter tuning problem, for example with one of the "ef" parameters?

Please note that this index also does not contain any mark_deleted() elements.
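
To test the "ef" hypothesis raised above, one option is to retry the same query with a progressively larger search breadth and see whether the error goes away. The sketch below is only an experiment under stated assumptions: `query_with_growing_ef` is a hypothetical helper, the failure is assumed to surface as a `RuntimeError` (as in the log above), and the commented usage assumes `Index.query` accepts a `query_ef` keyword, as in voyager's documented signature. If some elements are simply unreachable in the graph, no ef value may help, and an exact scan over the raw vectors would be the fallback.

```python
from typing import Callable, Optional, Tuple, TypeVar

Result = TypeVar("Result")


def query_with_growing_ef(
    run_query: Callable[[int], Result],
    start_ef: int,
    max_ef: int,
) -> Tuple[Optional[Result], int]:
    """Call run_query(ef), doubling ef after each RuntimeError.

    Returns (result, ef_used), or (None, last_ef_tried) if every attempt failed.
    """
    if start_ef < 1:
        raise ValueError("start_ef must be at least 1")
    ef = start_ef
    while True:
        try:
            return run_query(ef), ef
        except RuntimeError:
            if ef >= max_ef:
                # Give up: even the largest allowed ef did not return enough results.
                return None, ef
            ef = min(ef * 2, max_ef)


# Hypothetical usage with the names from the snippet above
# (assumes a query_ef keyword on Index.query):
#
# result, ef_used = query_with_growing_ef(
#     lambda ef: cluster_index.query(any_vector, k=len(cluster_index), query_ef=ef),
#     start_ef=len(cluster_index),
#     max_ef=8 * len(cluster_index),
# )
```

A `None` result after the largest ef suggests the problem is not purely a tuning issue.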

sametdumankaya commented 5 months ago

> @sametdumankaya sorry about the delayed response here! Can you provide some more information on your use case? Are you attempting to query for N neighbors where N is the number of elements in the index?
>
> Also can you check to ensure that there are no NaN's in your item set?

Hey again,

There are 25310 elements in the set, and I'm trying to get similarity scores for all of them using a random embedding. I can confirm that there are no NaNs in the item set. Somehow, 4 of the items were not included in the index, so there are 25306 items in the index instead of 25310.
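
When the goal is a score for every item against one embedding (or, as mentioned earlier in the thread, the single furthest item), an exact scan over the raw vectors avoids the approximate index entirely and cannot miss elements. The following is a minimal sketch rather than a voyager feature: `rank_all_by_distance` is a hypothetical helper, it assumes the original vectors are still available outside the index, and it assumes Euclidean distance, which may not match the space the index was built with. Plain Python is used to keep it self-contained; a vectorized NumPy version would be faster at this scale.

```python
import math
from typing import List, Sequence, Tuple


def rank_all_by_distance(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (position, Euclidean distance) for every vector, nearest first."""
    scored = [
        (position, math.dist(query, vector))
        for position, vector in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored


if __name__ == "__main__":
    # Illustrative data only.
    vectors = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
    ranked = rank_all_by_distance([0.0, 0.0], vectors)
    print(ranked)      # every item gets a score, nearest first
    print(ranked[-1])  # the furthest item is the last entry
```

Because every vector is visited once, the result always has exactly as many entries as there are vectors.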