jelmerk / hnswlib

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Apache License 2.0

Big K - output empty result #61

Closed ccalvin97 closed 1 year ago

ccalvin97 commented 1 year ago

Hi, when I set a large K=50,000, the model's result is empty. When I set K=10,000, the result is fine. My dataset has about 0.1 billion rows. My final goal is to set K=50,000 ~ 100,000.

My settings are below: hnsw = HnswSimilarity(identifierCol='ITEM_ID', queryIdentifierCol='ITEM_ID', featuresCol='EMBEDDINGS', distanceFunction='euclidean', m=128, ef=25, k=50000, efConstruction=1000, numPartitions=9000, numReplicas=50, excludeSelf=True, similarityThreshold=0.2, predictionCol='pred')

jelmerk commented 1 year ago

You are trying to find the best 50k results for each item in the set? That would be a huge dataset. I can speculate about the problem, but that's not something that's going to be easy to reproduce. And it's a bit of an unusual scenario.
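
To put "huge" in numbers, here is a rough back-of-the-envelope sketch using the figures from the question (about 0.1 billion items, k=50,000). The helper name `estimate_result_size` and the 16 bytes per neighbor entry (an 8-byte identifier plus an 8-byte distance) are assumptions made for illustration only; they are not part of the library. The count is an upper bound, since `similarityThreshold` may filter some neighbors out.

```python
from typing import Tuple


def estimate_result_size(num_items: int, k: int,
                         bytes_per_neighbor: int = 16) -> Tuple[int, float]:
    """Return (upper bound on neighbor entries, approximate size in terabytes).

    bytes_per_neighbor is an assumed figure (8-byte id + 8-byte distance),
    not something measured from the library's actual output format.
    """
    total_entries = num_items * k
    size_tb = total_entries * bytes_per_neighbor / 1e12
    return total_entries, size_tb


# Figures from the question: ~0.1 billion rows, k = 50,000.
entries, size_tb = estimate_result_size(num_items=100_000_000, k=50_000)
print(f"{entries:.1e} neighbor entries, roughly {size_tb:.0f} TB")
```

Under these assumptions the full result holds up to 5 × 10^12 neighbor entries, on the order of tens of terabytes before any serialization overhead.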

ccalvin97 commented 1 year ago

Does this model have a limit on K? When I set k=50,000, it still outputs an empty dataframe.

jelmerk commented 1 year ago

No, it should never give an empty dataframe.