Closed ccalvin97 closed 1 year ago
Are you trying to find the best 50k results for each item in the set? That would be a huge data set. I can speculate about the problem, but that's not something that's going to be easy to reproduce. And it's a bit of an unusual scenario.
Does this model have a limit on k? I found that when I set k=50,000, it still outputs an empty dataframe.
No, it should never give an empty dataframe.
Hi, when I set a large k=50,000, the result of the model is empty. When I set k=10,000, the result is fine. My dataset has about 0.1 billion rows. My final goal is to set k between 50,000 and 100,000.
My settings are below: hnsw = HnswSimilarity(identifierCol='ITEM_ID', queryIdentifierCol='ITEM_ID', featuresCol='EMBEDDINGS', distanceFunction='euclidean', m=128, ef=25, k=50000, efConstruction=1000, numPartitions=9000, numReplicas=50, excludeSelf=True, similarityThreshold=0.2, predictionCol='pred')
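For a sense of scale, here is a rough back-of-the-envelope estimate of the output size for the numbers mentioned in this thread (about 0.1 billion items, k=50,000). It is only a sketch: the helper `estimate_result_size` is hypothetical, and the 16 bytes per neighbor entry (an 8-byte ID plus an 8-byte distance) is an assumption, not something taken from the library. It also ignores `similarityThreshold`, which may drop some neighbors.

```python
def estimate_result_size(num_items, k, bytes_per_neighbor=16):
    """Upper-bound estimate of a full top-k result set.

    bytes_per_neighbor is an assumed figure (8-byte ID + 8-byte distance);
    the real storage cost depends on the ID type and serialization format.
    """
    total_entries = num_items * k                    # one entry per (item, neighbor) pair
    total_bytes = total_entries * bytes_per_neighbor
    return total_entries, total_bytes


# Numbers from the thread: ~0.1 billion rows, k = 50,000
entries, size_bytes = estimate_result_size(100_000_000, 50_000)
print(f"{entries:.1e} neighbor entries, roughly {size_bytes / 1e12:.0f} TB")
```

Under these assumptions the full result is on the order of trillions of neighbor entries, which is why a k this large is an unusual workload to reproduce.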