Closed yixiangfeng closed 3 months ago
To address the discrepancy in similarity score interpretation between Milvus and the ScoreThresholdRetriever in LangChain.js, where Milvus treats a score of 0 as maximum similarity and 1 as maximum dissimilarity, you can adjust the minSimilarityScore parameter of the ScoreThresholdRetriever to account for Milvus's scoring. Setting minSimilarityScore to a value close to 0 keeps the most similar results from being filtered out. Note, however, that if the retriever keeps every result whose score is at or above the threshold, this also admits dissimilar results (which score higher in Milvus), so it works around the problem rather than fixing it.
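To see why lowering the threshold is only a partial workaround, here is a minimal self-contained sketch. It assumes the retriever keeps results whose score is at or above minSimilarityScore; the filterByMinScore helper and the sample scores are hypothetical and are not part of LangChain.js or Milvus.

```typescript
// Hypothetical scored results: [document text, score]. Scores follow the
// Milvus convention described above (0 = most similar, 1 = most dissimilar).
type Scored = [string, number];

const milvusStyleResults: Scored[] = [
  ["near match", 0.05],
  ["related", 0.4],
  ["unrelated", 0.95],
];

// Assumed filter rule: keep results whose score is at or above the threshold.
function filterByMinScore(results: Scored[], minSimilarityScore: number): Scored[] {
  return results.filter(([, score]) => score >= minSimilarityScore);
}

// A threshold near 0 keeps everything, including the unrelated document.
const loose = filterByMinScore(milvusStyleResults, 0.01).map(([text]) => text);

// A high threshold keeps only the unrelated document and drops the near match.
const strict = filterByMinScore(milvusStyleResults, 0.8).map(([text]) => text);

console.log(loose, strict);
```

With distance-style scores, no threshold value selects only the closest matches under this rule, which is why the score inversion described next is usually the more useful fix.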
For a more technical solution, consider preprocessing or transforming the scores from Milvus before they are evaluated by the ScoreThresholdRetriever. This could involve modifying the similaritySearchWithScore method to invert the scores, for example by subtracting them from 1, to align with the scoring system the ScoreThresholdRetriever expects.
Here's a conceptual example of how you might invert the scores:
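    // Inside similaritySearchWithScore, after the Milvus search call has returned searchResp.
    const results: [Document, number][] = [];
    searchResp.results.forEach((result) => {
      // fields ({ pageContent, metadata }) is assumed to be built from result earlier in this loop.
      // Invert the score so that higher means more similar, as ScoreThresholdRetriever expects.
      const invertedScore = 1 - result.score;
      results.push([new Document(fields), invertedScore]);
    });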
This approach assumes scores are normalized between 0 and 1. If Milvus uses a different scoring range, you'll need to adjust the inversion logic accordingly.
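If the scores are not already in the 0 to 1 range, one possible adjustment is sketched below. It rests on several assumptions that you should verify against your collection: the embeddings are unit-length, an L2 score is the squared Euclidean distance (so cosine similarity equals 1 - d²/2), and IP or COSINE scores are already cosine-like values where higher means more similar. The toSimilarity helper is hypothetical and not part of LangChain.js or the Milvus SDK.

```typescript
// Metric names follow Milvus's metric types; how each score is interpreted
// here is an assumption stated in the text above, not a guarantee.
type MetricType = "L2" | "IP" | "COSINE";

// Map a raw score to a similarity in [0, 1], where 1 means most similar.
function toSimilarity(score: number, metric: MetricType): number {
  // For unit-length vectors, squared L2 distance d2 satisfies cos = 1 - d2 / 2.
  const cosine = metric === "L2" ? 1 - score / 2 : score;
  // Clamp to [-1, 1] to absorb floating-point drift, then rescale to [0, 1].
  const clamped = Math.min(1, Math.max(-1, cosine));
  return (clamped + 1) / 2;
}

console.log(toSimilarity(0, "L2"), toSimilarity(2, "L2"), toSimilarity(1, "COSINE"));
```

The rescaled value can then be returned in place of the raw score so that a threshold such as 0.8 means "at least this similar".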
Alternatively, you could extend or modify the ScoreThresholdRetriever to interpret scores from Milvus correctly, by prioritizing lower scores as indicating higher similarity. This might involve changing the filtering logic within the ScoreThresholdRetriever.
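The changed filtering logic could look roughly like the sketch below, which keeps results at or below a maximum distance instead of at or above a minimum similarity. It is written against minimal stand-in types rather than the real LangChain.js classes so that it runs on its own; Doc, ScoredSearch, retrieveByMaxDistance, and the maxDistance option are hypothetical names, and the grow-k-in-steps loop (kIncrement, maxK) is an assumption about how such a retriever fetches results.

```typescript
// Minimal stand-ins for a document and a similaritySearchWithScore-style method.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}
type ScoredSearch = (query: string, k: number) => Promise<[Doc, number][]>;

interface DistanceThresholdOptions {
  maxDistance: number; // keep results whose score is <= maxDistance
  kIncrement?: number; // how many more results to request on each pass
  maxK?: number; // upper bound on requested and returned results
}

async function retrieveByMaxDistance(
  search: ScoredSearch,
  query: string,
  options: DistanceThresholdOptions
): Promise<Doc[]> {
  const { maxDistance, kIncrement = 10, maxK = 100 } = options;
  let currentK = 0;
  let kept: [Doc, number][] = [];
  do {
    currentK += kIncrement;
    const results = await search(query, currentK);
    // Lower score means more similar, so filter from above rather than below.
    kept = results.filter(([, score]) => score <= maxDistance);
    // If every requested slot passed, more matches may exist: ask for a larger batch.
  } while (kept.length >= currentK && currentK < maxK);
  return kept.slice(0, maxK).map(([doc]) => doc);
}

// Usage with an in-memory fake search, ordered from closest to farthest.
const corpus: [Doc, number][] = [
  [{ pageContent: "near match", metadata: {} }, 0.05],
  [{ pageContent: "related", metadata: {} }, 0.4],
  [{ pageContent: "unrelated", metadata: {} }, 0.95],
];
const fakeSearch: ScoredSearch = async (_query, k) => corpus.slice(0, k);

retrieveByMaxDistance(fakeSearch, "example query", { maxDistance: 0.5 }).then((docs) =>
  console.log(docs.map((d) => d.pageContent))
);
```

In a real project the search argument would be a bound call to the Milvus store's similaritySearchWithScore, and the same loop could live in a subclass of the retriever instead of a free function.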
For implementation details, refer to the LangChain.js repository for the ScoreThresholdRetriever and VectorStore classes.
Error Message and Stack Trace (if applicable)
No response
Description
When I use similaritySearchWithScore with Milvus, a score of 0 means most similar and 1 means most dissimilar. This does not match your documentation: when I use ScoreThresholdRetriever to control the score, minSimilarityScore expects a score of 1 to mean most similar and 0 to mean most dissimilar. As a result, the Similarity Score Threshold retriever cannot be used with Milvus (ZILLIZ_CLOUD).
System Info
yarn version 1.22.19
node version 18.20.2
@langchain/community@0.0.55
@langchain/openai@0.0.28
@langchain/core@0.1.62
langchain@0.1.36