langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Similarity Score Threshold #5340

Closed: yixiangfeng closed this issue 1 month ago

yixiangfeng commented 4 months ago

Checked other resources

Example Code

import { Milvus } from '@langchain/community/vectorstores/milvus';
import { OpenAIEmbeddings } from '@langchain/openai';
(async () => {
    const embeddings = new OpenAIEmbeddings({
        apiKey: process.env.OPEN_AI_API_KEY,
    });
    const vectorStore = await Milvus.fromExistingCollection(embeddings, {
        url: process.env.ZILLIZ_CLOUD_URL,
        username: process.env.ZILLIZ_CLOUD_USERNAME,
        password: process.env.ZILLIZ_CLOUD_PASSWORD,
        ssl: true,
        collectionName: 'test',
    });

    const result = await vectorStore.similaritySearchWithScore('Hello,');
    console.log(result);
})();
Output:

[
  [
    Document { pageContent: 'Hello,', metadata: [Object] },
    0.000004837587312067626
  ],
  [
    Document { pageContent: 'Hi,', metadata: [Object] },
    0.03748885169625282
  ],
  [
    Document { pageContent: 'Hi.', metadata: [Object] },
    0.11468228697776794
  ],
  [
    Document { pageContent: 'Hi.', metadata: [Object] },
    0.11468228697776794
  ]
]

Error Message and Stack Trace (if applicable)

No response

Description

When I use similaritySearchWithScore with Milvus, a score of 0 means most similar and 1 means most dissimilar. This doesn't match your documentation: when I use ScoreThresholdRetriever to filter by score, minSimilarityScore expects 1 to mean most similar and 0 most dissimilar. As a result, I can't use a similarity score threshold with Milvus (Zilliz Cloud).
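To make the mismatch concrete, here is a self-contained sketch that applies a "higher is more similar" threshold (keep a result when `score >= minSimilarityScore`, which is the comparison used in langchain/src/retrievers/score_threshold.ts, assuming the installed version matches) to the scores from the output above. `keepAtLeast` is a hypothetical helper written for illustration, and plain strings stand in for `Document` objects.

```typescript
// Scores copied from the similaritySearchWithScore output above.
// Plain strings stand in for Document objects.
type ScoredText = [string, number];

const results: ScoredText[] = [
  ["Hello,", 0.000004837587312067626],
  ["Hi,", 0.03748885169625282],
  ["Hi.", 0.11468228697776794],
  ["Hi.", 0.11468228697776794],
];

// Hypothetical helper mirroring a "higher is more similar" threshold:
// keep a result only when its score is >= the minimum similarity.
function keepAtLeast(scored: ScoredText[], minSimilarityScore: number): ScoredText[] {
  return scored.filter(([, score]) => score >= minSimilarityScore);
}

// With distance-like scores, the exact match ("Hello,") is dropped
// and only the least similar results survive.
const kept = keepAtLeast(results, 0.1);
console.log(kept.map(([text]) => text)); // [ 'Hi.', 'Hi.' ]
```

Lowering the threshold toward 0 does not fix this either: `keepAtLeast(results, 0)` simply returns all four results.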

System Info

yarn version 1.22.19 node version 18.20.2

@langchain/community@0.0.55 @langchain/openai@0.0.28 @langchain/core@0.1.62 langchain@0.1.36

dosubot[bot] commented 4 months ago

ScoreThresholdRetriever keeps a result only when its score is greater than or equal to minSimilarityScore, so it assumes higher scores mean more similar. The Milvus scores in your output behave like distances instead (the exact match scores ≈ 0), so lowering minSimilarityScore toward 0 does not help: it simply lets every result through rather than selecting the most similar ones.

A more reliable approach is to transform the Milvus scores before ScoreThresholdRetriever evaluates them. For example, you could modify Milvus's similaritySearchVectorWithScore method (which similaritySearchWithScore calls) to invert each score by subtracting it from 1, so that higher values mean more similar, as ScoreThresholdRetriever expects.

Here's a conceptual sketch of the inversion inside that method:

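import { Document } from "@langchain/core/documents";

// Conceptual excerpt: `searchResp` is the Milvus search response, and `fields`
// stands for the { pageContent, metadata } object built from each hit (omitted here).
const results: [Document, number][] = [];
searchResp.results.forEach((result) => {
  // Invert the score so higher means more similar, as ScoreThresholdRetriever expects.
  // Only valid when scores lie in [0, 1].
  const invertedScore = 1 - result.score;
  results.push([new Document(fields), invertedScore]);
});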

This only works if the scores are normalized to [0, 1]. Distance metrics such as L2 are not bounded above, so 1 - score can turn negative; if Milvus uses a different scoring range, adjust the inversion logic accordingly.
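One range-independent alternative to `1 - score` is a monotonic mapping such as `1 / (1 + distance)`, which turns any non-negative distance into a value in (0, 1] where higher means more similar. This is a swapped-in technique, not something LangChain.js or Milvus provides; `distanceToSimilarity` and `toSimilarityScores` are hypothetical names, and the sample scores are taken from the issue's output (plus one made-up far-away entry).

```typescript
type Scored<T> = [T, number];

// Map a non-negative distance (0 = identical) to a similarity in (0, 1].
// Unlike 1 - d, this never goes negative, however large d gets.
function distanceToSimilarity(distance: number): number {
  return 1 / (1 + distance);
}

// Convert [doc, distance] pairs into [doc, similarity] pairs.
// Order is preserved, and the ranking is unchanged because the mapping is monotonic.
function toSimilarityScores<T>(results: Scored<T>[]): Scored<T>[] {
  return results.map(([doc, distance]): Scored<T> => [doc, distanceToSimilarity(distance)]);
}

const converted = toSimilarityScores([
  ["Hello,", 0.000004837587312067626],
  ["Hi.", 0.11468228697776794],
  ["far away", 3], // illustrative large distance
]);

// A "higher is more similar" threshold now behaves as expected.
const similarEnough = converted.filter(([, similarity]) => similarity >= 0.9);
console.log(similarEnough.map(([text]) => text)); // [ 'Hello,' ]
```

The trade-off is that the resulting numbers are no longer on the same scale as the raw distances, so a minSimilarityScore has to be chosen against the converted values.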

Alternatively, you could extend or modify ScoreThresholdRetriever so that it treats lower Milvus scores as more similar, for example by keeping results whose score is at or below a maximum distance instead of at or above minSimilarityScore.
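A minimal sketch of that inverted filtering, assuming the scores are distances where lower means more similar. `filterByMaxDistance` and its `maxDistance` and `maxK` parameters are hypothetical names for illustration; they are not options of ScoreThresholdRetriever.

```typescript
type ScoredResult<T> = [T, number];

// Keep results whose distance is at or below maxDistance (lower = more similar),
// closest first, and return at most maxK documents.
function filterByMaxDistance<T>(
  results: ScoredResult<T>[],
  maxDistance: number,
  maxK: number,
): T[] {
  return results
    .filter(([, distance]) => distance <= maxDistance)
    .sort((a, b) => a[1] - b[1]) // ascending distance: best match first
    .slice(0, maxK)
    .map(([doc]) => doc);
}

const docs = filterByMaxDistance(
  [
    ["Hi.", 0.11468228697776794],
    ["Hello,", 0.000004837587312067626],
    ["Hi,", 0.03748885169625282],
  ],
  0.05,
  10,
);
console.log(docs); // [ 'Hello,', 'Hi,' ]
```

In practice you would pass the array returned by `vectorStore.similaritySearchWithScore(query, k)` in place of the literal pairs.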

For implementation details, refer to the LangChain.js repository for the ScoreThresholdRetriever and VectorStore classes.

#### Details

**Sources**

- [langchain/src/retrievers/score_threshold.ts](https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/retrievers/score_threshold.ts)
- [libs/langchain-community/src/vectorstores/milvus.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/milvus.ts)
- [langchain-core/src/vectorstores.ts](https://github.com/langchain-ai/langchainjs/blob/main/langchain-core/src/vectorstores.ts)
- [examples/src/retrievers/similarity_score_threshold.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/retrievers/similarity_score_threshold.ts)
- [examples/src/indexes/vector_stores/singlestore.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/indexes/vector_stores/singlestore.ts)
- [libs/create-langchain-integration/template/src/vectorstores.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/create-langchain-integration/template/src/vectorstores.ts)
**About Dosu**

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.