Similarity Score Threshold

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain.js documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain.js rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import { Milvus } from '@langchain/community/vectorstores/milvus';
import { OpenAIEmbeddings } from '@langchain/openai';
(async () => {
    const embeddings = new OpenAIEmbeddings({
        apiKey: process.env.OPEN_AI_API_KEY,
    });
    const vectorStore = await Milvus.fromExistingCollection(embeddings, {
        url: process.env.ZILLIZ_CLOUD_URL,
        username: process.env.ZILLIZ_CLOUD_USERNAME,
        password: process.env.ZILLIZ_CLOUD_PASSWORD,
        ssl: true,
        collectionName: 'test',
    });

    const result = await vectorStore.similaritySearchWithScore('Hello,');
    console.log(result);
})();

output:

[
  [
    Document { pageContent: 'Hello,', metadata: [Object] },
    0.000004837587312067626
  ],
  [
    Document { pageContent: 'Hi,', metadata: [Object] },
    0.03748885169625282
  ],
  [
    Document { pageContent: 'Hi.', metadata: [Object] },
    0.11468228697776794
  ],
  [
    Document { pageContent: 'Hi.', metadata: [Object] },
    0.11468228697776794
  ]
]

Error Message and Stack Trace (if applicable)

No response

Description

When I call similaritySearchWithScore on the Milvus vector store, a score of 0 means most similar and 1 means most dissimilar. This does not match your documentation: the minSimilarityScore option of ScoreThresholdRetriever expects a score of 1 to mean most similar and 0 to mean most dissimilar. As a result, the similarity score threshold retriever cannot be used with Milvus (Zilliz Cloud).

System Info

yarn version 1.22.19 node version 18.20.2

@langchain/community@0.0.55 @langchain/openai@0.0.28 @langchain/core@0.1.62 langchain@0.1.36

To address the mismatch in score interpretation between Milvus and the ScoreThresholdRetriever in LangChain.js (Milvus returns 0 for maximum similarity and 1 for maximum dissimilarity, while the retriever expects the opposite), you can adjust the minSimilarityScore parameter of the ScoreThresholdRetriever. Setting minSimilarityScore close to 0 stops the retriever from discarding the low-scoring, most similar Milvus results. Be aware that this is only a workaround: the retriever keeps results whose score is at or above the threshold, so a threshold near 0 admits dissimilar results as well.

For a more technical solution, consider preprocessing or transforming the scores from Milvus before they are evaluated by the ScoreThresholdRetriever. This could involve modifying the similaritySearchWithScore method to invert the scores, for example, by subtracting them from 1, to align with the expected scoring system of the ScoreThresholdRetriever.

Here's a conceptual example of how you might invert the scores:

const results: [Document, number][] = [];
searchResp.results.forEach((result) => {
  // `fields` (pageContent and metadata) is assumed to be built from `result` earlier in this callback.
  // Invert the score so that higher means more similar, as ScoreThresholdRetriever expects.
  const invertedScore = 1 - result.score;
  results.push([new Document(fields), invertedScore]);
});

This approach assumes scores are normalized between 0 and 1. If Milvus uses a different scoring range, you'll need to adjust the inversion logic accordingly.
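If you would rather not patch the library, the same inversion can be applied in your own code to the tuples returned by similaritySearchWithScore. The following is a minimal sketch under the same 0-to-1 assumption; `Doc`, `toSimilarity`, and `filterByMinSimilarity` are hypothetical names used for illustration and are not part of LangChain.js.

```typescript
// Minimal stand-in for LangChain's Document so the sketch is self-contained.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

type Scored = [Doc, number];

// Hypothetical helper: converts distances (lower = closer) into
// similarities (higher = closer). Values outside [0, 1] are clamped first,
// which is an assumption and may not suit every Milvus metric type.
function toSimilarity(results: Scored[]): Scored[] {
  return results.map(([doc, distance]): Scored => {
    const clamped = Math.min(1, Math.max(0, distance));
    return [doc, 1 - clamped];
  });
}

// Hypothetical helper: keeps only results whose inverted score is at or
// above the threshold, so 1 means "most similar" as in the documentation.
function filterByMinSimilarity(results: Scored[], minSimilarityScore: number): Scored[] {
  return toSimilarity(results).filter(([, score]) => score >= minSimilarityScore);
}

// Scores taken from the output in the issue above.
const raw: Scored[] = [
  [{ pageContent: 'Hello,', metadata: {} }, 0.000004837587312067626],
  [{ pageContent: 'Hi,', metadata: {} }, 0.03748885169625282],
  [{ pageContent: 'Hi.', metadata: {} }, 0.11468228697776794],
];

const kept = filterByMinSimilarity(raw, 0.9);
console.log(kept.map(([doc]) => doc.pageContent)); // [ 'Hello,', 'Hi,' ]
```

With a threshold of 0.9, the 'Hi.' result (inverted score of about 0.885) is dropped while the two closer matches are kept.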

Alternatively, you could extend or modify the ScoreThresholdRetriever to interpret scores from Milvus correctly, by prioritizing lower scores as indicating higher similarity. This might involve changing the filtering logic within the ScoreThresholdRetriever.
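As a rough illustration of that filtering change, the sketch below makes the comparison direction configurable. It does not reproduce the actual ScoreThresholdRetriever code; `ScoreOrder` and `filterByThreshold` are hypothetical names, and the sample scores are rounded from the issue output.

```typescript
type ScoredItem<T> = [T, number];

// Hypothetical option: which direction of score means "more similar".
type ScoreOrder = 'higherIsBetter' | 'lowerIsBetter';

// Keeps results that pass the threshold in the given direction and
// returns them ordered from most to least similar.
function filterByThreshold<T>(
  results: ScoredItem<T>[],
  threshold: number,
  order: ScoreOrder,
): ScoredItem<T>[] {
  const passes =
    order === 'lowerIsBetter'
      ? (score: number) => score <= threshold
      : (score: number) => score >= threshold;
  // Ascending sort for distances, descending sort for similarities.
  const direction = order === 'lowerIsBetter' ? 1 : -1;
  return results
    .filter(([, score]) => passes(score))
    .sort(([, a], [, b]) => direction * (a - b));
}

// Milvus-style scores: lower means closer.
const milvusStyle: ScoredItem<string>[] = [
  ['Hi.', 0.1147],
  ['Hello,', 0.000005],
  ['Hi,', 0.0375],
];

const closest = filterByThreshold(milvusStyle, 0.05, 'lowerIsBetter');
console.log(closest.map(([text]) => text)); // [ 'Hello,', 'Hi,' ]
```

Keeping the direction as an explicit parameter avoids assuming that scores fall between 0 and 1, which the inversion approach relies on.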

For implementation details, refer to the LangChain.js repository for the ScoreThresholdRetriever and VectorStore classes.

#### Details

**Sources**
- [langchain/src/retrievers/score_threshold.ts](https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/retrievers/score_threshold.ts)
- [libs/langchain-community/src/vectorstores/milvus.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/milvus.ts)
- [langchain-core/src/vectorstores.ts](https://github.com/langchain-ai/langchainjs/blob/main/langchain-core/src/vectorstores.ts)
- [examples/src/retrievers/similarity_score_threshold.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/retrievers/similarity_score_threshold.ts)
- [examples/src/indexes/vector_stores/singlestore.ts](https://github.com/langchain-ai/langchainjs/blob/main/examples/src/indexes/vector_stores/singlestore.ts)
- [libs/create-langchain-integration/template/src/vectorstores.ts](https://github.com/langchain-ai/langchainjs/blob/main/libs/create-langchain-integration/template/src/vectorstores.ts)