langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

OpenSearchVectorStore similaritySearch Metadata filtering not working for exact match #6867

Open stevedaskam opened 1 month ago

stevedaskam commented 1 month ago

### Example Code


const docs = [
  new Document({
    metadata: { user: "U12345-1234" },
    pageContent: "Amazon Simple Storage Service (Amazon S3) is a cloud storage service that allows users to store and retrieve data of any amount from anywhere.",
  }),
  new Document({
    metadata: { user: "U12345-5678" },
    pageContent: "Amazon Elastic Compute Cloud (Amazon EC2) provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud.",
  }),
];

const vectorStore = await OpenSearchVectorStore.fromDocuments(docs, embeddings, {
  client: client,
  indexName: 'my-index',
  service: 'aoss'
});

const query = "What is S3?";
const k = 5; // Number of results to retrieve
const filter = {
  user: "U12345-1234"
};

// Returns no documents
const results = await vectorStore.similaritySearch(query, k, filter);

### Error Message and Stack Trace (if applicable)

_No response_

### Description

`OpenSearchVectorStore.similaritySearch` with a metadata filter returns no documents, even when the filter value is an exact match for the `user` field in a document's metadata.

### System Info

AWS Lambda running Node.js 20.x

dosubot[bot] commented 1 month ago

I found a similar unsolved discussion and a related closed issue that might help you:

  1. How to filter metadata with OpenSearch similaritySearch [1].
  2. ElasticVectorSearch: exclude metadata filters not working due to syntax error in filter creation, which was closed by a pull request: community[patch]: fix: ElasticVectorSearch: exclude metadata filters … [2].

The issue might be due to the way the metadata filtering is being applied in the similarity search. Ensure that you are using the correct field name and match criteria in your searchOptions. Here is an example:

const exactValueResult = await store.similaritySearch(query, 4, {
  fields: ["metadata.user"],
  searchOptions: {
    query: { field: "metadata.user", match: "U12345-1234" },
  },
});

Additionally, verify that the metadata is correctly indexed and stored in the vector store. If the metadata is not properly indexed, the search might not be able to filter based on the specified criteria.

Make sure your OpenSearch instance is correctly set up and running. You can use the official Docker image to get started with OpenSearch. Here is an example of how to set up and query documents in OpenSearch:

import { Client } from "@opensearch-project/opensearch";
import { Document } from "@langchain/core/documents";
import { OpenAIEmbeddings } from "@langchain/openai";
import { OpenSearchVectorStore } from "@langchain/community/vectorstores/opensearch";

const client = new Client({
  nodes: [process.env.OPENSEARCH_URL ?? "http://127.0.0.1:9200"],
});
const docs = [
  new Document({
    metadata: { user: "U12345-1234" },
    pageContent: "Amazon Simple Storage Service (Amazon S3) is a cloud storage service that allows users to store and retrieve data of any amount from anywhere.",
  }),
  new Document({
    metadata: { user: "U12345-5678" },
    pageContent: "Amazon Elastic Compute Cloud (Amazon EC2) provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud.",
  }),
];

// Keep a reference to the store so it can be queried below
const vectorStore = await OpenSearchVectorStore.fromDocuments(docs, new OpenAIEmbeddings(), {
  client,
  indexName: 'my-index',
});

To query the documents:

const results = await vectorStore.similaritySearch(query, k, {
  fields: ["metadata.user"],
  searchOptions: {
    query: { field: "metadata.user", match: "U12345-1234" },
  },
});
console.log(JSON.stringify(results, null, 2));

Make sure your OpenSearch instance is running and properly configured to handle the metadata filtering [3][4][5][6].

To continue talking to Dosu, mention @dosu.

stevedaskam commented 1 month ago

@dosu When I run this, I still don't get any documents back.

dosubot[bot] commented 1 month ago

The issue might be due to the structure or content of the OpenSearch index. The similaritySearch method with metadata filtering requires a well-structured OpenSearch index where documents are enriched with both text and metadata properties. If the index is not properly structured or the documents do not contain the expected metadata properties, the search may not return any documents.

Here is a sample code snippet that demonstrates how to use the similaritySearch method with metadata filtering:

import { Client } from "@opensearch-project/opensearch";
import { OpenAIEmbeddings } from "@langchain/openai";
import { OpenSearchVectorStore } from "@langchain/community/vectorstores/opensearch";

const client = new Client({
  nodes: ["http://localhost:9200"], // URL for the OpenSearch instance
});

const indexName = "test_index";

const embeddings = new OpenAIEmbeddings();
const store = new OpenSearchVectorStore(embeddings, { client, indexName });
await store.deleteIfExists();

await store.addDocuments([
  { pageContent: "hello", metadata: { a: 2 } },
  { pageContent: "car", metadata: { a: 1 } },
  { pageContent: "adjective", metadata: { a: 1 } },
  { pageContent: "hi", metadata: { a: 1 } },
]);

const filter = { a: 1 };
// The filter object is passed directly as the third argument
const results = await store.similaritySearch("hello!", 1, filter);

console.log(results);

/*
  [ Document { pageContent: 'car', metadata: { a: 1 } } ]
*/

await store.close();

Ensure that:

  1. The OpenSearch index is properly structured.
  2. Documents contain the expected metadata properties.
  3. The filter criteria match the metadata properties in the index.

If these conditions are met and the issue persists, further investigation into the index structure and content may be necessary [1][2].
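One way to investigate the index structure is to look at how the metadata field is mapped. The sketch below is only an illustration and rests on an assumption this thread does not confirm: that `metadata.user` may be mapped as analyzed `text`, in which case a `term` filter on a value such as `U12345-1234` would not match the stored tokens. The helper names and the sample response are hypothetical; the sample is shaped like the body returned by `client.indices.getMapping({ index: "my-index" })`.

```typescript
// Minimal types for the part of a get-mapping response used here.
interface FieldMapping {
  type?: string;
  fields?: Record<string, FieldMapping>;
  properties?: Record<string, FieldMapping>;
}

type MappingResponse = Record<string, { mappings: FieldMapping }>;

// Walk a dotted path such as "metadata.user" through nested `properties`.
function findFieldMapping(
  response: MappingResponse,
  indexName: string,
  path: string
): FieldMapping | undefined {
  let current: FieldMapping | undefined = response[indexName]?.mappings;
  for (const segment of path.split(".")) {
    current = current?.properties?.[segment];
    if (current === undefined) return undefined;
  }
  return current;
}

// Suggest which field name an exact-match `term` filter could target.
function exactMatchField(
  response: MappingResponse,
  indexName: string,
  path: string
): string | undefined {
  const mapping = findFieldMapping(response, indexName, path);
  if (mapping === undefined) return undefined; // field is not mapped at all
  if (mapping.type === "keyword") return path; // value is stored verbatim
  if (mapping.fields?.["keyword"]?.type === "keyword") return `${path}.keyword`;
  return undefined; // analyzed text with no keyword sub-field
}

// Hypothetical mapping, for illustration only.
const sampleMapping: MappingResponse = {
  "my-index": {
    mappings: {
      properties: {
        metadata: {
          properties: {
            user: {
              type: "text",
              fields: { keyword: { type: "keyword" } },
            },
          },
        },
      },
    },
  },
};

console.log(exactMatchField(sampleMapping, "my-index", "metadata.user"));
// prints "metadata.user.keyword"
```

If the helper returns the plain path, the field is already a `keyword` and the mapping is probably not the cause; if it returns `undefined`, the field is either missing from the index or has no exact-match variant.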


stevedaskam commented 1 month ago

@dosu The above code does not work, because "term" queries no longer seem to return results from OpenSearch for metadata fields. Is there any way to bypass this and pass in my own filter query?
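As a possible workaround for the question above, the filter can be written by hand and sent through the OpenSearch client's `search` method instead of `similaritySearch`. This is a minimal sketch, not a confirmed fix for this issue. The field names `embedding` and `metadata.user.keyword` are assumptions about the index, `buildFilteredKnnBody` and `searchWithCustomFilter` are hypothetical helpers, and the `SearchClient` interface is a local stand-in for the real client. In practice the query vector would come from `embeddings.embedQuery(query)`.

```typescript
interface KnnFilterOptions {
  vectorField: string; // assumed name of the k-NN vector field, e.g. "embedding"
  vector: number[]; // query embedding
  k: number; // number of neighbours to retrieve
  filterField: string; // exact-match field, e.g. "metadata.user.keyword"
  filterValue: string | number | boolean;
}

// Build a request body that combines a k-NN clause with a hand-written term filter.
function buildFilteredKnnBody(opts: KnnFilterOptions) {
  return {
    size: opts.k,
    query: {
      bool: {
        filter: [{ term: { [opts.filterField]: opts.filterValue } }],
        must: [
          { knn: { [opts.vectorField]: { vector: opts.vector, k: opts.k } } },
        ],
      },
    },
  };
}

// Structural stand-in for the part of the OpenSearch client used here.
interface SearchClient {
  search(params: { index: string; body: object }): Promise<{ body: unknown }>;
}

// Send the hand-built query and return the raw response body.
async function searchWithCustomFilter(
  client: SearchClient,
  index: string,
  opts: KnnFilterOptions
): Promise<unknown> {
  const response = await client.search({
    index,
    body: buildFilteredKnnBody(opts),
  });
  return response.body;
}

// Example body with a toy two-dimensional vector.
const exampleBody = buildFilteredKnnBody({
  vectorField: "embedding",
  vector: [0.1, 0.2],
  k: 5,
  filterField: "metadata.user.keyword",
  filterValue: "U12345-1234",
});
console.log(JSON.stringify(exampleBody, null, 2));
```

Because the body is built separately from the call, it can be logged and tried directly against the index before being wired into application code. The raw hits would still need to be mapped back to `Document` objects by the caller.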