langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

OpenSearchVectorStore Not Adding Documents to OpenSearch Serverless Vector Database With AzureOpenAIEmbeddings #6855

Open stevedaskam opened 3 hours ago

stevedaskam commented 3 hours ago

Example Code

import { AzureOpenAIEmbeddings } from "@langchain/openai";
import { OpenSearchVectorStore } from "@langchain/community/vectorstores/opensearch";
import { Document } from "langchain/document";

const embeddings = new AzureOpenAIEmbeddings({
  azureOpenAIApiKey: "aaa", // In Node.js defaults to process.env.AZURE_OPENAI_API_KEY
  azureOpenAIApiInstanceName: "bbb", // In Node.js defaults to process.env.AZURE_OPENAI_API_INSTANCE_NAME
  azureOpenAIApiEmbeddingsDeploymentName: "text-embedding-ada-002", // In Node.js defaults to process.env.AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME
  azureOpenAIApiVersion: "2024-06-01", // In Node.js defaults to process.env.AZURE_OPENAI_API_VERSION
  maxRetries: 10,
});

const docText = "Some long text string...";
const docs = [
  new Document({ pageContent: docText }),
];

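// `client` is an OpenSearch client instance created elsewhere; its setup was not included in the report.
await OpenSearchVectorStore.fromDocuments(docs, embeddings, {
  client,
  indexName: "my-index", // Will default to `documents`
  service: "aoss",
});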

Error Message and Stack Trace (if applicable)

I am not seeing any error message.

Description

When running the above code, documents are not saved to the Amazon OpenSearch Serverless Vector database when using AzureOpenAIEmbeddings. If I use BedrockEmbeddings, everything saves fine to the database.

System Info

Node.js 20.x running on AWS Lambda
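
Since the write fails without any visible error, one thing that may be worth checking is the response of the bulk request itself. The OpenSearch bulk API reports per-item failures inside the response body (an `errors` flag plus an `items` array), so a rejected document can go unnoticed if that body is never inspected. The sketch below is only an illustration of reading such a body: the helper `collectBulkFailures`, the hand-written types, and the sample response are all hypothetical, and whether this applies to the case above is an assumption.

```typescript
// Hand-written types covering only the parts of a bulk response used here.
interface BulkItemResult {
  status: number;
  error?: { type: string; reason: string };
}
interface BulkResponseBody {
  errors: boolean;
  items: Array<Record<string, BulkItemResult>>;
}

// Hypothetical helper: returns one message per failed item.
// An empty array means no item reported an error.
function collectBulkFailures(body: BulkResponseBody): string[] {
  const messages: string[] = [];
  if (!body.errors) {
    return messages;
  }
  body.items.forEach((item, position) => {
    // Each item is keyed by its action name, for example "index".
    for (const action of Object.keys(item)) {
      const result = item[action];
      if (result.error) {
        messages.push(
          `item ${position} (${action}): ${result.status} ${result.error.type}: ${result.error.reason}`
        );
      }
    }
  });
  return messages;
}

// Made-up sample body: the first item succeeds, the second is rejected.
const sampleBody: BulkResponseBody = {
  errors: true,
  items: [
    { index: { status: 201 } },
    {
      index: {
        status: 400,
        error: { type: "mapper_parsing_exception", reason: "example reason" },
      },
    },
  ],
};

const sampleFailures = collectBulkFailures(sampleBody);
console.log(sampleFailures);
```

With direct access to the OpenSearch client, the value passed to such a helper would typically be the `body` of the response returned by `client.bulk(...)`.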

dosubot[bot] commented 3 hours ago

The issue may arise because document IDs are deleted when the isAoss flag is set to true in the addVectors method of the OpenSearchVectorStore class. This deletion may prevent the documents from being saved correctly to the Amazon OpenSearch Serverless vector database.

Here's the relevant part of the code:

if (this.isAoss) {
  delete document[0].index?._id;
}

When isAoss is true, the _id field is removed from each bulk index operation, which is likely why the documents are not being saved correctly [1].
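
To make the effect of that branch concrete, here is a minimal sketch of how a bulk request body might be assembled with and without the `_id` field. The function `buildBulkOperations`, its parameters, and the field names `embedding` and `text` are hypothetical; this is a simplified illustration modelled on the snippet above, not the library's actual implementation.

```typescript
// Hypothetical types for illustration; not imported from any library.
type BulkAction = { index: { _index: string; _id?: string } };
type BulkSource = { embedding: number[]; text: string };

// Builds the alternating action/source entries of a bulk request body.
function buildBulkOperations(
  vectors: number[][],
  texts: string[],
  ids: string[],
  indexName: string,
  isAoss: boolean
): Array<BulkAction | BulkSource> {
  const operations: Array<BulkAction | BulkSource> = [];
  vectors.forEach((embedding, i) => {
    const action: BulkAction = { index: { _index: indexName, _id: ids[i] } };
    if (isAoss) {
      // Mirrors the snippet above: drop the caller-supplied ID.
      delete action.index._id;
    }
    operations.push(action, { embedding, text: texts[i] });
  });
  return operations;
}

const withIds = buildBulkOperations([[0.1, 0.2]], ["hello"], ["doc-1"], "my-index", false);
const withoutIds = buildBulkOperations([[0.1, 0.2]], ["hello"], ["doc-1"], "my-index", true);

console.log(JSON.stringify(withIds[0]));    // {"index":{"_index":"my-index","_id":"doc-1"}}
console.log(JSON.stringify(withoutIds[0])); // {"index":{"_index":"my-index"}}
```

In this sketch the document itself is still part of the request in both cases; only the caller-supplied ID is omitted from the action entry.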

To continue talking to Dosu, mention @dosu.

stevedaskam commented 3 hours ago

Is there an alternative method to save the documents to the database using OpenSearchVectorStore that works?

dosubot[bot] commented 3 hours ago

Yes. You can call the addVectors method directly to add documents and their precomputed vectors to the store. Here is an example:

import { Client } from "@opensearch-project/opensearch";
import { OpenAIEmbeddings } from "@langchain/openai";
import { OpenSearchVectorStore } from "@langchain/community/vectorstores/opensearch";
import * as uuid from "uuid";
import { Document } from "@langchain/core/documents";

export async function run() {
  const client = new Client({
    nodes: [process.env.OPENSEARCH_URL ?? "http://127.0.0.1:9200"],
  });

  const embeddings = new OpenAIEmbeddings();

  const vectorStore2 = new OpenSearchVectorStore(embeddings, {
    client,
    indexName: "custom",
  });

  const documents = [
    new Document({
      pageContent: "Do I dare to eat an apple?",
      metadata: {
        foo: "baz",
      },
    }),
    new Document({
      pageContent: "There is no better place than the hotel lobby",
      metadata: {
        foo: "bar",
      },
    }),
    new Document({
      pageContent: "OpenSearch is a powerful vector db",
      metadata: {
        foo: "bat",
      },
    }),
  ];
  const vectors = Array.from({ length: documents.length }, (_, i) => [
    i,
    i + 1,
    i + 2,
  ]);
  const ids = Array.from({ length: documents.length }, () => uuid.v4());
  await vectorStore2.addVectors(vectors, documents, { ids });

  const resultTwo = await vectorStore2.similaritySearchVectorWithScore(
    vectors[0],
    3
  );
  console.log(resultTwo);
}

This approach creates a new OpenSearchVectorStore instance and calls addVectors to add the documents and their corresponding vectors to the store. The OpenSearchVectorStore class also provides addDocuments and fromDocuments for adding documents to the OpenSearch index; these ensure that the index exists before writing. Similarity searches can then be run against the index [1][2].
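
For orientation, the sketch below shows how such convenience methods are commonly layered on top of a low-level vector write. It assumes that fromDocuments delegates to addDocuments, which embeds the texts and then calls addVectors; that layering is an assumption about the design rather than a quote of the library's code. `SketchVectorStore`, `LengthEmbedder`, `Doc`, and `Embedder` are hypothetical stand-ins that store everything in memory.

```typescript
// Hypothetical stand-ins; none of these types come from LangChain.
interface Doc {
  pageContent: string;
}
interface Embedder {
  embedDocuments(texts: string[]): Promise<number[][]>;
}

// Toy embedder: each text becomes a one-dimensional vector holding its length.
class LengthEmbedder implements Embedder {
  async embedDocuments(texts: string[]): Promise<number[][]> {
    return texts.map((text) => [text.length]);
  }
}

class SketchVectorStore {
  readonly rows: Array<{ vector: number[]; doc: Doc }> = [];

  constructor(private readonly embedder: Embedder) {}

  // Lowest-level write: the caller supplies vectors that are already computed.
  async addVectors(vectors: number[][], docs: Doc[]): Promise<void> {
    if (vectors.length !== docs.length) {
      throw new Error("vectors and docs must have the same length");
    }
    vectors.forEach((vector, i) => this.rows.push({ vector, doc: docs[i] }));
  }

  // Embeds the texts, then reuses addVectors for the actual write.
  async addDocuments(docs: Doc[]): Promise<void> {
    const texts = docs.map((doc) => doc.pageContent);
    const vectors = await this.embedder.embedDocuments(texts);
    await this.addVectors(vectors, docs);
  }

  // Creates a store and reuses addDocuments to populate it.
  static async fromDocuments(docs: Doc[], embedder: Embedder): Promise<SketchVectorStore> {
    const store = new SketchVectorStore(embedder);
    await store.addDocuments(docs);
    return store;
  }
}

// Populates a store through the highest-level entry point and reports its size.
async function demo(): Promise<number> {
  const docs: Doc[] = [{ pageContent: "hello" }, { pageContent: "hi" }];
  const store = await SketchVectorStore.fromDocuments(docs, new LengthEmbedder());
  return store.rows.length;
}

demo().then((count) => console.log(`stored ${count} documents`));
```

If the real class is layered the same way, all three entry points end in the same write step, so switching between them would mainly change where the embeddings are computed.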