langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

OpenSearchVectorStore can't insert data to knn_vector #6132

Open duduxiao opened 1 month ago

duduxiao commented 1 month ago

Example Code

I am using OpenSearch as a vector store. This is my script:

import { Client } from "@opensearch-project/opensearch";
import { Document } from "langchain/document";
import { OpenAIEmbeddings } from "@langchain/openai";
import { OpenSearchVectorStore } from "@langchain/community/vectorstores/opensearch";

const client = new Client({
    nodes: [process.env.OPENSEARCH_URL ?? "http://10.100.10.30:9200"],
});

const obj = {
    azureOpenAIApiKey: 'sk-xxxx',
    azureOpenAIBasePath: 'http://xxxx',
    azureOpenAIApiVersion: "1",
    azureOpenAIApiCompletionsDeploymentName: "1",
    azureOpenAIApiEmbeddingsDeploymentName: "1",
    azureOpenAIApiInstanceName: "1",
}
// const vectorFieldName = 'vector_field'
const vectorFieldName = 'vector_field1'
const textFieldName = 'text'

const docs = [
    new Document({
        metadata: { foo: "this is text text" },
        pageContent: "NO BUG",
    })
];

try {
    const a = await OpenSearchVectorStore.fromDocuments(docs, 
        new OpenAIEmbeddings(obj), {
        client,
        vectorFieldName,
        textFieldName,
        indexName: 'ratu_test_bce_vectors_123456', // Will default to `documents`
    });

    if (a) {
        console.log('Vector store created successfully.');
    } else {
        console.log('Failed to create vector store.');
    }

} catch (error) {
    console.error('Error creating vector store:', error);
}

Error Message and Stack Trace (if applicable)

The script above does not throw any error, so I debugged opensearch.js and found that the bulk call returns this error:

caused_by: {
    type: 'json_parse_exception',
    reason: 'Current token (START_ARRAY) not numeric, can not use numeric value accessors\n' +
      ' at [Source: (byte[])"{"vector_field1":[[0.02122347615659237,0.030510900542140007,0.0017713435227051377,-0.0021322225220501423,-0.02155894786119461,0.039561688899993896,-0.000029259719667606987,0.015354755334556103,0.021116631105542183,0.00888604111969471,-0.09831065684556961,0.024924634024500847,0.13480432331562042,0.0015266432892531157,-0.024981919676065445,-0.014099709689617157,0.01099364086985588,0.005934333428740501,-0.0005805338150821626,0.05886352062225342,-0.008693357929587364,0.019988033920526505,0.035462103"[truncated 15864 bytes]; line: 1, column: 20]'
  }
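
The bulk API reports per-document failures inside the response body instead of throwing, which is why the script above finishes without an error. A minimal sketch of surfacing those failures follows. `collectBulkFailures` and the two interfaces are hypothetical names introduced here, and the shape assumes the usual `errors` flag and `items` array of a bulk response body.

```typescript
// Only the parts of a bulk response body that this sketch reads.
interface BulkItemResult {
  _id?: string;
  status: number;
  error?: {
    type: string;
    reason: string;
    caused_by?: { type: string; reason: string };
  };
}

interface BulkResponseBody {
  errors: boolean;
  items: Array<Record<string, BulkItemResult>>;
}

// Hypothetical helper: build one readable message per failed bulk item.
function collectBulkFailures(body: BulkResponseBody): string[] {
  if (!body.errors) return [];
  const failures: string[] = [];
  for (const item of body.items) {
    // Each item has a single key naming the action (index, create, update, delete).
    for (const action of Object.keys(item)) {
      const result = item[action];
      if (result && result.error) {
        // Prefer the nested cause when present; it usually names the real problem.
        const cause = result.error.caused_by
          ? result.error.caused_by.reason
          : result.error.reason;
        const id = result._id !== undefined ? result._id : "(no id)";
        failures.push(`${action} ${id}: ${result.error.type}: ${cause}`);
      }
    }
  }
  return failures;
}
```

With the OpenSearch client this would be called as `collectBulkFailures(response.body)` on the value returned by `client.bulk(...)`, throwing or logging when the returned list is not empty.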

Description

I found a bug in OpenSearchVectorStore.addVectors:

File: opensearch.js


async addVectors(vectors, documents, options) {
        await this.ensureIndexExists(vectors[0].length, this.engine, this.spaceType, this.efSearch, this.efConstruction, this.numberOfShards, this.numberOfReplicas, this.m);
        const documentIds = options?.ids ?? Array.from({ length: vectors.length }, () => uuid.v4());
        const operations = vectors.flatMap((embedding, idx) => {
            // eslint-disable-next-line @typescript-eslint/no-explicit-any
            const document = [
                {
                    index: {
                        _index: this.indexName,
                        _id: documentIds[idx],
                    },
                },
                {
                    [this.vectorFieldName]: embedding, // should be a number[] (list[float]), but a number[][] (list[list[float]]) is sent
                    [this.textFieldName]: documents[idx].pageContent,
                    [this.metadataFieldName]: documents[idx].metadata,
                },
            ];
            // aoss does not support document id
            if (this.isAoss) {
                delete document[0].index?._id;
            }
            return document;
        });
        const m = await this.client.bulk({ body: operations });
        // aoss does not support refresh
        if (!this.isAoss) {
            await this.client.indices.refresh({ index: this.indexName });
        }
    }

If I change embedding to embedding[0] on that line, it works.
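
Indexing `embedding[0]` unconditionally would break callers whose embeddings are already flat. A hedged alternative is to normalise the shape before the bulk request is built. `toFlatVector` is a hypothetical helper, not part of the library, and this thread does not establish why the extra array level appears in the first place.

```typescript
// Hypothetical helper: return a flat number[] whether the embedding arrives
// flat or wrapped in exactly one extra array level.
function toFlatVector(embedding: number[] | number[][]): number[] {
  const first = embedding[0];
  if (Array.isArray(first)) {
    // A nested value is only unambiguous when it wraps a single vector.
    if (embedding.length !== 1) {
      throw new Error(`Expected one nested vector, got ${embedding.length}`);
    }
    return first;
  }
  // Already flat (or empty): pass it through unchanged.
  return embedding as number[];
}
```

One way to use it is `vectors.map(toFlatVector)` before the vectors reach `addVectors`, which leaves the library code untouched.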

System Info

package.json

 "dependencies": {
    "@langchain/community": "^0.0.57",
    "@opensearch-project/opensearch": "^1.2.0",
    "langchain": "^0.1.37"
  }

NodeJS version:

v18.15.0

dosubot[bot] commented 1 month ago

Hey @duduxiao! I'm here to help you out with any bugs, questions, or contributions you might have. Let's work together to solve this issue.

To resolve the json_parse_exception error when inserting data into the knn_vector in the OpenSearchVectorStore class, ensure that the data being inserted is correctly formatted as JSON. Specifically, check the structure of the operations array in the addVectors method. Here is the relevant code snippet:

const operations = vectors.flatMap((embedding, idx) => {
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  const document: Record<string, any> = [
    {
      index: {
        _index: this.indexName,
        _id: documentIds[idx],
      },
    },
    {
      [this.vectorFieldName]: embedding,
      [this.textFieldName]: documents[idx].pageContent,
      [this.metadataFieldName]: documents[idx].metadata,
    },
  ];

  // aoss does not support document id
  if (this.isAoss) {
    delete document[0].index?._id;
  }
  return document;
});
await this.client.bulk({ body: operations });

Ensure that the embedding, pageContent, and metadata fields are correctly formatted and do not contain any invalid JSON structures. Additionally, verify that the knn_vector field is correctly defined in the index mappings.

If the issue persists, you may need to add more detailed logging to identify the exact part of the data causing the json_parse_exception [1].
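
The mapping check suggested above can be sketched as a small pure function over the body returned by a get-mapping request. `describeVectorFieldProblem` and the interfaces are hypothetical names, and the sketch assumes the field mapping exposes `type` and `dimension` as a `knn_vector` field normally does.

```typescript
// Only the parts of a get-mapping response body that this sketch reads.
interface FieldMapping {
  type?: string;
  dimension?: number;
}

interface IndexMappingBody {
  [indexName: string]: {
    mappings: { properties?: Record<string, FieldMapping> };
  };
}

// Hypothetical helper: return null when the field can hold a vector of the
// given length, otherwise a short description of the mismatch.
function describeVectorFieldProblem(
  body: IndexMappingBody,
  indexName: string,
  fieldName: string,
  vectorLength: number
): string | null {
  const index = body[indexName];
  if (!index) return `index ${indexName} not found in mapping response`;
  const properties = index.mappings.properties;
  const field = properties ? properties[fieldName] : undefined;
  if (!field) return `field ${fieldName} is not mapped`;
  if (field.type !== "knn_vector") {
    return `field ${fieldName} has type ${field.type}, expected knn_vector`;
  }
  if (field.dimension !== vectorLength) {
    return `field ${fieldName} has dimension ${field.dimension}, vector has ${vectorLength}`;
  }
  return null;
}
```

It would be fed with the `body` of `client.indices.getMapping({ index: indexName })` and the length of one embedding. Note that a nested embedding such as the one in the error above has length 1 at the top level, so a dimension mismatch reported here can also point at the nesting problem.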
