langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License
11.89k stars 1.99k forks

AzureAISearch hardcoded content, metadata & embedding key #4986

Open sunnyunde-xooa opened 3 months ago

sunnyunde-xooa commented 3 months ago

Example Code

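import { AzureAISearchVectorStore } from "@langchain/community/vectorstores/azure_aisearch";

// Create the Azure AI Search vector store instance
const vectorStore = new AzureAISearchVectorStore(this.embedding, config);

// Documents created with a document loader
const docs = [];

const ids = await vectorStore.addDocuments(docs);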

Error Message and Stack Trace (if applicable)

Error: Azure AI Search uploadDocuments batch failed: RestError: The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.
    at EventEmitter.<anonymous> (/home/xooa/Xooa/thaleslabs-ai/packages/backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:225:19)
    at EventEmitter.emit (node:events:513:28)
    at EventEmitter.emit (node:domain:489:12)
    at SearchIndexingBufferedSender.submitDocuments (/home/xooa/Xooa/thaleslabs-ai/node_modules/@azure/search-documents/dist/index.js:4371:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async SearchIndexingBufferedSender.internalFlush (/home/xooa/Xooa/thaleslabs-ai/node_modules/@azure/search-documents/dist/index.js:4318:17)
    at async AzureAISearchVectorStore.addVectors (/home/xooa/Xooa/thaleslabs-ai/packages/backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:228:9)
    at async AzureAISearchVectorStore.addDocuments (/home/xooa/Xooa/thaleslabs-ai/packages/backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:201:25)

Description

const entities: AzureAISearchDocument[] = documents.map((doc, idx) => ({
  id: ids[idx],
  content: doc.pageContent,
  content_vector: vectors[idx],
  metadata: {
    source: doc.metadata?.source,
    attributes: doc.metadata?.attributes ?? [],
  },
}));

This is the library code that builds each document before uploading it to Azure AI Search.
Because the content, metadata and content_vector keys are hardcoded, an index that was created earlier with different field names cannot be used with this code.

System Info

yarn info v1.22.19

dosubot[bot] commented 3 months ago

Hey @sunnyunde-xooa! Great to see you back here. Hope you've been doing well since our last adventure in code! 😄👋

Based on the error message you're receiving, it seems that the schema of the documents you're trying to upload doesn't match the schema expected by your Azure AI Search index. The error message indicates that the Azure AI Search service is expecting a property named content, but it was not found in the documents you attempted to upload.

Given the context of your code, it appears that you are using pageContent as the key for the main content of your documents. However, the Azure AI Search index is expecting a property named content instead of pageContent.

To resolve this issue, you have a couple of options:

  1. Modify the Document Schema in Your Code: Update your document schema to match the expected schema of the Azure AI Search index. This means changing pageContent to content in the documents you are uploading. For example:
await vectorStore.addDocuments(
  [
    {
      content: "This book is about politics", // Changed from pageContent to content
      metadata: {
        source: "doc1",
        attributes: [{ key: "a", value: "1" }],
      },
    },
    // Additional documents...
  ],
  { ids: DOCUMENT_IDS }
);
  2. Adjust the Azure AI Search Index Schema: If you have control over the Azure AI Search index schema and prefer to use pageContent as the key, you can adjust the index schema to expect pageContent instead of content. This approach requires modifying the index schema in the Azure portal or through the Azure Search SDK to align with your document structure.

Choose the approach that best fits your requirements and existing setup. If you opt to modify the document schema in your code, ensure that all references to pageContent are updated to content to maintain consistency.

I hope this helps! If you have any further questions, feel free to ask.

sunnyunde-xooa commented 3 months ago

The issue is that the content, metadata and embedding keys are hardcoded. They should be configurable, because their names depend on how the index creator set up the index. The keys inside metadata are hardcoded as well; they should be open-ended, since I might need to pass extra keys or none of them.

petergoldstein commented 3 months ago

@sunnyunde-xooa I'm also running into this issue. I'm not sure why these key values are being treated as constants.

Maintainers, I'm happy to submit a patch that allows the creator to set these field names. One easy, backwards-compatible approach would be to extend the AzureAISearchConfig type with additional optional fields for these names. The class could then replace its use of the constants (DEFAULT_...) with instance variables, each set to the value passed in the config or, if none is given, to the existing constant.

Would the above proposal be acceptable as an approach? If not, do you have other suggestions?
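
A rough sketch of the proposal above, for illustration only. The option names (`contentKey`, `contentVectorKey`, `metadataKey`), the constant names and the helpers `resolveFieldNames` and `toIndexDocument` are hypothetical and are not taken from the library; the default values are the field names visible in the snippet from the issue description.

```typescript
// Hypothetical names throughout: none of these identifiers are taken from
// @langchain/community. The default values mirror the hardcoded keys quoted
// in the issue description.
const DEFAULT_FIELD_CONTENT = "content";
const DEFAULT_FIELD_CONTENT_VECTOR = "content_vector";
const DEFAULT_FIELD_METADATA = "metadata";

// Optional overrides that a config type could carry.
interface FieldNameConfig {
  contentKey?: string;
  contentVectorKey?: string;
  metadataKey?: string;
}

interface ResolvedFieldNames {
  contentKey: string;
  contentVectorKey: string;
  metadataKey: string;
}

// Fall back to the existing defaults when an option is omitted, so callers
// that pass no overrides keep the current behaviour.
function resolveFieldNames(config: FieldNameConfig = {}): ResolvedFieldNames {
  return {
    contentKey: config.contentKey ?? DEFAULT_FIELD_CONTENT,
    contentVectorKey: config.contentVectorKey ?? DEFAULT_FIELD_CONTENT_VECTOR,
    metadataKey: config.metadataKey ?? DEFAULT_FIELD_METADATA,
  };
}

// Build the upload payload with the resolved names instead of literals.
function toIndexDocument(
  id: string,
  pageContent: string,
  vector: number[],
  metadata: Record<string, unknown>,
  fields: ResolvedFieldNames
): Record<string, unknown> {
  return {
    id,
    [fields.contentKey]: pageContent,
    [fields.contentVectorKey]: vector,
    [fields.metadataKey]: metadata,
  };
}

// Usage: override only the content field name and keep the other defaults.
const customFields = resolveFieldNames({ contentKey: "text" });
const indexDoc = toIndexDocument("1", "hello", [0.1, 0.2], { source: "doc1" }, customFields);
```

Because every override is optional, a config without them would resolve to the same three field names as today.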

petergoldstein commented 3 months ago

Unfortunately this initial approach doesn't work, at least as I originally envisioned it, because of TypeScript issues. The underlying Azure SearchClient expects a parameterized type, and then uses that type's keys to determine the set of allowed fields in the underlying schema. This makes the whole thing very inflexible: the fields need to be specified at build time, not run time. That means, for example, that passing in a configuration that specifies the fields in an environment variable isn't possible.

This is, to put it mildly, a little inconvenient. The idea of passing in a configuration for a search store seems pretty basic. I'm going to spend some time poking around and looking at alternative approaches.
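
To make the build-time constraint described above concrete, here is a small self-contained sketch. `TypedIndexClient` is a hypothetical stand-in written for this example and is not the Azure SDK's SearchClient; it only mimics the idea of a client parameterized by the document model. The loosely typed variant at the end is a contrast case, and whether the real client accepts such a model type is not confirmed here.

```typescript
// Hypothetical stand-in for a client parameterized by the document model.
// This is NOT the Azure SDK's SearchClient.
class TypedIndexClient<TModel extends object> {
  private readonly uploaded: TModel[] = [];

  // Accepts only documents that match TModel; returns the running total.
  upload(documents: TModel[]): number {
    this.uploaded.push(...documents);
    return this.uploaded.length;
  }
}

// With a fixed model, the field names are part of the type.
interface FixedModel {
  id: string;
  content: string;
  content_vector: number[];
}

const fixedClient = new TypedIndexClient<FixedModel>();
const fixedCount = fixedClient.upload([
  { id: "1", content: "hello", content_vector: [0.1, 0.2] },
]);

// A field name known only at run time (for example, one read from an
// environment variable) does not type-check against FixedModel:
const runtimeKey: string = "text";
// fixedClient.upload([{ id: "2", [runtimeKey]: "hi", content_vector: [0.3] }]);

// With an open record as the model, the same document is accepted, and the
// field-name check moves from the compiler to run time.
const looseClient = new TypedIndexClient<Record<string, unknown>>();
const looseCount = looseClient.upload([{ id: "2", [runtimeKey]: "hi" }]);
```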

sunnyunde-xooa commented 2 months ago

@petergoldstein For now I have solved this by creating a custom class that inherits from AzureAISearchVectorStore and overrides the hybridSearchVectorWithScore method. I keep the content key, content vector key and metadata key dynamic by passing those values when initializing the class and then using them inside the method.
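
The shape of this workaround might look roughly like the sketch below. It is not the commenter's actual code: `HardcodedKeyStore` is a hypothetical stand-in for the library class, and its `search` method stands in for hybridSearchVectorWithScore, whose real signature is not reproduced here. Only the pattern is shown: keys passed at construction, then used in an overriding method.

```typescript
// Everything here is a hypothetical stand-in. The workaround described above
// subclasses AzureAISearchVectorStore and overrides
// hybridSearchVectorWithScore instead.
interface FieldKeys {
  contentKey: string;
  contentVectorKey: string; // would be used on the upload path, not shown
  metadataKey: string;
}

interface MappedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

type RawResult = Record<string, unknown>;

// Stand-in base class that reads results through hardcoded field names,
// like the behaviour reported in this issue.
class HardcodedKeyStore {
  search(results: RawResult[]): MappedDocument[] {
    return results.map((raw) => ({
      pageContent: String(raw["content"] ?? ""),
      metadata: (raw["metadata"] ?? {}) as Record<string, unknown>,
    }));
  }
}

// Subclass that receives the keys when it is constructed and overrides the
// search method to read results through them.
class DynamicKeyStore extends HardcodedKeyStore {
  private readonly keys: FieldKeys;

  constructor(keys: FieldKeys) {
    super();
    this.keys = keys;
  }

  search(results: RawResult[]): MappedDocument[] {
    return results.map((raw) => ({
      pageContent: String(raw[this.keys.contentKey] ?? ""),
      metadata: (raw[this.keys.metadataKey] ?? {}) as Record<string, unknown>,
    }));
  }
}

// Usage: an index whose fields are named "text" and "meta".
const store = new DynamicKeyStore({
  contentKey: "text",
  contentVectorKey: "text_vector",
  metadataKey: "meta",
});
const mapped = store.search([{ text: "hello", meta: { source: "doc1" } }]);
```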