Open sunnyunde-xooa opened 3 months ago
Hey @sunnyunde-xooa! Great to see you back here. Hope you've been doing well since our last adventure in code! 😄👋
Based on the error message you're receiving, it seems that the schema of the documents you're trying to upload doesn't match the schema expected by your Azure AI Search index. The error message indicates that the Azure AI Search service is expecting a property named content, but it was not found in the documents you attempted to upload.
Given the context of your code, it appears that you are using pageContent as the key for the main content of your documents. However, the Azure AI Search index is expecting a property named content instead of pageContent.
To resolve this issue, you have a couple of options:
1. Rename pageContent to content in the documents you are uploading. For example:
   await vectorStore.addDocuments(
     [
       {
         content: "This book is about politics", // Changed from pageContent to content
         metadata: {
           source: "doc1",
           attributes: [{ key: "a", value: "1" }],
         },
       },
       // Additional documents...
     ],
     { ids: DOCUMENT_IDS }
   );
2. If you prefer to keep pageContent as the key, you can adjust the index schema to expect pageContent instead of content. This approach requires modifying the index schema in the Azure portal or through the Azure Search SDK to align with your document structure.
Choose the approach that best fits your requirements and existing setup. If you opt to modify the document schema in your code, ensure that all references to pageContent are updated to content to maintain consistency.
I hope this helps! If you have any further questions, feel free to ask.
The issue is that the content, metadata, and embedding keys are hard-coded. They can vary with the creator's needs, so they should not be fixed. The keys inside metadata are hard-coded as well; they should be left open, since I might need to pass extra keys or none of them.
@sunnyunde-xooa I'm also running into this issue. I'm not sure why these key values are being treated as constants.
Maintainers, I'm happy to submit a patch that allows the creator to set these field name values. One easy, backwards-compatible approach would be extending the AzureAISearchConfig type to include additional optional fields for these names. The class could then replace the use of the constants (DEFAULT_...) with instance variables that are set to the value passed in as part of the config, falling back to the existing constants when no value is given.
Would the above proposal be acceptable as an approach? If not, do you have other suggestions?
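For illustration, here is a minimal sketch of that proposal as self-contained TypeScript, not a patch against the real library. The FieldNameConfig type, the option names (contentKey, vectorKey, metadataKey), the DEFAULT_FIELD_* constant names, and resolveFieldNames are all hypothetical; the default values mirror the field names that appear in the Description of this issue.

```typescript
// Hypothetical stand-ins for the library's hard-coded constants.
const DEFAULT_FIELD_CONTENT = "content";
const DEFAULT_FIELD_CONTENT_VECTOR = "content_vector";
const DEFAULT_FIELD_METADATA = "metadata";

// Hypothetical optional additions to the config type.
interface FieldNameConfig {
  contentKey?: string;
  vectorKey?: string;
  metadataKey?: string;
}

interface ResolvedFieldNames {
  contentKey: string;
  vectorKey: string;
  metadataKey: string;
}

// Use the configured name when present; otherwise keep the existing default,
// so callers that pass nothing see no change in behavior.
function resolveFieldNames(config: FieldNameConfig = {}): ResolvedFieldNames {
  return {
    contentKey: config.contentKey ?? DEFAULT_FIELD_CONTENT,
    vectorKey: config.vectorKey ?? DEFAULT_FIELD_CONTENT_VECTOR,
    metadataKey: config.metadataKey ?? DEFAULT_FIELD_METADATA,
  };
}

const defaults = resolveFieldNames();
const custom = resolveFieldNames({ contentKey: "pageContent" });
```

Here defaults keeps all three existing names, while custom overrides only the content field and inherits the other two.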
Unfortunately this initial approach doesn't work, at least as I originally envisioned it, because of TypeScript issues. The underlying Azure SearchClient expects a parameterized type, and then uses that type's keys to determine the set of allowed fields in the underlying schema. This makes the whole thing very inflexible: the fields need to be specified at build time, not run time. That means, for example, that passing in a configuration that specifies the fields in an environment variable isn't possible.
This is, to put it mildly, a little inconvenient. The idea of passing in a configuration for a search store seems pretty basic. I'm going to spend some time poking around and looking at alternative approaches.
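To make the constraint concrete, here is a rough, self-contained sketch. TypedClient is a hypothetical stand-in for a client that is generic over its document model, not the real SearchClient, and loadFieldName stands in for reading a value from the environment. The string-keyed record at the end is just one possible escape hatch, offered as an assumption rather than as the library's approach.

```typescript
// A fixed document shape, similar in spirit to the one in the Description.
type FixedDocument = {
  id: string;
  content: string;
  content_vector: number[];
};

// Hypothetical stand-in for a client that is generic over the document model.
class TypedClient<TModel extends object> {
  private readonly docs: TModel[] = [];

  upload(doc: TModel): void {
    this.docs.push(doc);
  }

  // Only keys of TModel are accepted, and the compiler checks them.
  select<K extends keyof TModel>(field: K): TModel[K][] {
    return this.docs.map((d) => d[field]);
  }
}

const typed = new TypedClient<FixedDocument>();
typed.upload({ id: "1", content: "This book is about politics", content_vector: [0.1] });
const contents = typed.select("content"); // OK: "content" is a key of FixedDocument

// Stands in for a field name that is only known at run time.
function loadFieldName(): string {
  return "pageContent";
}
const runtimeField: string = loadFieldName();
// typed.select(runtimeField); // compile error: string is not a key of FixedDocument

// Possible workaround (assumption): loosen the model to a string-keyed
// record, which accepts any field name but gives up compile-time key checks.
const loose = new TypedClient<Record<string, unknown>>();
loose.upload({ id: "2", [runtimeField]: "dynamic field" });
const dynamicValues = loose.select(runtimeField);
```

The trade-off is that the loose variant returns unknown values, so any validation of field names and types has to happen at run time.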
@petergoldstein for now I have solved this issue by creating a custom class that inherits from AzureAISearchVectorStore and overrides the hybridSearchVectorWithScore method. I keep the content key, content vector key, and metadata key dynamic by passing those values when initializing the class and then using them in the method.
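The shape of that workaround might look roughly like the sketch below. Everything here is a simplified stand-in: BaseStore is not the real AzureAISearchVectorStore, the method signature is reduced to a query and a result count, and RawHit, Doc, FieldKeys, toDocument, and DynamicKeyStore are hypothetical names. The content vector key is left out for brevity.

```typescript
// Simplified stand-ins for illustration; these are NOT the real library types.
type RawHit = { document: Record<string, unknown>; score: number };
type Doc = { pageContent: string; metadata: Record<string, unknown> };
interface FieldKeys {
  contentKey: string;
  metadataKey: string;
}

// Map one raw search hit to a document using caller-supplied field names.
function toDocument(hit: RawHit, keys: FieldKeys): Doc {
  const metadata = hit.document[keys.metadataKey];
  return {
    pageContent: String(hit.document[keys.contentKey] ?? ""),
    metadata:
      typeof metadata === "object" && metadata !== null
        ? (metadata as Record<string, unknown>)
        : {},
  };
}

// Stand-in base class whose search method reads fixed field names.
// The query is ignored here; a real store would send it to the service.
class BaseStore {
  protected readonly hits: RawHit[];
  constructor(hits: RawHit[]) {
    this.hits = hits;
  }
  async hybridSearchVectorWithScore(_query: string, k: number): Promise<[Doc, number][]> {
    const fixed: FieldKeys = { contentKey: "content", metadataKey: "metadata" };
    return this.hits.slice(0, k).map((h): [Doc, number] => [toDocument(h, fixed), h.score]);
  }
}

// Subclass that receives the field names at construction time and uses
// them in the overridden search method.
class DynamicKeyStore extends BaseStore {
  private readonly keys: FieldKeys;
  constructor(hits: RawHit[], keys: FieldKeys) {
    super(hits);
    this.keys = keys;
  }
  async hybridSearchVectorWithScore(_query: string, k: number): Promise<[Doc, number][]> {
    return this.hits.slice(0, k).map((h): [Doc, number] => [toDocument(h, this.keys), h.score]);
  }
}

const sampleHit: RawHit = {
  document: { pageContent: "This book is about politics", meta: { source: "doc1" } },
  score: 0.9,
};
const sampleKeys: FieldKeys = { contentKey: "pageContent", metadataKey: "meta" };
const mapped = toDocument(sampleHit, sampleKeys);
const store = new DynamicKeyStore([sampleHit], sampleKeys);
```

Because the field names live on the instance, two stores pointing at differently shaped indexes can coexist in the same process.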
Checked other resources
Example Code
// Create Azure AI Search instance
const vectorStore = new AzureAISearchVectorStore(this.embedding, config);
// addDocuments is async, so await it to get the uploaded document IDs
const ids = await vectorStore.addDocuments(docs);
Error Message and Stack Trace (if applicable)
Error: Azure AI Search uploadDocuments batch failed: RestError: The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.
at EventEmitter.<anonymous> (/home/xooa/Xooa/thaleslabs-ai/packages/backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:225:19)
at EventEmitter.emit (node:events:513:28)
at EventEmitter.emit (node:domain:489:12)
at SearchIndexingBufferedSender.submitDocuments (/home/xooa/Xooa/thaleslabs-ai/node_modules/@azure/search-documents/dist/index.js:4371:30)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async SearchIndexingBufferedSender.internalFlush (/home/xooa/Xooa/thaleslabs-ai/node_modules/@azure/search-documents/dist/index.js:4318:17)
at async AzureAISearchVectorStore.addVectors (/home/xooa/Xooa/thaleslabs-ai/packages/backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:228:9)
at async AzureAISearchVectorStore.addDocuments (/home/xooa/Xooa/thaleslabs-ai/packages/backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:201:25)
Description
const entities: AzureAISearchDocument[] = documents.map((doc, idx) => ({
  id: ids[idx],
  content: doc.pageContent,
  content_vector: vectors[idx],
  metadata: {
    source: doc.metadata?.source,
    attributes: doc.metadata?.attributes ?? [],
  },
}));
System Info
yarn info v1.22.19