Open dionoid opened 6 months ago
The current implementation does not appear to support customizing the default field names, metadata, or index name when LangChain4j connects to an existing Azure AI Search index. Your suggestion to extend AzureAiSearchContentRetriever.builder so that index field names and metadata mapping can be configured is valid and would improve flexibility and usability.
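As a rough illustration of what configurable field names could look like, here is a minimal, self-contained sketch. IndexFieldMapping is a hypothetical helper and not part of LangChain4j. The default names "id", "content" and "content_vector" are assumptions about the current defaults, and "chunk_id", "chunk" and "text_vector" are only illustrative names for an index created outside LangChain4j.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of LangChain4j: maps logical roles to index field names.
final class IndexFieldMapping {
    final String idField;
    final String contentField;
    final String vectorField;

    IndexFieldMapping(String idField, String contentField, String vectorField) {
        this.idField = idField;
        this.contentField = contentField;
        this.vectorField = vectorField;
    }

    // Assumed defaults; check the constants in AbstractAzureAiSearchEmbeddingStore for the real values.
    static IndexFieldMapping defaults() {
        return new IndexFieldMapping("id", "content", "content_vector");
    }

    // Reads the text content from a raw search document using the configured field name.
    // Returns null when the document has no such field.
    String contentOf(Map<String, Object> document) {
        Object value = document.get(contentField);
        return value == null ? null : value.toString();
    }
}

class IndexFieldMappingDemo {
    public static void main(String[] args) {
        // A document shaped like one from an existing index (field names are illustrative).
        Map<String, Object> doc = new HashMap<>();
        doc.put("chunk_id", "42");
        doc.put("chunk", "Some indexed text");

        IndexFieldMapping custom = new IndexFieldMapping("chunk_id", "chunk", "text_vector");
        System.out.println(custom.contentOf(doc));
        // With the assumed defaults, the "content" field is missing, so nothing is found.
        System.out.println(IndexFieldMapping.defaults().contentOf(doc));
    }
}
```

A builder option that accepts such a mapping would let the retriever read from indexes whose field names differ from the defaults, without changing behavior for indexes that LangChain4j creates itself.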
As for the DEFAULT_FIELD_METADATA_SOURCE field not being used, moving the key-value pairs from metadata->attributes to just metadata and treating "source" as a metadata key could simplify the structure and make it more intuitive. This approach would require adjustments in the codebase to ensure compatibility and maintain the integrity of data retrieval and indexing processes.
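The restructuring described above can be sketched as follows. This is only an illustration under assumptions: MetadataFlattener is a hypothetical class, and the nested shape (a "source" value plus an "attributes" list of key/value entries) is inferred from the discussion, not taken from the library's code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of LangChain4j.
final class MetadataFlattener {

    // Converts the assumed nested shape
    //   { "source": s, "attributes": [ { "key": k, "value": v }, ... ] }
    // into one flat map in which "source" is an ordinary metadata key.
    // An attribute whose key is "source" overwrites the top-level source value.
    static Map<String, String> flatten(String source, List<Map<String, String>> attributes) {
        Map<String, String> flat = new LinkedHashMap<>();
        if (source != null) {
            flat.put("source", source);
        }
        for (Map<String, String> attribute : attributes) {
            flat.put(attribute.get("key"), attribute.get("value"));
        }
        return flat;
    }
}

class MetadataFlattenerDemo {
    public static void main(String[] args) {
        List<Map<String, String>> attributes = List.of(
                Map.of("key", "author", "value", "jane"),
                Map.of("key", "page", "value", "3"));
        // Prints the flat map with "source" first, then the attributes in list order.
        System.out.println(MetadataFlattener.flatten("manual.pdf", attributes));
    }
}
```

Existing indexes that already store the nested form would still need a migration or a compatibility read path, which is the adjustment the comment refers to.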
Regarding the SearchIndex parameter in the AbstractAzureAiSearchEmbeddingStore's initialize method being used only for creating a new index and not for retrieving, this seems to be by design. However, revisiting this design could offer more flexibility in managing indexes, especially for use cases that involve connecting to and interacting with existing indexes.
Collaborating with the project maintainers, like reaching out to @jdubois as you mentioned, would be the best course of action to discuss these enhancements and potentially contribute to the project. Your willingness to contribute and your insights could lead to significant improvements in LangChain4j's Azure AI Search integration.
Thanks @dionoid! This seems linked to #1062, so let me finish that one first. I didn't know about the Azure AI Search "Import and vectorize data" feature; this should be supported and documented! There is a lot of room for improvement here, so I'm happy to work with you on this, or help you contribute!
@langchain4j can you assign this issue to me?
@yoshioterada from my team at Microsoft might have the time to work on this during the summer, so I'm pinging him here.
Is there any progress here? I saw the pull request was closed on July 5th.
No, sorry, I really haven't had the time lately, and I'm not sure when I will be able to work on this.
Describe the bug
I'm using LangChain4j with an existing Azure AI Search index, which was created using the "Import and vectorize data" feature of Azure AI Search. When connecting to this index using the AzureAiSearchContentRetriever, I found that the underlying AbstractAzureAiSearchEmbeddingStore doesn't allow me to override the default field names, metadata or index name, so I was blocked. Also, metadata mapping in the AzureAiSearchContentRetriever seems to be limited to pure Vector queries and is not implemented for FullText, Hybrid or HybridWithReranking.
Log and Stack trace
N.A.
To Reproduce
Import and vectorize documents into a new Azure AI Search index using the "Import and vectorize data" feature, or use the "Add your data" feature in the playground of Azure OpenAI Studio. There is then no way to connect these indexes to the AzureAiSearchContentRetriever and use them in LangChain4j.
Expected behavior
Please complete the following information:
Additional context
I would be happy to contribute to this project. Reaching out to @jdubois to learn what he thinks the best way would be to solve this issue, and maybe we can work on this together? Also, I have some additional questions: