langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

AzureSearch Bug -- langchain.vectorstores.azuresearch #15039

Closed RERobbins closed 5 months ago

RERobbins commented 10 months ago

System Info

The example found here, and in particular this code fragment,

embeddings: OpenAIEmbeddings = OpenAIEmbeddings(deployment=model, chunk_size=1)
index_name: str = "langchain-vector-demo"
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=vector_store_address,
    azure_search_key=vector_store_password,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

fails with a message that begins:

vector_search_configuration is not a known attribute of class <class 'azure.search.documents.indexes.models._index.SearchField'> and will be ignored
semantic_settings is not a known attribute of class <class 'azure.search.documents.indexes.models._index.SearchIndex'> and will be ignored

and culminates with:

HttpResponseError: (InvalidRequestParameter) The request is invalid. Details: definition : The vector field 'content_vector' must have the property 'vectorSearchConfiguration' set.
Code: InvalidRequestParameter
Message: The request is invalid. Details: definition : The vector field 'content_vector' must have the property 'vectorSearchConfiguration' set.
Exception Details:  (InvalidField) The vector field 'content_vector' must have the property 'vectorSearchConfiguration' set. Parameters: definition
    Code: InvalidField
    Message: The vector field 'content_vector' must have the property 'vectorSearchConfiguration' set. Parameters: definition

I am running Python 3.10, openai 1.5.0, langchain 0.0.351, and azure-search-documents 11.4.0. If I revert to azure-search-documents 11.4.0b8, the code works.

This appears to be related to the November 2023 Microsoft API change, which introduced the concept of a "profile" that aggregates various vector search settings under one name. As a result of this change, the old "vector_search_configuration" was deprecated and a new "vector_search_profile" was added, along with a new "profiles" object. It seems that the LangChain integration has not been updated for this change and still expects the old "vector_search_configuration" property, which doesn't exist in newer SDK releases.

See the discussion here.
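
To make the rename concrete, here is a minimal sketch that rewrites an old-style index definition into the profile-based shape. It works on plain dictionaries only. The REST-style property names (`algorithmConfigurations`, `algorithms`, `profiles`, `vectorSearchProfile`) reflect my reading of the API change and should be checked against the service documentation, and `migrate_index_definition` is a hypothetical helper, not code from LangChain or the Azure SDK.

```python
import copy


def migrate_index_definition(old_index: dict, profile_suffix: str = "-profile") -> dict:
    """Return a copy of old_index that uses profiles instead of per-field configurations.

    Hypothetical helper for illustration; property names are assumptions.
    """
    index = copy.deepcopy(old_index)  # leave the caller's definition untouched

    vector_search = index.get("vectorSearch")
    if vector_search is not None:
        # Old shape: vectorSearch.algorithmConfigurations
        # New shape: vectorSearch.algorithms plus vectorSearch.profiles
        algorithms = vector_search.pop("algorithmConfigurations", [])
        vector_search["algorithms"] = algorithms
        # One profile per algorithm: a profile is a named bundle pointing at an algorithm.
        vector_search["profiles"] = [
            {"name": algo["name"] + profile_suffix, "algorithm": algo["name"]}
            for algo in algorithms
        ]

    # Vector fields stop naming an algorithm configuration and name a profile instead.
    for field in index.get("fields", []):
        if "vectorSearchConfiguration" in field:
            config_name = field.pop("vectorSearchConfiguration")
            field["vectorSearchProfile"] = config_name + profile_suffix

    return index


old_definition = {
    "name": "langchain-vector-demo",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {
            "name": "content_vector",
            "type": "Collection(Edm.Single)",
            "dimensions": 1536,
            "vectorSearchConfiguration": "default",
        },
    ],
    "vectorSearch": {
        "algorithmConfigurations": [{"name": "default", "kind": "hnsw"}]
    },
}

new_definition = migrate_index_definition(old_definition)
```

After the call, the `content_vector` field refers to a profile named `default-profile`, and that profile in turn refers to the `default` HNSW algorithm. The original dictionary is not modified.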

Who can help?

@hwchase17 @bas

Information

Related Components

Reproduction

The following sample code fragment from here (https://python.langchain.com/docs/integrations/vectorstores/azuresearch) fails.

import os

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.azuresearch import AzureSearch

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "YOUR_OPENAI_ENDPOINT"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
model: str = "text-embedding-ada-002"

vector_store_address: str = "YOUR_AZURE_SEARCH_ENDPOINT"
vector_store_password: str = "YOUR_AZURE_SEARCH_ADMIN_KEY"

embeddings: OpenAIEmbeddings = OpenAIEmbeddings(deployment=model, chunk_size=1)
index_name: str = "langchain-vector-demo"
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=vector_store_address,
    azure_search_key=vector_store_password,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

Expected behavior

The code should run without failure and return a vector store.

dosubot[bot] commented 10 months ago

🤖

Hey there @RERobbins! Nice to see you back here on the LangChain repo. Hope you're doing well. Let's dive into this issue you're facing.

Thank you for the detailed report, RERobbins. It seems like you've encountered an issue due to the recent changes in the Azure Search API. As you've correctly pointed out, the vector_search_configuration attribute has been deprecated and replaced with vector_search_profile in the newer versions of the Azure SDK.

The LangChain codebase has been updated to handle these changes. Specifically, the module-level _get_search_client function in langchain.vectorstores.azuresearch now checks the Azure SDK version and creates the VectorSearch object accordingly. Here's the relevant code snippet:

# Vector search configuration
if vector_search is None:
    if NEW_VERSION:
        # >= 11.4.0:
        #   VectorSearch(algorithm_configuration) --> VectorSearch(algorithms)
        # HnswVectorSearchAlgorithmConfiguration --> HnswAlgorithmConfiguration
        vector_search = VectorSearch(
            algorithms=[
                HnswAlgorithmConfiguration(
                    name="default",
                    kind="hnsw",
                    parameters={  # type: ignore
                        "m": 4,
                        "efConstruction": 400,
                        "efSearch": 500,
                        "metric": "cosine",
                    },
                )
            ]
        )
    else:  # < 11.4.0
        vector_search = VectorSearch(
            algorithm_configurations=[
                HnswVectorSearchAlgorithmConfiguration(
                    name="default",
                    kind="hnsw",
                    parameters={  # type: ignore
                        "m": 4,
                        "efConstruction": 400,
                        "efSearch": 500,
                        "metric": "cosine",
                    },
                )
            ]
        )

You can find this code in the AzureSearch implementation in the LangChain codebase.
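
The snippet depends on a NEW_VERSION flag. Here is a minimal sketch of how such a flag could be derived from the installed azure-search-documents version using only the standard library. The helper names (`version_key`, `is_new_sdk`, `installed_sdk_is_new`) are hypothetical, and the actual LangChain check may be written differently. The detail that matters for this issue is that the pre-release 11.4.0b8 must sort before the final 11.4.0.

```python
import re
from importlib import metadata

# Accepts "X.Y.Z" with an optional pre-release tag such as "b8" or "rc1".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")


def version_key(version: str) -> tuple:
    """Turn a version string into a tuple that sorts pre-releases before the final release."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, pre_kind, pre_num = match.groups()
    is_final = 1 if pre_kind is None else 0  # final release outranks any pre-release
    return (int(major), int(minor), int(patch), is_final, pre_kind or "", int(pre_num or 0))


def is_new_sdk(version: str, threshold: str = "11.4.0") -> bool:
    """True if version is at or above the threshold release."""
    return version_key(version) >= version_key(threshold)


def installed_sdk_is_new(package: str = "azure-search-documents") -> bool:
    """Check the installed package; treat a missing package as 'not new'."""
    try:
        return is_new_sdk(metadata.version(package))
    except metadata.PackageNotFoundError:
        return False


beta_is_new = is_new_sdk("11.4.0b8")   # the beta that still accepts the old attribute names
final_is_new = is_new_sdk("11.4.0")    # the release that triggers the error in this issue
```

With this ordering, 11.4.0b8 takes the old code path and 11.4.0 takes the new one, which matches the `>= 11.4.0` comment in the snippet.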

However, you've mentioned that you're using LangChain version 0.0.351. The changes to handle the new Azure API were introduced in LangChain version 0.0.352. Therefore, I recommend upgrading to the latest version of LangChain to resolve this issue. You can do this by running the following command:

pip install --upgrade langchain

After upgrading, your code should work as expected with the newer versions of the Azure SDK. If you still encounter any issues after upgrading, please let me know.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

konradbjk commented 9 months ago

I am on langchain 0.1.5 and have the same issue.

I believe it is related to langchain_community, not langchain itself.

RERobbins commented 9 months ago

I believe it to be a Microsoft issue.

On Feb 7, 2024, at 4:16 PM, Konrad Bujak wrote:

> I am at langchain 0.1.5 and I have the same issue
> I believe it is related to langchain_community not langchain itself..

konradbjk commented 9 months ago

It is a feature: Microsoft updated the way of connecting to the indexes. They now have Vector Profiles (as their error states).

The fault lies in the langchain package. Check the linked issues. They are working on it.

shivam-51 commented 8 months ago

Still facing this issue even after using azure-search-documents==11.4.0b8.