langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Azure Cognitive Search Vector Store doesn't apply search_kwargs when performing queries #6131

Closed CameronVetter closed 1 year ago

CameronVetter commented 1 year ago

System Info

Langchain 0.0.199, Python 3.10.11, Windows 11 (but this will occur on any platform).

Who can help?

@hwchase17 @ruoccofabrizio

Reproduction

To reproduce this issue, create an AzureSearch vector store and a RetrievalQA chain with search_kwargs set, as in this sample code:

import os

cognitive_search_name = os.environ["AZURE_SEARCH_SERVICE_NAME"]
vector_store_address: str = f"https://{cognitive_search_name}.search.windows.net/"
index_name: str = os.environ["AZURE_SEARCH_SERVICE_INDEX_NAME"]
vector_store_password: str = os.environ["AZURE_SEARCH_SERVICE_ADMIN_KEY"]

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.azuresearch import AzureSearch

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1, client=any)
vector_store = AzureSearch(azure_search_endpoint=vector_store_address,
                           azure_search_key=vector_store_password,
                           index_name=index_name,
                           embedding_function=embeddings.embed_query)

from langchain.chains import RetrievalQA
from langchain.chat_models import AzureChatOpenAI

llm = AzureChatOpenAI(deployment_name="gpt35", model_name="gpt-3.5-turbo-0301", openai_api_version="2023-03-15-preview", temperature=temperature, client=None)
index = get_vector_store()
retriever = index.as_retriever()
retriever.search_kwargs = {'filters': "metadata eq 'something'"}

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)

return qa

When you execute this chain, the search_kwargs show up in the similarity_search method in azuresearch.py, but they are never passed on to the methods vector_search, hybrid_search, and semantic_hybrid, where they would actually be used.

Expected behavior

In my example the filter should be applied to the Azure Cognitive Search index before the vector search is performed, but that does not happen: filters is always empty by the time execution reaches the functions where it is used (vector_search, hybrid_search, and semantic_hybrid).
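
For illustration, the change being asked for amounts to forwarding the kwargs that similarity_search already receives. This is only a sketch using the method names mentioned above, not the actual library code:

# Sketch only: forward the kwargs (including filters) that similarity_search
# receives to the method that actually queries the index, instead of dropping them.
def similarity_search(self, query, k=4, **kwargs):
    if self.search_type == "similarity":
        return self.vector_search(query, k=k, **kwargs)
    elif self.search_type == "hybrid":
        return self.hybrid_search(query, k=k, **kwargs)
    elif self.search_type == "semantic_hybrid":
        return self.semantic_hybrid(query, k=k, **kwargs)
    raise ValueError(f"Unsupported search_type: {self.search_type}")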

victorlee0505 commented 1 year ago

Is your index.as_retriever() an instance of VectorStoreRetriever or of AzureSearchVectorStoreRetriever?

When I do

vector_store = AzureSearch(...)
retriever = vector_store.as_retriever(...)

print(f'retriever search_type: {retriever.search_type}')

the search_type is always "similarity", unless I construct the retriever directly:

retriever2 = AzureSearchVectorStoreRetriever(vectorstore=vector_store)
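
A slightly fuller sketch of constructing the Azure-specific retriever directly (search_type and k are fields on this retriever class; the values here are just examples):

from langchain.vectorstores.azuresearch import AzureSearchVectorStoreRetriever

# Build the retriever explicitly instead of going through as_retriever();
# search_type can then be set to "hybrid" (example value).
retriever2 = AzureSearchVectorStoreRetriever(
    vectorstore=vector_store,
    search_type="hybrid",
    k=4,
)
print(f'retriever2 search_type: {retriever2.search_type}')  # hybrid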
CameronVetter commented 1 year ago

Is your index.as_retriever() an instance of VectorStoreRetriever or of AzureSearchVectorStoreRetriever?

You are right, my example code was a bad cut-and-paste. It has to be an AzureSearchVectorStoreRetriever to reproduce this. I will add a new example that I just tested.

CameronVetter commented 1 year ago
import os

cognitive_search_name = os.environ["AZURE_SEARCH_SERVICE_NAME"]
vector_store_address: str = f"https://{cognitive_search_name}.search.windows.net/"
index_name: str = os.environ["AZURE_SEARCH_SERVICE_INDEX_NAME"]
vector_store_password: str = os.environ["AZURE_SEARCH_SERVICE_ADMIN_KEY"]

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.azuresearch import AzureSearch

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1, client=any)
vector_store = AzureSearch(azure_search_endpoint=vector_store_address,  
                            azure_search_key=vector_store_password,  
                            index_name=index_name,  
                            embedding_function=embeddings.embed_query)  

from langchain.chains import RetrievalQA
from langchain.chat_models import AzureChatOpenAI

llm = AzureChatOpenAI(deployment_name="gpt35", model_name="gpt-3.5-turbo-0301", openai_api_version="2023-03-15-preview", temperature=1.0, client=None)
retriever = vector_store.as_retriever()
retriever.search_kwargs = {'filters': "metadata eq 'something'"}

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)

qa.run("What is the name?")
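
For what it is worth, a possible interim workaround (sketch only, reusing the placeholder filter from above) is to bypass the retriever and call the store's search method directly, since that is where the filters kwarg is actually consumed:

# Workaround sketch: hybrid_search reads filters from **kwargs, so calling it
# directly applies the filter that the retriever currently drops.
docs = vector_store.hybrid_search(
    "What is the name?",
    k=4,
    filters="metadata eq 'something'",
)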
victorlee0505 commented 1 year ago

Sorry for hijacking. I am on 0.0.202, and calling as_retriever() did not return an AzureSearchVectorStoreRetriever for me 😥

That said, AzureSearchVectorStoreRetriever.get_relevant_documents calls back to

elif self.search_type == "hybrid":
    docs = self.vectorstore.hybrid_search(query, k=self.k)

so it never passes anything through to hybrid_search(self, query: str, k: int = 4, **kwargs: Any), where filters = kwargs.get("filters", None) is read.
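
One possible shape of the fix (a sketch only; it assumes the retriever carries a search_kwargs dict the way the generic VectorStoreRetriever does, which is not the case here on 0.0.202) would be to forward those kwargs from get_relevant_documents:

# Sketch of a possible fix, not the actual library code: forward the
# retriever's search_kwargs (if any) into the vector store call so that
# filters reaches hybrid_search.
def get_relevant_documents(self, query):
    kwargs = getattr(self, "search_kwargs", {}) or {}
    if self.search_type == "hybrid":
        return self.vectorstore.hybrid_search(query, k=self.k, **kwargs)
    # ...the "similarity" and "semantic_hybrid" branches would forward kwargs
    # in the same way.
    raise ValueError(f"Unsupported search_type: {self.search_type}")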

CameronVetter commented 1 year ago

AzureSearchVectorStoreRetriever

Yes, that is a different path to the same issue.