Closed. DSgUY closed this issue 1 year ago.
Answer generated by a 🤖
From your description, it appears that you're encountering an InvalidVectorQuery error when trying to perform a similarity search using the AzureSearch vector store in the LangChain framework. The error message suggests that the 'value' property of the vector query is either null or an empty array, which is causing the issue.
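For orientation, the 'value' in that message is a property of the vector query inside the search request body. The sketch below assumes the request shape of the 2023-07-01-Preview REST API, in which each entry of a "vectors" array carries "value", "fields" and "k"; that shape, the field name "content_vector" and the helpers build_vector_query and build_search_body are all assumptions for illustration, not LangChain or Azure SDK code. It refuses to build a query whose value would be null or empty, which is the condition the service reports as InvalidVectorQuery.

```python
import json
from typing import Any, Dict, List, Optional


def build_vector_query(
    vector: Optional[List[float]], field: str, k: int
) -> Dict[str, Any]:
    """Hypothetical helper: one vector-query entry in the assumed REST shape."""
    # A null or empty 'value' is exactly what the service rejects,
    # so refuse to build such a query locally instead.
    if not vector:
        raise ValueError("vector query 'value' must be a non-empty list of floats")
    return {"value": [float(x) for x in vector], "fields": field, "k": k}


def build_search_body(vector: Optional[List[float]], field: str, k: int) -> str:
    """Hypothetical helper: serialize only the vector part of a search request."""
    # select/filter and other request properties are omitted from this sketch.
    body = {"vectors": [build_vector_query(vector, field, k)]}
    return json.dumps(body)
```

Calling build_search_body([0.1, 0.2], "content_vector", 3) yields a body whose single vector query has value [0.1, 0.2], while passing None or [] raises ValueError before any request would be sent.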
To address this, let's first try to identify where the problem originates. The similarity_search method in LangChain's AzureSearch class uses the embed_query function to convert your query text into a vector. If this function isn't returning a valid vector, that could be the root cause of the error.
Here's a way to check this:
# Generate the query vector
query_vector = embeddings.embed_query("What did the president say about Ketanji Brown Jackson")
print(query_vector)

# Perform a similarity search
docs = vector_store.similarity_search(
    query="What did the president say about Ketanji Brown Jackson",
    k=3,
    search_type="similarity",
)
print(docs[0].page_content)
If query_vector is None or an empty list, the issue is likely in the embed_query function. In that case, review the configuration of your OpenAIEmbeddings instance or the embedding model you're using.
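Printing the vector only shows that something came back. A slightly stricter check is sketched below; check_query_vector is a hypothetical helper, not part of LangChain, and it assumes the vector should be a non-empty sequence of finite numbers whose length equals the dimensions declared on the index's vector field.

```python
import math
import numbers
from typing import Optional, Sequence


def check_query_vector(
    vector: Optional[Sequence[float]], expected_dim: Optional[int] = None
) -> int:
    """Hypothetical helper: return the vector length, or raise if it looks invalid."""
    if vector is None:
        raise ValueError("embed_query returned None")
    if len(vector) == 0:
        raise ValueError("embed_query returned an empty vector")
    # Every component must be a real, finite number (no None, NaN or inf).
    if not all(isinstance(x, numbers.Real) and math.isfinite(x) for x in vector):
        raise ValueError("vector contains non-numeric or non-finite values")
    # The length must match the dimensions declared on the index's vector field.
    if expected_dim is not None and len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return len(vector)
```

With the embeddings object from the snippet above, check_query_vector(query_vector, expected_dim=1536) would flag both an empty result and a dimension mismatch. 1536 is the output size of text-embedding-ada-002; if you use a different model, substitute the dimensions your index field declares.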
If, on the other hand, query_vector is a valid, non-empty list, the embedding step is working and the problem is more likely in how the similarity_search method of the AzureSearch class passes that vector to the search client. In that case, review how LangChain's AzureSearch implementation builds the vector query it sends to Azure Cognitive Search.
Please note that this is just one potential solution based on the information provided. If you find that this solution is correct and it's a bug that could impact other users, we encourage you to make a pull request to help improve the LangChain framework. If this solution doesn't align with your experience, it might be helpful to ensure you're using the latest version of LangChain.
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
Well, the problem is in the vector parameter of each function that uses the embedding_function. I managed to create working code using:
import json
from typing import List, Optional, Tuple

from langchain.docstore.document import Document
from langchain.vectorstores.azuresearch import (
    FIELDS_CONTENT,
    FIELDS_CONTENT_VECTOR,
    FIELDS_ID,
    FIELDS_METADATA,
    AzureSearch,
)


class MyAzureSearch(AzureSearch):
    def vector_search_with_score(
        self, query: str, k: int = 4, filters: Optional[str] = None
    ) -> List[Tuple[Document, float]]:
        # Pass the embedding directly as a plain list instead of wrapping it in a Vector object
        results = self.client.search(
            search_text=None,
            vector=self.embedding_function(query),
            top_k=k,
            vector_fields=FIELDS_CONTENT_VECTOR,
            select=[f"{FIELDS_ID},{FIELDS_CONTENT},{FIELDS_METADATA}"],
            filter=filters,
        )
        # Convert results to Document objects paired with their search scores
        docs = [
            (
                Document(
                    page_content=result[FIELDS_CONTENT],
                    metadata=json.loads(result[FIELDS_METADATA]),
                ),
                float(result["@search.score"]),
            )
            for result in results
        ]
        return docs


# AZURE_SEARCH_SERVICE_ENDPOINT, AZURE_SEARCH_API_KEY and embeddings
# (the OpenAIEmbeddings instance) are defined earlier in the setup.
vector_store: MyAzureSearch = MyAzureSearch(
    azure_search_endpoint=AZURE_SEARCH_SERVICE_ENDPOINT,
    azure_search_key=AZURE_SEARCH_API_KEY,
    index_name="langchain-vector-demo",
    embedding_function=embeddings.embed_query,
)

# Perform a similarity search
docs = vector_store.similarity_search(
    query="What did the president say about Ketanji Brown Jackson",
    k=3,
    search_type="similarity",
)
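The second half of the override turns raw search hits into (Document, score) pairs. The sketch below isolates that conversion so it can be tried without Azure or LangChain: Doc is a hypothetical stand-in for LangChain's Document, the field names "content" and "metadata" are assumptions, and, unlike the override, it tolerates a missing metadata value by treating it as an empty JSON object.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple

# Assumed field names; substitute the FIELDS_* values your index actually uses.
FIELDS_CONTENT = "content"
FIELDS_METADATA = "metadata"


@dataclass
class Doc:
    """Hypothetical stand-in for langchain's Document so the sketch runs standalone."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def results_to_docs(results: Iterable[Dict[str, Any]]) -> List[Tuple[Doc, float]]:
    docs = []
    for result in results:
        # Metadata is stored in the index as a JSON string, so parse it back into a dict;
        # a missing or null value is treated as an empty object.
        raw_meta = result.get(FIELDS_METADATA) or "{}"
        doc = Doc(page_content=result[FIELDS_CONTENT], metadata=json.loads(raw_meta))
        # Azure returns the relevance score under the "@search.score" key.
        docs.append((doc, float(result["@search.score"])))
    return docs
```

Feeding it a list of plain dicts shaped like search hits is a quick way to confirm that the metadata column really holds valid JSON before blaming the vector query.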
Answer generated by a 🤖
From your description, it appears that you're experiencing an InvalidVectorQuery error when using the similarity_search function in the AzureSearch vector store of the LangChain framework. This issue seems to be related to how the vector parameter is passed in each function that uses the embedding_function.
Your workaround, where you've created a new class MyAzureSearch that inherits from AzureSearch and overrides the vector_search_with_score function, is a clever solution. You've altered the way the vector parameter is passed to the search method, which seems to have resolved the issue.
As I understand it, you're passing the result of the embedding_function directly to the vector parameter instead of creating a Vector instance. This approach appears to be effective in your case.
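To see exactly what the override hands to the SDK without calling Azure, one option is to route the same keyword arguments through a stub client. Everything below is illustrative: RecordingClient and vector_search_kwargs are hypothetical names, the field names are assumed, and a lambda stands in for embeddings.embed_query. The point is only to confirm that the vector travels as a plain list of floats rather than a Vector object.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed field names for the sketch; use your FIELDS_* constants instead.
FIELDS_ID, FIELDS_CONTENT, FIELDS_METADATA = "id", "content", "metadata"
FIELDS_CONTENT_VECTOR = "content_vector"


class RecordingClient:
    """Hypothetical stand-in for the Azure SearchClient that records each search() call."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def search(self, **kwargs: Any) -> List[Dict[str, Any]]:
        self.calls.append(kwargs)
        return []  # no hits; only the outgoing arguments matter here


def vector_search_kwargs(
    embed: Callable[[str], List[float]],
    query: str,
    k: int = 4,
    filters: Optional[str] = None,
) -> Dict[str, Any]:
    """The keyword arguments the MyAzureSearch override passes to client.search()."""
    return {
        "search_text": None,
        "vector": embed(query),  # plain list of floats, not a Vector object
        "top_k": k,
        "vector_fields": FIELDS_CONTENT_VECTOR,
        "select": [f"{FIELDS_ID},{FIELDS_CONTENT},{FIELDS_METADATA}"],
        "filter": filters,
    }


# Usage with a fake embedding function in place of embeddings.embed_query:
client = RecordingClient()
client.search(**vector_search_kwargs(lambda q: [0.1, 0.2, 0.3], "test query", k=3))
sent = client.calls[0]
print(type(sent["vector"]).__name__, sent["top_k"])  # prints: list 3
```

The same stub can wrap the unmodified AzureSearch call path for comparison, which makes the difference in the vector argument visible without any network traffic.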
However, please note that while this solution works for you, it might not be the best or only solution. There could be other factors at play that we're not aware of, and this solution might not work in all scenarios or for all users.
If you believe this is a bug that could impact other users and your solution could be a potential fix, you're encouraged to contribute to the LangChain project. You could consider making a pull request with your changes. This would allow the community to review and potentially incorporate your solution into the framework.
If you find that your experience contradicts this advice, it might be worth checking if you're using the latest version of LangChain. Sometimes, issues are resolved in newer versions, so upgrading could potentially solve your problem.
@DSgUY Don't you also need to import the fields for your example fix to work?
from langchain.vectorstores.azuresearch import FIELDS_ID, FIELDS_CONTENT, FIELDS_METADATA, FIELDS_CONTENT_VECTOR
import json
Yes, sorry. I was using:
import os

FIELDS_ID = os.getenv('FIELDS_ID')
FIELDS_CONTENT = os.getenv('FIELDS_CONTENT')
FIELDS_CONTENT_VECTOR = os.getenv('FIELDS_CONTENT_VECTOR')
FIELDS_METADATA = os.getenv('FIELDS_METADATA')
But that seems better. Thanks!
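One caveat with the os.getenv version: an unset variable yields None, which then flows silently into vector_fields and select. A minimal sketch that falls back to defaults and fails fast on empty values follows; the variable names are the ones from the snippet above, while the default values (id, content, content_vector, metadata) and the helper load_field_names are assumptions to be checked against your index definition.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed defaults; they must match the field names in your index definition.
DEFAULT_FIELDS = {
    "FIELDS_ID": "id",
    "FIELDS_CONTENT": "content",
    "FIELDS_CONTENT_VECTOR": "content_vector",
    "FIELDS_METADATA": "metadata",
}


def load_field_names(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: read field names from the environment with defaults."""
    env = os.environ if env is None else env
    names = {key: env.get(key, default) for key, default in DEFAULT_FIELDS.items()}
    # os.getenv alone would return None for an unset variable, which would then be
    # sent as vector_fields=None; reject empty names here instead.
    missing = [key for key, value in names.items() if not value]
    if missing:
        raise ValueError(f"empty field names: {missing}")
    return names
```

Calling load_field_names() with no argument reads os.environ; passing a dict makes the behavior easy to check in isolation.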
Hi @DSgUY, thanks for this solution. It works when the search type is similarity, but not for hybrid. Please let me know if something needs to be added to the implementation.
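A likely reason, offered as an assumption rather than a confirmed diagnosis: the override only replaces vector_search_with_score, so a hybrid query presumably still goes through the unmodified hybrid method and its Vector-based call. A hybrid request differs mainly in also sending the query text as search_text. The sketch below builds the keyword arguments a matching hybrid_search_with_score override might pass, reusing the call shape of the working vector override; hybrid_search_kwargs is a hypothetical helper, the field names are assumed, and whether your installed azure-search-documents version accepts exactly these argument names is not confirmed here.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed field names; use the FIELDS_* constants from your setup instead.
FIELDS_ID, FIELDS_CONTENT, FIELDS_METADATA = "id", "content", "metadata"
FIELDS_CONTENT_VECTOR = "content_vector"


def hybrid_search_kwargs(
    embed: Callable[[str], List[float]],
    query: str,
    k: int = 4,
    filters: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for client.search() in a hybrid override."""
    return {
        # Hybrid = full-text match on the query text plus a vector match on its embedding.
        "search_text": query,
        "vector": embed(query),  # plain list, as in the working vector override
        "top_k": k,
        "vector_fields": FIELDS_CONTENT_VECTOR,
        "select": [f"{FIELDS_ID},{FIELDS_CONTENT},{FIELDS_METADATA}"],
        "filter": filters,
    }
```

Inside an override this would be used as self.client.search(**hybrid_search_kwargs(self.embedding_function, query, k, filters)), with the results converted to Documents the same way as in vector_search_with_score. If more than k documents come back, SearchClient.search also accepts a top parameter that caps the number of returned results.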
Hi, @DSgUY! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, you encountered an "InvalidVectorQuery" error when querying the vector store in Azure Cognitive Search. It was suggested to check the embed_query function and the embedding_function in the AzureSearch class to ensure they are returning valid vectors. You provided a workaround by creating a new class MyAzureSearch that overrides the vector_search_with_score function. However, JasonIsaac mentioned that the solution works for the similarity search type but not for hybrid.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your contribution and we appreciate your understanding. Let us know if you have any further questions or concerns!
System Info
langchain==0.0.234, windows 10, azure-identity==1.13.0, Python 3.11.4
Who can help?
I managed to create an index in Azure Cognitive Search with id, content, _vectorcontent and metadata fields. I checked that the docs and chunks are not null. I'm getting an error when querying the vector store.
docs: azuresearch-langchain-example
Any fix for this? @hwchase17 @agola11 Regards
Reproduction
Embedding is working, as I tested:
When I'm trying to query with:
Error:
Expected behavior
I can't determine whether the problem is the Azure Cognitive Search index configuration that I added manually or a bug in the code. Splitting the documents and adding the chunks to the vector store (Azure Cognitive Search) were all done without any warning.