Closed miggytrinidad closed 1 year ago
Answer generated by a 🤖
I found related issues that might be helpful. I did my best to summarize the solutions, but I recommend looking at the related issues yourself.
This issue was closed by a pull request: Correct AzureSearch Vector Store not applying search_kwargs when searching
I think your problem is also in how you connect to the index, since you get: "Message: Unknown field 'content_vector' in vector field list."
What I did was: 1) create the index in the Azure Cognitive Search portal; 2) add the fields that the langchain implementation uses by default. Azure Cognitive Search requires the following fields to be created:
- FIELDS_ID: This field is used to store the ID of the document. Its type is SearchFieldDataType.String, and it's the key and filterable. The default name for this field is "id".
- FIELDS_CONTENT: This field is used to store the content of the document. Its type is SearchFieldDataType.String, and it's searchable and retrievable. The default name for this field is "content".
- FIELDS_CONTENT_VECTOR: This field is used to store the vector representation of the content. Its type is SearchFieldDataType.Collection(SearchFieldDataType.Single), and it's searchable. The default name for this field is "content_vector". The dimensions of this field are determined by the length of the embedding of the text. I use dimensions=1536.
- FIELDS_METADATA: This field is used to store the metadata of the document. Its type is SearchFieldDataType.String, and it's searchable and retrievable. The default name for this field is "metadata".
The vector search configuration uses the Hierarchical Navigable Small World (HNSW) algorithm with the following parameters: m = 4, efConstruction = 400, efSearch = 500. The similarity metric used is the cosine metric.
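A quick way to see whether an existing index matches the default layout described above is to compare field names. This is only a minimal sketch in plain Python: it does not call the Azure SDK, `missing_fields` is a hypothetical helper, and `my_index` is a made-up list of field names standing in for whatever your index actually defines.

```python
from typing import Iterable, List

# Default field names listed above for the langchain integration.
DEFAULT_FIELD_NAMES = ("id", "content", "content_vector", "metadata")

# HNSW settings quoted above, kept as plain data for reference only.
HNSW_PARAMETERS = {"m": 4, "efConstruction": 400, "efSearch": 500, "metric": "cosine"}


def missing_fields(index_field_names: Iterable[str],
                   expected: Iterable[str] = DEFAULT_FIELD_NAMES) -> List[str]:
    """Return the expected field names that the index does not define, in order."""
    present = set(index_field_names)
    return [name for name in expected if name not in present]


# Hypothetical index that uses custom names for the content and vector fields.
my_index = ["id", "section_of_summary", "section_summary_vector", "metadata"]
print(missing_fields(my_index))  # ['content', 'content_vector']
```

If the list comes back non-empty, a search that still refers to the default names would be expected to fail with an "Unknown field" error like the one reported in this issue.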
Hope it helps. Regards.
@DSgUY , thanks for your thoughts!
With what you're saying, I will not be able to have my own "field names" in Azure Cognitive Search? I should follow the id, content, and metadata field names that langchain expects?
Also, if there are multiple vector fields, does langchain support that?
With what you're saying, I will not be able to have my own "field names" in Azure Cognitive Search? I should follow the id, content, and metadata field names that langchain expects?
I think you can change it using my last reply. Add a .env file with the expected keys and your custom field names. Load the environment with from dotenv import load_dotenv and then load_dotenv(). Your .env file is going to be something like this:
FIELDS_ID = "my_id"
FIELDS_CONTENT = "my_content"
FIELDS_CONTENT_VECTOR = "my_content_vector"
FIELDS_METADATA = "my_metadata"
Also, if there are multiple vector fields, does langchain support that?
- I don't know yet. I'm new to this implementation.
Also, if there are multiple vector fields, does langchain support that?
This is being tracked in #6134 and #7154
This is still an ongoing problem in 0.0.266. The vector_search_with_score function isn't correctly passing FIELDS_CONTENT_VECTOR to the underlying client.search() function.
Looking at the code, the env variable names need to be:
- AZURESEARCH_FIELDS_ID
- AZURESEARCH_FIELDS_CONTENT
- AZURESEARCH_FIELDS_CONTENT_VECTOR
- AZURESEARCH_FIELDS_TAG
Ensure that your environment variables are set up before importing the langchain module. The module sets the content vector field name as part of the import process. Hence, if you set the variables in code after the import, like I did for my test, they may not take effect.
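To illustrate why the ordering matters, here is a small self-contained sketch. It does not import langchain: `stand_in_store` is a hypothetical throwaway module, written to a temp directory, that resolves its field name once at import time in the way the comment above describes. The default value "content_vector" and the custom name "section_summary_vector" are taken from this thread.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

ENV_KEY = "AZURESEARCH_FIELDS_CONTENT_VECTOR"

# Hypothetical stand-in for a library module that reads its field name
# once, at import time (this is NOT the langchain source).
STAND_IN_SOURCE = (
    "import os\n"
    f"FIELDS_CONTENT_VECTOR = os.environ.get('{ENV_KEY}', 'content_vector')\n"
)


def demo():
    """Return the field name seen at first import, after setting the
    variable post-import, and after reloading the module."""
    os.environ.pop(ENV_KEY, None)
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "stand_in_store.py").write_text(STAND_IN_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("stand_in_store")
            before = module.FIELDS_CONTENT_VECTOR
            # Setting the variable now is too late for the already-imported module.
            os.environ[ENV_KEY] = "section_summary_vector"
            after_set = module.FIELDS_CONTENT_VECTOR
            # Re-executing the module picks up the new value.
            reloaded = importlib.reload(module).FIELDS_CONTENT_VECTOR
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("stand_in_store", None)
            os.environ.pop(ENV_KEY, None)
    return before, after_set, reloaded


print(demo())
```

The middle value stays at the default even though the variable has been set, which matches the behaviour reported here. With the real library, the simpler route is the one already suggested: set the variables (or load the .env file) before the first langchain import rather than relying on a reload.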
I had a similar problem and this comment helped a lot! Setting the environment variables before importing langchain did help: https://github.com/langchain-ai/langchain/issues/7813#issuecomment-1830669980
System Info
...
Reproduction
I am trying to use vector search as shown below:
`os.environ["AZURESEARCH_FIELDS_CONTENT_VECTOR"] = "section_summary_vector"
os.environ["AZURESEARCH_FIELDS_CONTENT"] = "section_of_summary"

docs = vector_store.similarity_search(
    query="What did the president say about Ketanji Brown Jackson",
    k=3,
    engine="gpt35turbo",
    search_type="similarity",
)
print(docs[0].page_content)`
I am getting an error saying:
HttpResponseError: () Unknown field 'content_vector' in vector field list. Parameter name: vectorFields Code: Message: Unknown field 'content_vector' in vector field list. Parameter name: vectorFields
It seems that the custom vector field is not being used by the function.
Expected behavior
Be able to customize the vector field to use in doing vector similarity search.