langchain-ai / langchain


Azure Cognitive Search #7813

Closed miggytrinidad closed 1 year ago

miggytrinidad commented 1 year ago

System Info

...


Reproduction

I am trying to use vector search as shown below:

`os.environ["AZURESEARCH_FIELDS_CONTENT_VECTOR"] = "section_summary_vector"
os.environ["AZURESEARCH_FIELDS_CONTENT"] = "section_of_summary"

docs = vector_store.similarity_search(
    query="What did the president say about Ketanji Brown Jackson",
    k=3,
    engine="gpt35turbo",
    search_type="similarity",
)
print(docs[0].page_content)`

I am getting an error saying:

HttpResponseError: () Unknown field 'content_vector' in vector field list. Parameter name: vectorFields
Code:
Message: Unknown field 'content_vector' in vector field list. Parameter name: vectorFields

It seems that the custom vector field is not being used by the function.

Expected behavior

Be able to customize which vector field is used for vector similarity search.

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

I found related issues that might be helpful. I did my best to summarize the solutions, but I recommend looking at the related issues yourself.

Closed Issues

Azure Cognitive Search Vector Store doesn't apply search_kwargs when performing queries

This issue was closed by a pull request: Correct AzureSearch Vector Store not applying search_kwargs when searching



DSgUY commented 1 year ago

I think your problem also occurs when connecting to the index, since you get "Message: Unknown field 'content_vector' in vector field list."

What I did was: 1) Create the index in the Azure Cognitive Search portal. 2) Add the fields used by default by the LangChain implementation. So Azure Cognitive Search requires the creation of the following fields:

miggytrinidad commented 1 year ago

@DSgUY , thanks for your thoughts!

With what you're saying, I will not be able to have my own "field names" in Azure Cognitive Search? I should follow the id, content, and metadata field names that langchain expects?

Also, if there are multiple vector fields, does LangChain support that?

DSgUY commented 1 year ago

> @DSgUY , thanks for your thoughts!
>
> With what you're saying, I will not be able to have my own "field names" in Azure Cognitive Search? I should follow the id, content, and metadata field names that langchain expects?

I think you can change it using my last reply. Add a .env file with the expected keys and your custom field names. Load the environment with `from dotenv import load_dotenv` and then `load_dotenv()`. Your .env file is going to be something like this:

FIELDS_ID = "my_id"
FIELDS_CONTENT = "my_content"
FIELDS_CONTENT_VECTOR = "my_content_vector"
FIELDS_METADATA = "my_metadata"

> Also, if there are multiple vector fields, does LangChain support that?

I don't know yet. I'm new to this implementation.

finnless commented 1 year ago

> Also, if there are multiple vector fields, does LangChain support that?

This is being tracked in #6134 and #7154

dasiths commented 1 year ago

This is still an ongoing problem in 0.0.266. The vector_search_with_score function isn't correctly passing the FIELDS_CONTENT_VECTOR to the underlying client.search() function.

fortunkam commented 1 year ago

Looking at the code, the env variable names need to be ....

AZURESEARCH_FIELDS_ID
AZURESEARCH_FIELDS_CONTENT
AZURESEARCH_FIELDS_CONTENT_VECTOR
AZURESEARCH_FIELDS_TAG

from https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/vectorstores/azuresearch.py#L45
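As a rough sketch of the naming point above, the custom field names can be exported under the `AZURESEARCH_FIELDS_*` keys before the vector store module is imported. The helper `export_azuresearch_fields` and the `CUSTOM_FIELDS` mapping are hypothetical names introduced here for illustration; the field values are the ones mentioned earlier in this thread.

```python
import os

# Custom index field names taken from this thread; adjust to your own index.
CUSTOM_FIELDS = {
    "ID": "my_id",
    "CONTENT": "section_of_summary",
    "CONTENT_VECTOR": "section_summary_vector",
}


def export_azuresearch_fields(fields: dict) -> dict:
    """Hypothetical helper: set AZURESEARCH_FIELDS_* variables and return them."""
    exported = {f"AZURESEARCH_FIELDS_{key}": value for key, value in fields.items()}
    os.environ.update(exported)
    return exported


exported = export_azuresearch_fields(CUSTOM_FIELDS)

# Only import the vector store after the variables are set, for example:
# from langchain.vectorstores.azuresearch import AzureSearch
```

Keeping the prefix in one place avoids the mismatch between `FIELDS_CONTENT_VECTOR` and `AZURESEARCH_FIELDS_CONTENT_VECTOR` that comes up earlier in the thread.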

rm2631 commented 11 months ago

Ensure that your environment variables are set before importing the langchain module. The module reads the field names at import time, so the vector field falls back to the default 'content_vector' if the variable is not set yet. Hence, if you set your variables in code after the import, like I did for my test, it may not work.
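The import-order effect can be reproduced without langchain. The sketch below writes a small stand-in module (`fake_azuresearch`, a hypothetical name) that copies an environment variable into a module-level constant on import, which is the pattern assumed here for the real module. It shows that a value set after the import is not picked up until the module is executed again.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in: reads the env variable once, at import time.
MODULE_SOURCE = (
    "import os\n"
    "FIELDS_CONTENT_VECTOR = os.environ.get(\n"
    "    'AZURESEARCH_FIELDS_CONTENT_VECTOR', 'content_vector'\n"
    ")\n"
)

tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "fake_azuresearch.py").write_text(MODULE_SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Import while the variable is unset: the default is captured.
os.environ.pop("AZURESEARCH_FIELDS_CONTENT_VECTOR", None)
fake_azuresearch = importlib.import_module("fake_azuresearch")

# Setting the variable after the import does not change the constant.
os.environ["AZURESEARCH_FIELDS_CONTENT_VECTOR"] = "section_summary_vector"
late_value = fake_azuresearch.FIELDS_CONTENT_VECTOR

# Re-executing the module picks up the new value.
reloaded_value = importlib.reload(fake_azuresearch).FIELDS_CONTENT_VECTOR

print(late_value, reloaded_value)
```

Whether reloading the real langchain module is a safe workaround is not something this sketch verifies; setting the variables before the first import is the approach reported to work in this thread.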

rocapp commented 6 months ago

> Looking at the code, the env variable names need to be ....
>
> AZURESEARCH_FIELDS_ID
> AZURESEARCH_FIELDS_CONTENT
> AZURESEARCH_FIELDS_CONTENT_VECTOR
> AZURESEARCH_FIELDS_TAG
>
> from https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/vectorstores/azuresearch.py#L45

Updated permalink

Manthata commented 4 months ago

I had a similar problem and this comment helped a lot! Setting the environment variables before importing langchain did help: https://github.com/langchain-ai/langchain/issues/7813#issuecomment-1830669980