langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

AzureSearch. Error when program terminates #27511

Open vladfeigin opened 1 month ago

vladfeigin commented 1 month ago


Example Code

Failing code:

import os

# loading environment variables from .env file
from dotenv import load_dotenv

load_dotenv()

from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_openai import AzureOpenAIEmbeddings

from azure.search.documents.indexes.models import (
    ScoringProfile,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    TextWeights,
)

# This module is responsible for integration with Azure Search and uses Langchain framework for this
# It contains following functions:
#   search - search for similar documents in Azure Search. return top 5 results
#   ingest - gets as parameters a list of documents(chunks) and metadata per document and ingests them into Azure Search

# Azure Search configuration

AZURE_SEARCH_SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
AZURE_SEARCH_API_KEY = os.getenv("AZURE_SEARCH_API_KEY")
AZURE_SEARCH_INDEX_NAME = os.getenv("AZURE_SEARCH_INDEX_NAME")

# Azure OpenAI configuration
AZURE_OPENAI_KEY = os.getenv("AZURE_OPENAI_KEY")
AZURE_OPENAI_DEPLOYMENT = os.getenv("AZURE_OPENAI_DEPLOYMENT")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")

# initialize AzureOpenAIEmbeddings
embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
    azure_deployment=AZURE_OPENAI_DEPLOYMENT,
    openai_api_version=AZURE_OPENAI_API_VERSION,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_KEY,
)

# define search index custom schema
fields = [
    SimpleField(
        name="chunk_id",
        type=SearchFieldDataType.String,
        key=True,
        filterable=True,
    ),
    SimpleField(
        name="parent_id",
        type=SearchFieldDataType.String,
        key=True,
        filterable=True,
    ),
    SearchableField(
        name="chunk",
        type=SearchFieldDataType.String,
        searchable=True,
    ),
    SearchField(
        name="text_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=len(embeddings.embed_query("Text")),
        vector_search_profile_name="myHnswProfile",
    ),
    # Additional field to store the title
    SearchableField(
        name="title",
        type=SearchFieldDataType.String,
        searchable=True,
    ),
]

# create Langchain AzureSearch object
vector_search: AzureSearch = AzureSearch(
    azure_search_endpoint=AZURE_SEARCH_SERVICE_ENDPOINT,
    azure_search_key=AZURE_SEARCH_API_KEY,
    index_name=AZURE_SEARCH_INDEX_NAME,
    embedding_function=embeddings.embed_query,
    # Configure max retries for the Azure client
    additional_search_client_options={"retry_total": 3},
    fields=fields,
)

# ingest - gets as parameters a list of documents(chunks) and metadata per document and ingests them into Azure Search
# TODO - implement async version of ingest
def ingest(documents: list, metadata):
    # check the input is valid list and non empty if not return exception
    if not isinstance(documents, list) or not documents:
        raise ValueError("Input must be a non-empty list")
    if not isinstance(metadata, list) or not metadata:
        raise ValueError("Metadata must be a non-empty list")
    if len(documents) != len(metadata):
        raise ValueError("Documents and metadata must be of the same length")

    # Ingest documents into Azure Search
    vector_search.add_documents(documents, metadata)

def search(query: str, search_type='similarity', top_k=5):
    # check the input is valid string and non empty if not raise exception
    if not isinstance(query, str) or not query:
        raise ValueError("Search query must be a non-empty string")
    # Search for similar documents
    docs = vector_search.similarity_search(query=query, k=top_k, search_type=search_type)
    return docs[0].page_content


docs = search("What is Microsoft's Fabric?", search_type='hybrid', top_k=5)

Error Message and Stack Trace (if applicable)

Exception ignored in: <function AzureSearch.__del__ at 0x123c86020>
Traceback (most recent call last):
  File "/Users/vladfeigin/myprojects/dai-demos/.venv/lib/python3.11/site-packages/langchain_community/vectorstores/azuresearch.py", line 393, in __del__
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/events.py", line 765, in get_event_loop_policy
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/events.py", line 758, in _init_event_loop_policy
ImportError: sys.meta_path is None, Python is likely shutting down

Description

Running AzureSearch with hybrid search. The program executes properly, but the error above is printed when the program terminates.
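For context, the `ImportError` in the trace suggests a finalizer doing work after Python has started tearing down its import machinery. The following is only a minimal sketch of how a finalizer can guard against that phase; `Client` and `closed_names` are hypothetical names for illustration, not the actual `AzureSearch` code. `sys.is_finalizing()` is the standard-library check for interpreter shutdown.

```python
import sys


class Client:
    """Hypothetical stand-in for an object that cleans up in __del__."""

    # Records which instances were closed, so the behavior is observable.
    closed_names: list = []

    def __init__(self, name: str) -> None:
        self.name = name

    def close(self) -> None:
        Client.closed_names.append(self.name)

    def __del__(self) -> None:
        # Once the interpreter is shutting down, imports and module globals
        # may already be gone, so skip cleanup instead of failing.
        if sys.is_finalizing():
            return
        try:
            self.close()
        except Exception:
            # Best effort only: exceptions raised in a finalizer are
            # reported as "Exception ignored in ..." rather than propagated.
            pass
```

With a guard like this, an instance that is still alive at shutdown is simply skipped, while one released earlier is closed normally.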

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 24.0.0: Tue Sep 24 23:39:07 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T6000
Python Version: 3.11.10 (main, Sep 7 2024, 01:03:31) [Clang 15.0.0 (clang-1500.3.9.4)]

Package Information

langchain_core: 0.2.41
langchain: 0.2.16
langchain_community: 0.2.16
langsmith: 0.1.136
langchain_openai: 0.1.23
langchain_text_splitters: 0.2.4
langchainhub: 0.1.21

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.5
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
openai: 1.44.0
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.0
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.34
tenacity: 8.5.0
tiktoken: 0.7.0
types-requests: 2.32.0.20240907
typing-extensions: 4.12.2

slittlec commented 1 month ago

I'm also getting this error.

Tested on Mac and Windows. It seems to happen only when I return the AzureSearch object from a function. I have tried Python 3.11 and 3.12, and azure-search-documents 14, 15 and 16b; all have this issue.


Exception ignored in: <function AzureSearch.__del__ at 0x11ff76340>
Traceback (most recent call last):
  File "azuresearch.py", line 393, in __del__
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/events.py", line 765, in get_event_loop_policy
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/events.py", line 758, in _init_event_loop_policy
ImportError: sys.meta_path is None, Python is likely shutting down


Semi-related: I also noticed that when I added text and then searched, it returned nothing, as if it didn't await the text upload. My guess is that something funky is happening with the async handling.


If you instantiate twice, i.e. create the class as below, then create it again in exactly the same way and do whatever you want with the second instance, the error goes away. It's a horrible workaround, but at least it gets rid of the error:

AzureSearch(
    azure_search_endpoint=vector_store_address,
    azure_search_key=vector_store_password,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)
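Another thing that might be worth trying, offered only as a sketch and not verified against `AzureSearch`: keep the store in a function scope rather than a module-level global, so that in CPython it is finalized when the function returns instead of during interpreter shutdown. `Store`, `build_store`, and `finalizer_saw_shutdown` below are hypothetical names used to show the timing.

```python
import sys

# Records, per finalized object, whether the interpreter was shutting down.
finalizer_saw_shutdown = []


class Store:
    """Hypothetical stand-in for a vector store with a __del__ finalizer."""

    def __del__(self) -> None:
        finalizer_saw_shutdown.append(sys.is_finalizing())


def build_store() -> Store:
    # Returning the object from a function, as in the failing setups.
    return Store()


def main() -> str:
    store = build_store()
    # ... use the store here ...
    return "done"
    # 'store' is released when main() returns, while the interpreter
    # is still fully alive (CPython reference counting, no cycles).


result = main()
```

Here the finalizer runs as soon as `main()` returns, so it never sees the shutdown phase. Whether this avoids the error for `AzureSearch` itself is an assumption.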

vladfeigin commented 1 month ago

Thank you for sharing. I also tried different artifact versions, but the problem is not resolved either.

khushiDesai commented 1 month ago

Hi @vladfeigin, I am Khushi, a 4th year student at UofT CS. I’m working with my teammates @anushak18, @ashvini8, and @ssumaiyaahmed, who are also 4th year students at UofT CS. We would like to take the initiative to work on this issue and contribute to LangChain. Please let us know if we can help!

vladfeigin commented 1 month ago

Great! Let me know what you need. I can supply any details that are missing from the issue description.

paprocki-r commented 2 weeks ago

Same issue here. I was following the official tutorial (https://python.langchain.com/docs/tutorials/qa_chat_history/), and in agents it seems the tool is not called.

Later on, when I use the Azure retriever, it works for chains but fails for agents, raising this error.

voxoff79 commented 2 weeks ago

Same issue...

jamesrooti commented 1 week ago

Same issue