langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
95.25k stars 15.46k forks

langchain_unstructured.UnstructuredLoader cannot be created when using uvloop #26294

Open sfitts opened 2 months ago

sfitts commented 2 months ago


Example Code

The following code raises a ValueError when run under uvicorn configured to use uvloop:

from langchain_unstructured import UnstructuredLoader

UnstructuredLoader("any/legal/file.txt")

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/vantiqservicesdk.py", line 189, in __process_message
    result = await self.__invoke(procedure_name, params, is_system_request)
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/vantiqservicesdk.py", line 277, in __invoke
    return await func(**params)
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/aimanager/src/main/python/ai_assistant.py", line 139, in load_index_entry
    documents = await self._load_from_content(content, content_type, metadata)
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/aimanager/src/main/python/ai_assistant.py", line 166, in _load_from_content
    documents = await load_from_content(content, content_type, metadata)
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/aimanager/src/main/python/content_loader.py", line 80, in load_from_content
    loader = UnstructuredLoader(file=tmp_file, content_type=content_type, mode='paged',
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/langchain_unstructured/document_loaders.py", line 118, in __init__
    self.client = client or UnstructuredClient(
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/unstructured_client/sdk.py", line 54, in __init__
    self.sdk_configuration = SDKConfiguration(
  File "<string>", line 13, in __init__
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/unstructured_client/sdkconfiguration.py", line 38, in __post_init__
    self._hooks = SDKHooks()
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/unstructured_client/_hooks/sdkhooks.py", line 15, in __init__
    init_hooks(self)
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/unstructured_client/_hooks/registration.py", line 28, in init_hooks
    split_pdf_hook = SplitPdfHook()
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/unstructured_client/_hooks/custom/split_pdf_hook.py", line 74, in __init__
    nest_asyncio.apply()
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/nest_asyncio.py", line 19, in apply
    _patch_loop(loop)
  File "/var/lib/jenkins/workspace/ag2rs-branch-tasklist/.gradle/python/lib/python3.10/site-packages/nest_asyncio.py", line 193, in _patch_loop
    raise ValueError('Can\'t patch loop of type %s' % type(loop))
ValueError: Can't patch loop of type <class 'uvloop.Loop'>

Description

I'm trying to use the langchain UnstructuredLoader in code hosted by uvicorn on Linux (where the default loop implementation is uvloop). I expect to be able to create the loader, but I can't: the loader's constructor creates an UnstructuredClient, whose SplitPdfHook calls nest_asyncio.apply(), and nest_asyncio refuses to patch a uvloop.Loop.

Note that the UnstructuredClient instance is not actually needed since we are not using the API to partition.
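
To illustrate why the patch is rejected, here is a minimal stdlib-only sketch. It assumes that nest_asyncio's `_patch_loop` only accepts loops derived from `asyncio.BaseEventLoop`, and that uvloop's loop derives from `asyncio.AbstractEventLoop` instead. `FakeUvloopLoop` and `can_patch` are hypothetical names used for illustration; they are not part of either library.

```python
import asyncio


class FakeUvloopLoop(asyncio.AbstractEventLoop):
    """Hypothetical stand-in for uvloop.Loop: a loop that does not
    derive from asyncio.BaseEventLoop."""


def can_patch(loop: asyncio.AbstractEventLoop) -> bool:
    # Assumption: this models the type check that produces
    # "Can't patch loop of type ..." in nest_asyncio._patch_loop.
    return isinstance(loop, asyncio.BaseEventLoop)


# The stock asyncio loop passes the check.
stock_loop = asyncio.new_event_loop()
try:
    stock_ok = can_patch(stock_loop)
finally:
    stock_loop.close()

# The uvloop-like stand-in does not.
fake_ok = can_patch(FakeUvloopLoop())

print(stock_ok, fake_ok)
```

If that assumption holds, the failure depends only on which loop implementation is active, not on the file being loaded.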

System Info

System Information
------------------
> OS:  Windows
> OS Version:  10.0.19045
> Python Version:  3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]

Package Information
-------------------
> langchain_core: 0.2.38
> langchain: 0.2.16
> langchain_community: 0.2.16
> langsmith: 0.1.117
> langchain_aws: 0.1.17
> langchain_elasticsearch: 0.2.2
> langchain_google_genai: 1.0.10
> langchain_huggingface: 0.0.3
> langchain_nvidia_ai_endpoints: 0.2.2
> langchain_openai: 0.1.23
> langchain_qdrant: 0.1.1
> langchain_text_splitters: 0.2.4
> langchain_unstructured: 0.1.2

Optional packages not installed
-------------------------------
> langgraph
> langserve

Other Dependencies
------------------
> aiohttp: 3.10.5
> async-timeout: Installed. No version info available.
> boto3: 1.34.162
> dataclasses-json: 0.6.7
> elasticsearch[vectorstore-mmr]: Installed. No version info available.
> google-generativeai: 0.7.2
> httpx: 0.27.2
> huggingface-hub: 0.24.6
> jsonpatch: 1.33
> numpy: 1.26.4
> openai: 1.44.0
> orjson: 3.10.7
> packaging: 24.1
> pillow: 10.4.0
> pydantic: 1.10.18
> PyYAML: 6.0.2
> qdrant-client: 1.11.1
> requests: 2.32.3
> sentence-transformers: 3.0.1
> SQLAlchemy: 2.0.34
> tenacity: 8.5.0
> tiktoken: 0.7.0
> tokenizers: 0.19.1
> transformers: 4.44.2
> typing-extensions: 4.12.2
> unstructured-client: 0.24.1
> unstructured[all-docs]: Installed. No version info available.

sfitts commented 2 months ago

A workaround is to pass any truthy placeholder for the client argument (say, a non-empty string). This skips the code that creates the UnstructuredClient, and since the placeholder is never used, you won't get a runtime error. This assumes that you don't want to use the API for partitioning.
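
A minimal sketch of why the workaround helps, based on the `self.client = client or UnstructuredClient(` line shown in the traceback. `FailingClient`, `LoaderSketch`, and `try_create` are hypothetical stand-ins so the snippet runs without langchain_unstructured or uvloop installed; only the client handling is modeled.

```python
class FailingClient:
    """Hypothetical stand-in for UnstructuredClient; its constructor
    fails the way the real one does under uvloop."""

    def __init__(self) -> None:
        raise ValueError("Can't patch loop of type <class 'uvloop.Loop'>")


class LoaderSketch:
    """Hypothetical stand-in for the loader."""

    def __init__(self, client=None) -> None:
        # Same shape as the line in the traceback: the default client is
        # only constructed when the supplied value is falsy.
        self.client = client or FailingClient()


def try_create(client=None) -> str:
    """Report whether the loader sketch could be constructed."""
    try:
        LoaderSketch(client=client)
    except ValueError as exc:
        return f"failed: {exc}"
    return "created"


default_result = try_create()                     # no client supplied
placeholder_result = try_create(client="unused")  # truthy placeholder
empty_result = try_create(client="")              # falsy placeholder

print(default_result)
print(placeholder_result)
print(empty_result)
```

With the real loader the equivalent call would presumably be `UnstructuredLoader("any/legal/file.txt", client="unused")`. Because of the `or`, the placeholder has to be truthy; an empty string would still trigger creation of the client.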