run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License
2.99k stars 286 forks

RuntimeError: Event loop is closed #157

Open ggjx22 opened 6 months ago

ggjx22 commented 6 months ago

Recently I have been running into a RuntimeError while using LlamaParse. My job ID is ee9624b6-9a4a-4b4c-8229-05735fe807a2. Has anyone encountered the same error, and does anyone know what exactly is going on?

Error traceback messages:

Started parsing the file under job_id ee9624b6-9a4a-4b4c-8229-05735fe807a2
1it [00:00, 979.06it/s]
  0%|                                                                                                     | 0/1 [00:00<?, ?it/s]
Task exception was never retrieved
future: <Task finished name='Task-91' coro=<AsyncClient.aclose() done, defined at D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpx\_client.py:1967> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpx\_client.py", line 1974, in aclose
    await self._transport.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpx\_transports\default.py", line 378, in aclose
    await self._pool.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpcore\_async\connection_pool.py", line 324, in aclose
    await connection.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpcore\_async\connection.py", line 173, in aclose
    await self._connection.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpcore\_async\http11.py", line 253, in aclose
    await self._network_stream.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpcore\_backends\anyio.py", line 54, in aclose
    await self._stream.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\anyio\streams\tls.py", line 193, in aclose
    await self.transport_stream.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\asyncio\proactor_events.py", line 109, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 762, in call_soon
    self._check_closed()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 520, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Task exception was never retrieved
future: <Task finished name='Task-92' coro=<AsyncClient.aclose() done, defined at D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpx\_client.py:1967> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpx\_client.py", line 1974, in aclose
    await self._transport.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpx\_transports\default.py", line 378, in aclose
    await self._pool.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpcore\_async\connection_pool.py", line 324, in aclose
    await connection.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpcore\_async\connection.py", line 173, in aclose
    await self._connection.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpcore\_async\http11.py", line 253, in aclose
    await self._network_stream.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\httpcore\_backends\anyio.py", line 54, in aclose
    await self._stream.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\anyio\streams\tls.py", line 193, in aclose
    await self.transport_stream.aclose()
  File "D:\OneDrive - Alfagomma Group\02_BUSINESS\01_SOFTWAREDEV\local\document-automation-wizard\.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\asyncio\proactor_events.py", line 109, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 762, in call_soon
    self._check_closed()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 520, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
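
Both tracebacks end in the same place: `call_soon` runs `_check_closed`, which raises because the loop that owns the connection has already been closed. "Task exception was never retrieved" is how asyncio reports an exception raised inside a task that nothing awaited. The following is a minimal standard-library sketch of that last step only. It does not involve LlamaParse or httpx, and it does not explain why the loop was closed in this report; the function name is made up for illustration.

```python
import asyncio


def call_soon_on_closed_loop() -> str:
    """Schedule a callback on an already-closed loop and return the error text."""
    loop = asyncio.new_event_loop()
    # Let the loop do some work, then close it, as happens when a
    # short-lived loop finishes while resources still refer to it.
    loop.run_until_complete(asyncio.sleep(0))
    loop.close()
    try:
        # The transport's close() in the traceback makes this same call.
        loop.call_soon(lambda: None)
    except RuntimeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(call_soon_on_closed_loop())  # prints: Event loop is closed
```

If this is what is happening here, the error comes from cleanup of a connection after its loop has gone away, which may be separate from whether the parse itself returned a result.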

logan-markewich commented 6 months ago

@ggjx22 how did you call llama-parse? Can you give some code?

Also, what version are you on? I would try updating: pip install -U llama_parse

ggjx22 commented 6 months ago

I am reading the .pdf files directly from Azure Blob Storage.

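# Imports were omitted in the original post; these module paths are assumed.
from llama_index.readers.azstorage_blob import AzStorageBlobReader
from llama_parse import LlamaParse

def get_file_extractor(llama_cloud_api_key):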
    # instantiate the parser
    parser = LlamaParse(
        api_key=llama_cloud_api_key,
        result_type='markdown',
        parsing_instruction=get_parser_instructions()
    )

    file_extractor = {'.pdf': parser}

    return file_extractor

def get_azure_blob_reader(container_name, container_blob_name, connection_string, blob_name, file_extractor):
    return AzStorageBlobReader(
        container_name=f'{container_name}/{container_blob_name}',
        connection_string=connection_string,
        blob=blob_name,
        file_extractor=file_extractor
    )

def parse_file_from_az_blob(container_name, container_blob_name, connection_string, blob_name):
    # create file extractor
    file_extractor = get_file_extractor(llama_cloud_api_key=LLAMA_CLOUD_API_KEY)

    # get the azure blob reader
    az_blob_reader = get_azure_blob_reader(
        container_name=container_name,
        container_blob_name=container_blob_name,
        connection_string=connection_string,
        blob_name=blob_name,
        file_extractor=file_extractor
    )

    # load data from blob
    try:
        document = az_blob_reader.load_data()
        return document

    except Exception as error:
        print(f'ERROR: An error has occurred while parsing the file. {error}')

This is how I am calling the functions:

document = llama.parse_file_from_az_blob(
    container_name=storage_container_name,
    container_blob_name=container_blob_name,
    connection_string=connection_string,
    blob_name=blob_name
)

ggjx22 commented 6 months ago

I am on version 0.4.1, but the RuntimeError still happens. The latest parsing job that triggered the error has job ID a943c84f-4d98-4ec5-aa8a-794c031a4c37.

logan-markewich commented 6 months ago

I have a feeling it's maybe failing to load from Azure and killing the async loop.

Does it work if you remove the file extractor?

ggjx22 commented 6 months ago

Currently I am testing with 9 PDF files in my blob storage. The RuntimeError does not appear if file_extractor is not provided to AzStorageBlobReader. When LlamaParse is provided as the file extractor, as in the code above, I get around 2 to 3 RuntimeErrors over a loop through the files.
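
To narrow down which of the 9 files are affected, one option is to parse them one at a time and record the outcome per blob. The sketch below is an illustration only: `parse_each` and the `parse_one` callable are hypothetical names, not part of LlamaParse or the Azure reader. Because an unretrieved task exception is logged rather than raised, it would not show up as a failure here, so the per-file document count is recorded as well.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def parse_each(
    blob_names: Iterable[str],
    parse_one: Callable[[str], Optional[List]],
) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Parse blobs one by one; return ({blob: document count}, {blob: error})."""
    counts: Dict[str, int] = {}
    failures: Dict[str, str] = {}
    for name in blob_names:
        try:
            documents = parse_one(name)
        except Exception as error:
            # Keep the exception text so failing files can be compared later.
            failures[name] = repr(error)
        else:
            # Treat a None return (a swallowed error) as zero documents.
            counts[name] = len(documents or [])
    return counts, failures


if __name__ == "__main__":
    # Stand-in parser, for demonstration only.
    def fake_parse(name: str) -> List[str]:
        if name == "broken.pdf":
            raise ValueError("simulated failure")
        return [] if name == "empty.pdf" else ["doc"]

    print(parse_each(["a.pdf", "empty.pdf", "broken.pdf"], fake_parse))
```

With the code from this thread, `parse_one` could be a small wrapper around `parse_file_from_az_blob` with the container and connection arguments filled in. That function prints the error and returns None on failure, which would appear here as a count of 0.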