run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Bug]: #13904

Open upchunk opened 1 month ago

upchunk commented 1 month ago

Bug Description

Getting a RuntimeError: no running event loop error when running an ingestion pipeline on FastAPI. The error comes from LlamaIndex's asyncio_run function (from llama_index.core.async_utils import asyncio_run).
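For context, the helper's rough shape (a paraphrase based on the traceback below, not the exact source) is:

import asyncio
from typing import Any, Coroutine

def asyncio_run(coro: Coroutine) -> Any:
    try:
        # This is the line that raises "no running event loop"
        # when asyncio_run is called from synchronous code.
        loop = asyncio.get_running_loop()
        return loop.run_until_complete(coro)
    except RuntimeError:
        # Fallback: asyncio.run creates a brand-new loop and closes it
        # on exit, which can orphan async HTTP clients that still hold
        # connections -- hence the follow-on "Event loop is closed".
        return asyncio.run(coro)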

Version

0.10.41

Steps to Reproduce

  1. Create an Index-Document endpoint on FastAPI to index documents
  2. Use IngestionPipeline to index the documents into a Pinecone vector store with the following transformations:
    1. SentenceSplitter
    2. KeywordExtractor
  3. Index documents of 1000+ words each, sending requests concurrently to the Index-Document endpoint via aiohttp (see the sketch after this list)
  4. The error shows up frequently
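For illustration, a minimal aiohttp client along these lines reproduces the concurrent load in step 3 (the /index-document URL and the payload shape are hypothetical, not taken from the reporter's code):

import asyncio
import aiohttp

async def index_one(session: aiohttp.ClientSession, sem: asyncio.Semaphore, text: str):
    # Cap in-flight requests so LLM-backed extractors are not rate-limited.
    async with sem:
        async with session.post(
            "http://localhost:8000/index-document",  # hypothetical endpoint
            json={"text": text},
        ) as resp:
            return await resp.json()

async def main(texts: list[str]):
    sem = asyncio.Semaphore(10)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(index_one(session, sem, t) for t in texts))

# texts = [...]  # documents of 1000+ words each
# asyncio.run(main(texts))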

Relevant Logs/Tracebacks

Task exception was never retrieved
future: <Task finished name='Task-87799' coro=<AsyncClient.aclose() done, defined at D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\httpx\_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\llama_index\core\async_utils.py", line 29, in asyncio_run
    loop = asyncio.get_running_loop()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: no running event loop

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\httpx\_client.py", line 2018, in aclose
    await self._transport.aclose()
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\httpx\_transports\default.py", line 385, in aclose
    await self._pool.aclose()
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\httpcore\_async\connection_pool.py", line 313, in aclose
    await self._close_connections(closing_connections)
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\httpcore\_async\connection_pool.py", line 305, in _close_connections
    await connection.aclose()
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\httpcore\_async\connection.py", line 171, in aclose
    await self._connection.aclose()
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\httpcore\_async\http11.py", line 265, in aclose
    await self._network_stream.aclose()
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\httpcore\_backends\anyio.py", line 55, in aclose
    await self._stream.aclose()
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\anyio\streams\tls.py", line 202, in aclose
    await self.transport_stream.aclose()
  File "D:\PROJECT\aidoc-llm-service\.venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 1191, in aclose
    self._transport.close()
  File "C:\Python311\Lib\asyncio\proactor_events.py", line 109, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "C:\Python311\Lib\asyncio\base_events.py", line 762, in call_soon
    self._check_closed()
  File "C:\Python311\Lib\asyncio\base_events.py", line 520, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
(The same traceback repeats for Task-87800.)
upchunk commented 1 month ago

(screenshot attached)

dosubot[bot] commented 1 month ago

Hey @upchunk, I'm here to help you with any bugs, questions, or becoming a contributor. Let's tackle this together!

To resolve the RuntimeError: no running event loop error when running an ingestion pipeline on FastAPI using the LlamaIndex library's asyncio_run function, you need to ensure that an event loop is running. Here is an example of how you can modify your code:

import asyncio
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

async def run_pipeline(documents):
    pipeline = IngestionPipeline(
        transformations=[
            SentenceSplitter(chunk_size=512, chunk_overlap=20),
            OpenAIEmbedding(),
        ],
    )
    nodes = await pipeline.arun(documents=documents)
    return nodes

# Ensure an event loop is running
documents = [...]  # Your documents here
asyncio.run(run_pipeline(documents))

Alternatively, you can use the asyncio_run utility function provided by the LlamaIndex library, which manages the event loop for you:

from llama_index.core.async_utils import asyncio_run
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

async def run_pipeline(documents):
    pipeline = IngestionPipeline(
        transformations=[
            SentenceSplitter(chunk_size=512, chunk_overlap=20),
            OpenAIEmbedding(),
        ],
    )
    nodes = await pipeline.arun(documents=documents)
    return nodes

# Ensure an event loop is running using llamaindex's asyncio_run
documents = [...]  # Your documents here
asyncio_run(run_pipeline(documents))

These approaches should help manage the event loop and resolve the RuntimeError: no running event loop error [1][2][3].


upchunk commented 1 month ago

I have tried it both with and without asyncio_run, and the problem still occurs with both methods. I think the problem is in the asyncio_run helper itself.

dosubot[bot] commented 1 month ago

If the problem persists even with the asyncio_run helper, it might be due to how the event loop is being managed in your FastAPI application. Here are a few steps to ensure proper event loop management and avoid the RuntimeError: no running event loop error:

  1. Ensure the Event Loop is Running: Make sure that the event loop is properly managed within FastAPI. FastAPI should handle the event loop for you, but if you're running tasks manually, ensure they are correctly awaited.

  2. Use nest_asyncio: If you're encountering nested event loops, especially in environments like Jupyter notebooks, you can use nest_asyncio to allow nested async operations.

    import nest_asyncio
    nest_asyncio.apply()
  3. Check for Event Loop Closure: Ensure that the event loop is not being closed prematurely. This can happen if there are unhandled exceptions or if the loop is explicitly closed.

  4. Example Integration with FastAPI: Here is an example of how to integrate the IngestionPipeline with FastAPI, ensuring the event loop is managed correctly:

    from fastapi import FastAPI, HTTPException
    from llama_index.core import Document
    from llama_index.core.ingestion import IngestionPipeline
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.embeddings.openai import OpenAIEmbedding

    app = FastAPI()

    async def run_pipeline(documents):
        pipeline = IngestionPipeline(
            transformations=[
                SentenceSplitter(chunk_size=512, chunk_overlap=20),
                OpenAIEmbedding(),
            ],
        )
        nodes = await pipeline.arun(documents=documents)
        return nodes

    @app.post("/ingest")
    async def ingest_documents(texts: list[str]):
        try:
            # Wrap the raw strings in Document objects before ingestion.
            documents = [Document(text=t) for t in texts]
            nodes = await run_pipeline(documents)
            return {"status": "success", "nodes": nodes}
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

    if __name__ == "__main__":
        import nest_asyncio
        nest_asyncio.apply()
        import uvicorn
        uvicorn.run(app, host="0.0.0.0", port=8000)

This setup ensures that the event loop is properly managed and that nest_asyncio is applied to handle nested async operations. If the issue persists, it might be worth checking for any other parts of your code that might be closing the event loop or causing it to be in an invalid state.
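For illustration, here is a minimal sketch (not taken from this thread) of how an async client that outlives its event loop produces the "Event loop is closed" errors from the logs above; each asyncio.run call creates a fresh loop and closes it on exit:

import asyncio
import httpx

# The client's connections get bound to whichever loop first uses them.
client = httpx.AsyncClient()

async def fetch():
    return await client.get("https://example.com")

asyncio.run(fetch())          # loop A is created, used, and closed here
asyncio.run(client.aclose())  # loop B: the transport is still bound to the
                              # now-closed loop A -> "Event loop is closed"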

@logan-markewich, this one's a bit out of my wheelhouse, could you step in and provide your guidance?


logan-markewich commented 1 month ago

@upchunk weird, this code is in a try/except, how was it not caught? 🙃

In any case, if you are using FastAPI, you should be using the async entry points to the pipeline if you aren't already:

await pipeline.arun(documents=documents)

logan-markewich commented 1 month ago

I am not able to replicate actually

from fastapi import FastAPI
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.extractors import KeywordExtractor

text = Document.example().text * 1000

app = FastAPI()

@app.get("/")
async def hello():
  return {"Hello": "World"}

@app.get("/test")
async def run_test():
  pipeline = IngestionPipeline(transformations=[
      SentenceSplitter(),
      KeywordExtractor(),
      OpenAIEmbedding(),
    ]
  )
  await pipeline.arun(documents=[Document(text=text), Document(text=text)])
  return "Complete"

if __name__ == "__main__":
  import uvicorn
  uvicorn.run(app)
upchunk commented 1 month ago

I send the requests to the endpoint using aiohttp, with 30,000 records sourced from PDFs and stored on a cloud MongoDB server. Each record contains text 1000-2000+ words long, and concurrency is limited with asyncio.Semaphore(10) to prevent hitting LLM rate limits.

This takes 20 hours total without async, and more when using async.