Closed: onazaruk closed this issue 4 days ago
The issue you're encountering with the increasing number of `asyncio_*` threads is likely due to the way asynchronous tasks are being handled within your web service, especially when blocking calls like `asyncio.get_event_loop().run_until_complete()` are used in a synchronous environment. This can lead to resource exhaustion over time, as a new event loop and its threads may be created for each operation.
To address this, consider refactoring your web service to fully embrace asynchronous operations: declare your endpoint functions with `async def` and directly await the asynchronous operations (e.g., `await elasticsearch_store.async_add(documents)`) instead of resorting to blocking calls. This approach is supported by modern Python web frameworks such as FastAPI, Starlette, or Sanic, which allow asynchronous request handlers.
Here's a simplified example for a FastAPI application:
```python
from typing import List

from fastapi import FastAPI, HTTPException

from your_application import ElasticsearchStore  # Adjust the import path as necessary

app = FastAPI()
elasticsearch_store = ElasticsearchStore(...)  # Initialize your ElasticsearchStore

@app.post("/add_documents")
async def add_documents(documents: List[YourDocumentType]):  # Replace YourDocumentType with your actual document type
    try:
        ids = await elasticsearch_store.async_add(documents)
        return {"ids": ids}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
This change ensures that your web service properly utilizes asynchronous I/O, reducing the overhead associated with thread management and potentially resolving the issue of the increasing number of `asyncio_*` threads.
Regarding the `VectorStoreIndex.from_documents()` method in the LlamaIndex library, it's important to note that asynchronous operations, when handled correctly, do not inherently create new `asyncio_*` threads. Instead, they schedule tasks on the event loop, allowing for concurrent execution without the overhead of thread creation. Ensure that your usage of LlamaIndex, or any other library, follows this asynchronous pattern to avoid similar issues.
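That last point can be shown with a self-contained sketch (plain `asyncio`, no LlamaIndex involved): a hundred coroutines awaited concurrently all run on a single event loop, and the process thread count stays flat.

```python
import asyncio
import threading

async def fake_io(i):
    # Stand-in for an awaitable I/O operation (e.g. an async Elasticsearch call).
    await asyncio.sleep(0.01)
    return i

async def main():
    # Schedule 100 coroutines concurrently on the one running event loop.
    return await asyncio.gather(*(fake_io(i) for i in range(100)))

before = len(threading.enumerate())
results = asyncio.run(main())
after = len(threading.enumerate())
print(len(results), after - before)  # 100 results, 0 extra threads
```

`asyncio.run()` also closes the loop when it finishes, so nothing lingers between calls.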
Bug Description
My data ingestion tool runs as a web service, and data ingestion is triggered occasionally. I noticed that the number of `asyncio_*` threads increases with each `VectorStoreIndex.from_documents()` call. The storage_context is `ElasticsearchStore`. After a couple of index creation iterations, `threading.enumerate()` returns:

Is this a known issue? Can it be fixed?

Elasticsearch lib version is 8.12.1. LlamaIndex version is 0.10.20.
Version
0.10.20
Steps to Reproduce
On a webserver (e.g. Flask):
1) Create an endpoint that creates an index from an Elasticsearch storage context with `VectorStoreIndex.from_documents(docs, storage_context=elastic_store, use_async=True)`
2) Add `print(", ".join([t.name for t in threading.enumerate() if t.name.startswith("asyncio")]))` at the end of the endpoint execution
3) Trigger the webservice endpoint a couple of times
4) Check the console output to see the last `print` output with the list of active threads

Relevant Logs/Tracebacks
No response