run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Missing async methods for RedisDocumentStore #13331

Open JulianOestreich90 opened 1 month ago

JulianOestreich90 commented 1 month ago


Question

I use a Redis ingestion pipeline, modeled on the Ingestion Pipeline example from the docs:

from redis import Redis
from redisvl.schema import IndexSchema

from llama_index.core import Settings, SimpleDirectoryReader
from llama_index.core.extractors import KeywordExtractor, QuestionsAnsweredExtractor
from llama_index.core.ingestion import (
    DocstoreStrategy,
    IngestionCache,
    IngestionPipeline,
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import TextNode
from llama_index.storage.docstore.redis import RedisDocumentStore
from llama_index.storage.kvstore.redis import RedisCache
from llama_index.vector_stores.redis import RedisVectorStore


async def ingest(directory: str, redis_client: Redis, schema: IndexSchema) -> list[TextNode]:
    print("started ingest")
    reader = SimpleDirectoryReader(
        # get_meta is my file_metadata callback (definition omitted)
        input_dir=directory, filename_as_id=True, file_metadata=get_meta
    )

    docs = reader.load_data()

    pipeline = IngestionPipeline(
        transformations=[
            SentenceSplitter(chunk_size=1024),
            QuestionsAnsweredExtractor(questions=3, llm=Settings.llm),
            KeywordExtractor(keywords=5, llm=Settings.llm),
            Settings.embed_model,
        ],
        docstore=RedisDocumentStore.from_redis_client(
            redis_client=redis_client, namespace="document_store"
        ),
        vector_store=RedisVectorStore(
            redis_client=redis_client,
            schema=schema,
        ),
        cache=IngestionCache(
            cache=RedisCache.from_redis_client(redis_client=redis_client),
            collection="redis_cache",
        ),
        docstore_strategy=DocstoreStrategy.UPSERTS,
    )

    nodes = await pipeline.arun(documents=docs)
    return nodes

However, the RedisDocumentStore raises a NotImplementedError:

File "/Users/user/Documents/ai-papermaker/qa-module/venv-3.12/lib/python3.12/site-packages/llama_index/core/ingestion/pipeline.py", line 781, in _ahandle_upserts
    (await self.docstore.aget_all_document_hashes()).values()
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Documents/ai-papermaker/qa-module/venv-3.12/lib/python3.12/site-packages/llama_index/core/storage/docstore/keyval_docstore.py", line 577, in aget_all_document_hashes
    for doc_id in await self._kvstore.aget_all(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/Documents/ai-papermaker/qa-module/venv-3.12/lib/python3.12/site-packages/llama_index/storage/kvstore/redis/base.py", line 135, in aget_all
    raise NotImplementedError

What would be an alternative? Do I really need to set up a different database for the docstore to support async calls?
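Until async support lands in the Redis kvstore, one stopgap I'm considering (my own sketch, not an official llama_index recipe) is to keep using the fully implemented sync path and push it off the event loop with `asyncio.to_thread`, so the surrounding async app stays responsive. A stub pipeline stands in for `IngestionPipeline` to keep the example self-contained:

```python
import asyncio


class StubPipeline:
    """Stand-in for IngestionPipeline; pretend each document yields one node."""

    def run(self, documents=None):
        return [f"node-for-{d}" for d in documents]


async def ingest_sync_fallback(pipeline, docs):
    # pipeline.run uses the sync kvstore methods, which RedisKVStore does
    # implement, so no NotImplementedError is hit; asyncio.to_thread keeps
    # the blocking call from stalling the event loop.
    return await asyncio.to_thread(pipeline.run, documents=docs)


nodes = asyncio.run(ingest_sync_fallback(StubPipeline(), ["a.txt", "b.txt"]))
print(nodes)  # ['node-for-a.txt', 'node-for-b.txt']
```

In the real pipeline the same call would be `await asyncio.to_thread(pipeline.run, documents=docs)` in place of `await pipeline.arun(documents=docs)`.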

dosubot[bot] commented 1 month ago

Hey there, @JulianOestreich90! Fancy seeing you here again. 😎 I'm diving into your issue and will be back with a detailed response soon! 🚀

logan-markewich commented 1 month ago

The alternative is for someone to implement the async calls for this 😆 There's an open PR that just does fake async; I've been meaning to edit it to actually implement the real async calls.
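For context, "fake async" here presumably means wrappers like the following (my illustration, not the PR's actual code): the async method satisfies the interface by calling the sync implementation directly, which avoids the NotImplementedError but still blocks the event loop for the duration of the Redis round-trip:

```python
import asyncio


class SyncStore:
    """Stand-in for the sync-only RedisKVStore."""

    def __init__(self):
        self._data = {"doc1": {"doc_hash": "abc"}}

    def get_all(self, collection="data"):
        return dict(self._data)


class FakeAsyncStore(SyncStore):
    async def aget_all(self, collection="data"):
        # "Fake" async: nothing is awaited on real I/O, so the event loop
        # is still blocked while the underlying sync call runs.
        return self.get_all(collection)


hashes = asyncio.run(FakeAsyncStore().aget_all())
print(hashes)  # {'doc1': {'doc_hash': 'abc'}}
```

A real implementation would instead await a non-blocking client call (e.g. via redis.asyncio), freeing the loop while the network round-trip is in flight.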

tylerhutcherson commented 1 month ago

@logan-markewich let's sync on this. RedisVL has an async class we could use!

dosubot[bot] commented 1 month ago

Integrating RedisVL's async class into the RedisDocumentStore is a practical way to add true asynchronous support. That would let the docstore match the async nature of your ingestion pipeline without a second database or makeshift workarounds, and would make calls like aget_all_document_hashes genuinely non-blocking.
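As a rough sketch of what a truly async `aget_all` could look like (my assumption: an async Redis client such as redis.asyncio.Redis, and the same one-hash-per-collection layout with JSON-encoded values that the sync RedisKVStore uses; the stub client below replaces a live server):

```python
import asyncio
import json


async def aget_all(redis_client, collection: str = "data") -> dict:
    # Truly async: awaits HGETALL on an async Redis client, so the event
    # loop is free to run other tasks during the network round-trip.
    raw = await redis_client.hgetall(collection)
    return {key: json.loads(val) for key, val in raw.items()}


class StubAsyncRedis:
    """Minimal stand-in for an async Redis client, for demonstration only."""

    async def hgetall(self, name):
        return {"doc1": '{"doc_hash": "abc"}'}


result = asyncio.run(aget_all(StubAsyncRedis()))
print(result)  # {'doc1': {'doc_hash': 'abc'}}
```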

#### Details

**Sources**
- [llama-index-legacy/llama_index/legacy/ingestion/pipeline.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/ingestion/pipeline.py)
- [llama-index-integrations/storage/kvstore/llama-index-storage-kvstore-redis/llama_index/storage/kvstore/redis/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/storage/kvstore/llama-index-storage-kvstore-redis/llama_index/storage/kvstore/redis/base.py)
- [llama-index-legacy/llama_index/legacy/storage/docstore/keyval_docstore.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/storage/docstore/keyval_docstore.py)
- [llama-index-core/llama_index/core/storage/docstore/keyval_docstore.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/storage/docstore/keyval_docstore.py)
- [llama-index-legacy/tests/storage/docstore/test_redis_docstore.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/tests/storage/docstore/test_redis_docstore.py)
- [llama-index-legacy/llama_index/legacy/storage/docstore/redis_docstore.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/storage/docstore/redis_docstore.py)
- [llama-index-integrations/storage/docstore/llama-index-storage-docstore-redis/llama_index/storage/docstore/redis/base.py](https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/storage/docstore/llama-index-storage-docstore-redis/llama_index/storage/docstore/redis/base.py)
- [llama-index-legacy/llama_index/legacy/storage/kvstore/redis_kvstore.py](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/storage/kvstore/redis_kvstore.py)
- [llama-index-core/llama_index/core/ingestion/pipeline.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/ingestion/pipeline.py)
**About Dosu**

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.