run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: how to use SimpleChatEngine.astream_chat #16024

Closed · ton77v closed this issue 2 months ago

ton77v commented 2 months ago

Question

llama-index 0.11.9

I'm trying to figure out how to use this method properly, for example:

import asyncio
import os

from dotenv import load_dotenv
from llama_index.llms.openai import OpenAI
from llama_index.core.chat_engine import SimpleChatEngine

load_dotenv()
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

llm = OpenAI(model="gpt-4o-mini", api_key=OPENAI_API_KEY)
chat_engine = SimpleChatEngine.from_defaults(llm=llm)

async def main():
    res = await chat_engine.astream_chat("Hi!")
    async for token in res.achat_stream:
        # raises RuntimeError: anext(): asynchronous generator is already running, although it still prints the tokens
        print(token)

if __name__ == "__main__":
    asyncio.run(main())

This is the closest I've gotten 😅. res.response_gen, which works perfectly for the synchronous case, just hangs here. Any ideas?
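For reference, the synchronous pattern being referred to looks roughly like this; a minimal sketch, assuming the same chat_engine defined above (stream_chat returns a streaming response whose response_gen yields tokens as they arrive):

    # Sync sketch: stream_chat + response_gen, the pattern reported to work above
    res = chat_engine.stream_chat("Hi!")
    for token in res.response_gen:
        print(token, end="", flush=True)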

dosubot[bot] commented 2 months ago

There are a few points to address regarding the usage of SimpleChatEngine.astream_chat in llama-index 0.11.9:

  1. Known Issue: The SimpleChatEngine.astream_chat method in llama-index 0.11.9 raises an exception indicating that async streaming chat is not supported:

    async def astream_chat(
            self, messages: Sequence[ChatMessage], **kwargs: Any
        ) -> ChatResponseAsyncGen:
            raise Exception("Not supported")

    [1][2]

  2. Correct Usage: To avoid the RuntimeError: anext(): asynchronous generator is already running error, ensure that you are not calling the asynchronous generator concurrently. Here is an example of how to use the astream_chat method correctly:

    import asyncio
    from llama_index.llms.openai import OpenAI
    from llama_index.core.chat_engine import SimpleChatEngine
    from llama_index.core.memory import ChatMemoryBuffer

    async def main():
        # Initialize the LLM instance
        llm = OpenAI(model="gpt-4o-mini", api_key="your_openai_api_key")

        # Initialize the memory buffer
        memory = ChatMemoryBuffer.from_defaults()

        # Initialize the SimpleChatEngine instance
        chat_engine = SimpleChatEngine.from_defaults(llm=llm, memory=memory)

        # Call the astream_chat method and iterate over the streamed tokens
        response = await chat_engine.astream_chat("Hello, how are you?")
        async for token in response.async_response_gen():
            print(token, end="", flush=True)

    # Run the main function
    asyncio.run(main())

    [3][4][5]

  3. Similar Issues: There are similar issues and discussions in the repository that might be helpful.

Given the known limitation in llama-index 0.11.9, you might want to check if there is an updated version of the library that addresses this issue or consider using the synchronous stream_chat method as an alternative.

To continue talking to Dosu, mention @dosu.

logan-markewich commented 2 months ago
res = await chat_engine.astream_chat("Hi!")
async for chunk in res.async_response_gen():
    print(chunk, end="", flush=True)
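
Putting that together with the setup from the original snippet, a complete sketch might look like this (assuming the same gpt-4o-mini model and OPENAI_API_KEY environment variable as in the question):

    import asyncio
    import os

    from dotenv import load_dotenv
    from llama_index.llms.openai import OpenAI
    from llama_index.core.chat_engine import SimpleChatEngine

    load_dotenv()

    llm = OpenAI(model="gpt-4o-mini", api_key=os.environ.get("OPENAI_API_KEY"))
    chat_engine = SimpleChatEngine.from_defaults(llm=llm)

    async def main():
        # astream_chat returns a streaming response object; iterate its
        # async_response_gen() rather than the internal achat_stream generator.
        res = await chat_engine.astream_chat("Hi!")
        async for chunk in res.async_response_gen():
            print(chunk, end="", flush=True)

    if __name__ == "__main__":
        asyncio.run(main())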