run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: how to use SimpleChatEngine.astream_chat #16024

Closed · ton77v closed this issue 2 months ago

ton77v commented 2 months ago

Question

llama-index 0.11.9

I'm trying to figure out how to use this method properly, for example:

import asyncio
import os

from dotenv import load_dotenv
from llama_index.llms.openai import OpenAI
from llama_index.core.chat_engine import SimpleChatEngine

load_dotenv()
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

llm = OpenAI(model="gpt-4o-mini", api_key=OPENAI_API_KEY)
chat_engine = SimpleChatEngine.from_defaults(llm=llm)

async def main():
    res = await chat_engine.astream_chat("Hi!")
    async for token in res.achat_stream:
        # raises RuntimeError: anext(): asynchronous generator is already running, although it still prints the tokens
        print(token)

if __name__ == "__main__":
    asyncio.run(main())

This is the closest I've gotten 😅. res.response_gen, which works perfectly for the synchronous case, just hangs here. Any ideas?
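For reference, the synchronous pattern being referred to looks roughly like this; a minimal sketch, assuming the same chat_engine defined above (stream_chat returns a streaming response whose response_gen yields tokens as they arrive):

    # Sync sketch: stream_chat + response_gen, the pattern reported to work above
    res = chat_engine.stream_chat("Hi!")
    for token in res.response_gen:
        print(token, end="", flush=True)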

dosubot[bot] commented 2 months ago

There are a few points to address regarding the usage of SimpleChatEngine.astream_chat in llama-index 0.11.9:

  1. Known Issue: The SimpleChatEngine.astream_chat method in llama-index 0.11.9 raises an exception indicating that async streaming chat is not supported:

    async def astream_chat(
            self, messages: Sequence[ChatMessage], **kwargs: Any
        ) -> ChatResponseAsyncGen:
            raise Exception("Not supported")

    [1][2]

  2. Correct Usage: To avoid the RuntimeError: anext(): asynchronous generator is already running error, ensure that you are not calling the asynchronous generator concurrently. Here is an example of how to use the astream_chat method correctly:

    import asyncio
    from llama_index.llms.openai import OpenAI
    from llama_index.core.chat_engine import SimpleChatEngine
    from llama_index.core.memory import ChatMemoryBuffer

    async def main():
        # Initialize the LLM instance
        llm = OpenAI(model="gpt-4o-mini", api_key="your_openai_api_key")

        # Initialize the memory buffer
        memory = ChatMemoryBuffer.from_defaults()

        # Initialize the SimpleChatEngine instance
        chat_engine = SimpleChatEngine.from_defaults(llm=llm, memory=memory)

        # Call the astream_chat method and iterate over the streamed tokens
        response = await chat_engine.astream_chat("Hello, how are you?")
        async for token in response.async_response_gen():
            print(token, end="", flush=True)

    # Run the main function
    asyncio.run(main())

    [3][4][5]

  3. Similar Issues: There are similar issues and discussions in the repository that might be helpful.

Given the known limitation in llama-index 0.11.9, you might want to check if there is an updated version of the library that addresses this issue or consider using the synchronous stream_chat method as an alternative.

To continue talking to Dosu, mention @dosu.

logan-markewich commented 2 months ago
res = await chat_engine.astream_chat("Hi!")
async for chunk in res.async_response_gen():
    print(chunk, end="", flush=True)
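
Putting that together with the setup from the original snippet, a complete sketch might look like this (assuming the same gpt-4o-mini model and OPENAI_API_KEY environment variable as in the question):

    import asyncio
    import os

    from dotenv import load_dotenv
    from llama_index.llms.openai import OpenAI
    from llama_index.core.chat_engine import SimpleChatEngine

    load_dotenv()

    llm = OpenAI(model="gpt-4o-mini", api_key=os.environ.get("OPENAI_API_KEY"))
    chat_engine = SimpleChatEngine.from_defaults(llm=llm)

    async def main():
        # astream_chat returns a streaming response object; iterate its
        # async_response_gen() rather than the internal achat_stream generator.
        res = await chat_engine.astream_chat("Hi!")
        async for chunk in res.async_response_gen():
            print(chunk, end="", flush=True)

    if __name__ == "__main__":
        asyncio.run(main())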