run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

How can I use the astream_chat method with chatEngine? #15171

Open VedantR3907 opened 1 month ago

VedantR3907 commented 1 month ago

Question Validation

Question

How can I use the astream_chat method with a chat engine? How can I use it with an API (FastAPI) so the response is streamed asynchronously?

dosubot[bot] commented 1 month ago

To use the astream_chat method with a chat engine and stream responses asynchronously through a FastAPI API, follow the example below:

First, ensure you have FastAPI and the necessary dependencies installed:

pip install fastapi uvicorn llama-index llama-index-llms-perplexity

Then, create a FastAPI application that uses the astream_chat method to stream chat responses asynchronously. Here is an example:

from fastapi import FastAPI, WebSocket
from llama_index.llms.perplexity import Perplexity
from llama_index.core.llms import ChatMessage

app = FastAPI()

# Initialize your Perplexity instance
chat_engine = Perplexity(
    model="mistral-7b-instruct",
    api_key="your_perplexity_api_key"
)

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_text()
            messages = [ChatMessage(role="user", content=data)]

            # astream_chat must be awaited; it resolves to an async generator
            # of ChatResponse chunks
            response_gen = await chat_engine.astream_chat(messages)
            async for response in response_gen:
                # Send only the newly generated text for this chunk
                # (response.message.content holds the full text so far)
                await websocket.send_text(response.delta or "")
    except Exception:
        # Close the socket if the client disconnects or an error occurs
        await websocket.close()

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

In this example:

  1. We create a FastAPI application.
  2. We initialize the Perplexity LLM with the necessary parameters. Note that this is an LLM class; llama-index chat engines expose an astream_chat method as well (a chat-engine variant is sketched below).
  3. We define a WebSocket endpoint at /ws.
  4. Inside the WebSocket endpoint, we receive messages from the client, await astream_chat to get an async generator of streaming responses, and send each delta back to the client.

This setup allows you to stream chat responses asynchronously using FastAPI and the astream_chat method from the Perplexity class [1][2][3][4][5].
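Since the question asked about a chat engine rather than a raw LLM: llama-index chat engines (for example, one created with index.as_chat_engine()) also expose astream_chat, which returns a StreamingAgentChatResponse whose async_response_gen() yields token deltas. Below is a minimal sketch of the same WebSocket pattern with a chat engine; the "data" directory and the chat_mode value are assumptions, and it relies on default OpenAI models being configured via OPENAI_API_KEY.

from fastapi import FastAPI, WebSocket
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

app = FastAPI()

# Build an index and a chat engine at startup (the "data" directory is an assumption)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            user_message = await websocket.receive_text()
            # astream_chat must be awaited; it returns a StreamingAgentChatResponse
            streaming_response = await chat_engine.astream_chat(user_message)
            # async_response_gen() yields the generated text token by token
            async for token in streaming_response.async_response_gen():
                await websocket.send_text(token)
    except Exception:
        await websocket.close()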

To continue talking to Dosu, mention @dosu.

VedantR3907 commented 1 month ago

How can I yield the response from this function to the API?

dosubot[bot] commented 1 month ago

To yield a response from the websocket_endpoint function, you can wrap the streaming call in an async generator and iterate over it inside the endpoint. The example below re-implements the streaming call with the aiohttp library against the Perplexity HTTP API and yields ChatResponse objects:

from fastapi import FastAPI, WebSocket
from typing import List
import json
import aiohttp
from llama_index.llms.perplexity import Perplexity
from llama_index.core.llms import ChatMessage, ChatResponse

app = FastAPI()

# Initialize your Perplexity instance
chat_engine = Perplexity(
    model="mistral-7b-instruct",
    api_key="your_perplexity_api_key"
)

async def astream_chat(messages: List[ChatMessage]):
    # A plain async generator: the caller can iterate it directly with `async for`.
    # NOTE: the endpoint URL and response fields below are kept from the original
    # example; adjust them to match the actual Perplexity streaming API.
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.perplexity.ai/v1/chat",
            json={"messages": [msg.dict() for msg in messages]},
            headers={"Authorization": f"Bearer {chat_engine.api_key}"},
        ) as response:
            response.raise_for_status()
            async for line in response.content:
                if line.startswith(b"data: "):
                    response_data = line.replace(b"data: ", b"").decode("utf-8").strip()
                    if response_data:
                        parsed_data = json.loads(response_data)
                        if "text" in parsed_data:
                            content_delta = parsed_data["text"]
                            yield ChatResponse(
                                message=ChatMessage(
                                    role="assistant",
                                    content=content_delta,
                                ),
                                delta=content_delta,
                                raw=parsed_data,
                            )

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_text()
            messages = [ChatMessage(role="user", content=data)]
            async for response in astream_chat(messages):
                await websocket.send_text(response.message.content)
    except Exception:
        # Close the socket if the client disconnects or an error occurs
        await websocket.close()

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

In this example:

  1. We create a FastAPI application.
  2. We define an astream_chat async generator that calls the Perplexity HTTP API with aiohttp, parses each "data: " line of the streamed response, and yields ChatResponse objects.
  3. We define a WebSocket endpoint at /ws.
  4. Inside the WebSocket endpoint, we receive messages from the client, iterate over the async generator with async for, and send each chunk back to the client.

This setup allows you to stream chat responses asynchronously using FastAPI and a custom astream_chat async generator [1][2].
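If you would rather yield the response through a plain HTTP endpoint instead of a WebSocket, the same kind of async generator can be passed to FastAPI's StreamingResponse. Here is a minimal sketch using the Perplexity LLM's own astream_chat; the /chat path and the q query parameter are illustrative choices, not part of any library API.

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from llama_index.core.llms import ChatMessage
from llama_index.llms.perplexity import Perplexity

app = FastAPI()

chat_engine = Perplexity(
    model="mistral-7b-instruct",
    api_key="your_perplexity_api_key"
)

@app.get("/chat")
async def chat(q: str):
    async def token_stream():
        messages = [ChatMessage(role="user", content=q)]
        # The LLM's astream_chat must be awaited; it resolves to an async
        # generator of ChatResponse chunks
        response_gen = await chat_engine.astream_chat(messages)
        async for chunk in response_gen:
            # chunk.delta holds only the newly generated text
            yield chunk.delta or ""
    return StreamingResponse(token_stream(), media_type="text/plain")

The client can then consume the body incrementally (for example with httpx's aiter_text() or requests with stream=True) instead of maintaining a WebSocket connection.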

To continue talking to Dosu, mention @dosu.