
[Question]: ReAct Agent's stream_chat method only returns the first step while using FastAPI and Streamlit #14924

Open jp-kh-kim opened 1 month ago

jp-kh-kim commented 1 month ago

Question

Hi, I'm working with LlamaIndex, FastAPI, and Streamlit.

I created a chat API endpoint:

from typing import Any, Dict, List

from fastapi import HTTPException
from fastapi.responses import StreamingResponse
from llama_index.core.llms import ChatMessage
from pydantic import BaseModel

# `app`, `agent`, and `_logger` are defined elsewhere in the application.

class ChatRequest(BaseModel):
    query: str
    chat_history: List[Dict[str, Any]]
    is_stream: bool

@app.post(path="/chat")
async def chat(request: ChatRequest):
    try:
        # Keep only the last assistant message as history.
        chat_history = [
            ChatMessage(role=message["role"], content=message["content"])
            for message in request.chat_history
            if message["role"] == "assistant"
        ][-1:]

        streaming_response = agent.stream_chat(
            message=request.query, chat_history=chat_history
        )
        return StreamingResponse(
            content=streaming_response.response_gen,
            media_type="text/event-stream",
        )

    except Exception as e:
        _logger.error(f"Error in chat: {e}")
        raise HTTPException(status_code=500, detail=str(e))

and I call it from Streamlit as below:

# `query`, `chat_history`, `backend_url`, and `headers` are defined earlier.
with st.spinner("Give me a second,"):
    start_time = time.time()

    payload = json.dumps(
        {"query": query, "chat_history": chat_history, "is_stream": True},
    )
    with httpx.stream(
        method="POST",
        url=f"{backend_url}/chat",
        content=payload,  # `data=` is deprecated for raw strings in recent httpx
        headers=headers,
        timeout=60.0,
    ) as r:
        st.write_stream(stream=r.iter_text())

The problem is that it only returns the first step, the tool choice. I tried defining a new generator, but it still doesn't work.

Is there any other approach for this problem?

dosubot[bot] commented 1 month ago

Hey there, @jp-kh-kim! I'm here to help you with any bugs, questions, or contributions. Let's tackle this issue together while we wait for a human maintainer.

The issue you're encountering where the stream_chat method only returns the first step might be due to how the streaming response is being handled in your FastAPI and Streamlit setup. Here are a few things to check and adjust:

1. Ensure Proper Streaming in FastAPI:
   • Make sure that the stream_chat method on the agent object is correctly implemented to handle streaming responses. If it's not implemented, it will raise a NotImplementedError [1].
2. Check the Streaming Response Handling:
   • Ensure that the streaming_response object is correctly yielding the response chunks. The streaming_response should be an iterable that yields individual completion objects [2]; a minimal sketch follows this list.
3. Adjust the FastAPI Endpoint:
   • Ensure that the FastAPI endpoint is correctly streaming the response. The StreamingResponse should be properly configured to handle the streaming data.

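For reference, here is a minimal sketch of consuming response_gen directly (assuming agent is an already-constructed ReAct agent; stream_chat returns a StreamingAgentChatResponse whose response_gen is a plain synchronous generator of string tokens):

# Minimal sketch (assumes `agent` is an already-constructed ReAct agent).
streaming_response = agent.stream_chat("What is 2 + 2?")
for token in streaming_response.response_gen:
    # Each token is a text delta of the streamed response.
    print(token, end="", flush=True)
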
Here's an example of how you can adjust your FastAPI endpoint to ensure proper streaming:

import logging
from typing import Any, Dict, List

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from llama_index.core.llms import ChatMessage
from pydantic import BaseModel

app = FastAPI()
_logger = logging.getLogger(__name__)

# `agent` is assumed to be constructed elsewhere.

class ChatRequest(BaseModel):
    query: str
    chat_history: List[Dict[str, Any]]
    is_stream: bool

@app.post("/chat")
async def chat(request: ChatRequest):
    try:
        chat_history = [
            ChatMessage(role=message["role"], content=message["content"])
            for message in request.chat_history
        ]

        streaming_response = agent.stream_chat(
            message=request.query, chat_history=chat_history
        )

        def event_generator():
            # response_gen is a synchronous generator, so iterate it with a
            # plain for loop and frame each chunk as a server-sent event.
            for chunk in streaming_response.response_gen:
                yield f"data: {chunk}\n\n"

        return StreamingResponse(event_generator(), media_type="text/event-stream")

    except Exception as e:
        _logger.error(f"Error in chat: {e}")
        raise HTTPException(status_code=500, detail=str(e))
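
Note that event_generator is a plain (synchronous) generator here: response_gen is not an async iterator, so an async for over it would fail, and Starlette's StreamingResponse iterates synchronous generators in a thread pool without blocking the event loop.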
4. Adjust the Streamlit Client:
   • Ensure that the Streamlit client is correctly handling the streamed data: you can either pass the stream to st.write_stream or iterate the lines yourself, as shown below.

Here's an example of how you can adjust your Streamlit client:

import streamlit as st
import httpx
import json
import time

# `query`, `chat_history`, `backend_url`, and `headers` are defined elsewhere.

with st.spinner("Give me a second,"):
    start_time = time.time()

    payload = json.dumps(
        {"query": query, "chat_history": chat_history, "is_stream": True},
    )
    with httpx.stream(
        method="POST",
        url=f"{backend_url}/chat",
        content=payload,  # `data=` is deprecated for raw strings in recent httpx
        headers=headers,
        timeout=60.0,
    ) as r:
        # iter_lines() yields one SSE line at a time; iter_text() yields
        # arbitrary chunks that may split a frame.
        for line in r.iter_lines():
            if line:
                st.write(line)

By ensuring that both the FastAPI endpoint and the Streamlit client are correctly handling the streaming responses, you should be able to receive the complete streamed response rather than just the first step.
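
One caveat with the SSE framing above: each chunk arrives prefixed with the literal data: marker, so writing raw lines will display that prefix. Here is a minimal sketch of stripping it before rendering (the strip_sse helper is hypothetical, not part of any library):

def strip_sse(raw_lines):
    # Hypothetical helper: yield only the payload of each `data:` frame.
    for line in raw_lines:
        if line.startswith("data: "):
            yield line[len("data: ") :]

# Usage inside the httpx.stream block above:
#     st.write_stream(strip_sse(r.iter_lines()))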

To continue talking to Dosu, mention @dosu.