To stream only the final answer from the ReAct Query Engine in your FastAPI application, you'll need to adjust your streaming logic to filter out intermediate responses. The current implementation you've shared streams every piece of output generated by the chat_engine.stream_chat(input_text) method. If the ReAct Query Engine, or the method you're using to interact with it, provides a way to distinguish between intermediate and final responses, you should leverage that.

For instance, if there's a flag or a specific response pattern that indicates a final response, you can modify the chat_stream generator function to yield only when this condition is met. Unfortunately, without specific details on how the ReAct Query Engine differentiates between intermediate and final responses, I can't provide a precise code snippet. Conceptually, however, your chat_stream function would look something like this:
async def chat_stream(chat_engine, input_text):
    response = chat_engine.stream_chat(input_text)
    for token in response.response_gen:
        if token.is_final:  # Assuming `is_final` is a way to identify the final response
            yield token
This assumes response.response_gen yields objects that have an is_final attribute or similar. You'll need to replace is_final with the actual mechanism provided by your chat engine for identifying the final response.

Remember, the key is to identify how the ReAct Query Engine signals the end of a response sequence and use that signal to control what you yield from the chat_stream function. If the engine does not provide such a mechanism, you might need to implement additional logic based on the content of the responses or the sequence in which they are generated.
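For a concrete picture of how such a generator can be wired into FastAPI, here is a minimal sketch. It assumes response_gen yields plain string tokens of the final answer once the agent has finished its reasoning loop (as in the maintainer's example further down); the /chat route, the ChatRequest model, and the in-module engine construction are placeholders for this sketch, and an embedding model is assumed to be configured separately:

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from llama_index.core import VectorStoreIndex, Document

app = FastAPI()

# Placeholder engine so the sketch is self-contained; in practice this would be
# the ReAct agent / chat engine built at application startup.
index = VectorStoreIndex.from_documents([Document.example()])
chat_engine = index.as_chat_engine(chat_mode="react")

class ChatRequest(BaseModel):
    # Hypothetical request model for this sketch
    text: str

async def chat_stream(chat_engine, input_text):
    # stream_chat() runs the agent loop; response_gen then yields string
    # tokens of the answer as they arrive.
    response = chat_engine.stream_chat(input_text)
    for token in response.response_gen:
        yield token

@app.post("/chat")
async def chat(request: ChatRequest):
    return StreamingResponse(
        chat_stream(chat_engine, request.text),
        media_type="text/plain",
    )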
@JulianOestreich90 What LLM are you using? I'm not actually able to reproduce this
>>> from llama_index.core import VectorStoreIndex, Document
>>> index = VectorStoreIndex.from_documents([Document.example()])
>>> chat_engine = index.as_chat_engine(chat_mode="react")
>>> response = chat_engine.stream_chat("Tell me a fact about LLMs?")
>>> for token in response.response_gen:
... print(token, end="", flush=True)
...
LLMs are pre-trained on large amounts of publicly available data, making them a powerful tool for knowledge generation and reasoning.
@logan-markewich I am using multi-document agents like in the example in the documentation. I made some changes by using a ReAct Agent with Mistral-7B-Instruct-v0.2 on llama.cpp instead of an OpenAIAgent. I pass the top_agent into the chat_stream() function together with the input text.

The problem is solved. It just occurred for specific queries.
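For readers trying to reproduce that kind of setup, here is a rough sketch of a ReAct top agent over a query engine tool, running a local Mistral model through llama.cpp. The module paths follow recent llama-index releases; the model path, tool name, and description are placeholders, and an embedding model is assumed to be configured separately:

from llama_index.core import VectorStoreIndex, Document
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.llama_cpp import LlamaCPP

# Local Mistral model served through llama.cpp (path is a placeholder)
llm = LlamaCPP(model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf")

# One query engine tool per document; Document.example() stands in for real data
index = VectorStoreIndex.from_documents([Document.example()])
tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(llm=llm),
    name="example_doc",
    description="Answers questions about the example document.",
)

# ReAct top agent that routes queries to the document tools
top_agent = ReActAgent.from_tools([tool], llm=llm, verbose=True)

# top_agent.stream_chat(...) is then passed through the chat_stream generator
response = top_agent.stream_chat("Tell me a fact about LLMs?")
for token in response.response_gen:
    print(token, end="", flush=True)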
@JulianOestreich90 Hi, I'm stuck on this problem too. Can you share how you solved it?
Question Validation
Question
I want to stream responses of a ReAct Query Engine with FastAPI, and so far I am doing the following:
My problem is that this returns all the intermediate agent results in the response generator. How can I stream just the final answer?