Closed: gich2009 closed this issue 8 months ago.
Hello,
Thank you for reporting this issue. Based on the information you've provided and the context from the LlamaIndex repository, it seems like the issue could be related to how the response is being generated and consumed in the stream_chat() method.
In the StreamingAgentChatResponse class, the response_gen and async_response_gen methods are responsible for generating the response. These methods use a queue to store the response tokens and yield them one by one. If the first one or two tokens are missing, it could be because the consumer of the response generator is not properly waiting for the tokens to be available in the queue. This could happen if the consumer tries to get tokens from the queue before they are put in, resulting in missing tokens. You can find more details about this in a similar solved issue: [Bug]: Incomplete Response when streaming from a chat engine.
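For illustration, here is a minimal, self-contained sketch of that queue-based producer/consumer pattern. This is not LlamaIndex's actual implementation; it only shows the shape of the race being described: a consumer that polls with q.get_nowait() before the producer has put anything will miss early tokens, while a blocking q.get() will not.

import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the stream

def produce(q: queue.Queue) -> None:
    # Stand-in for the LLM writer thread filling the queue with tokens.
    for token in ["I", "'m", " doing", " well", "!"]:
        time.sleep(0.01)  # simulate latency between streamed tokens
        q.put(token)
    q.put(_DONE)

def response_gen(q: queue.Queue):
    # Blocking get() waits for the producer, so no tokens are dropped.
    while True:
        token = q.get()
        if token is _DONE:
            return
        yield token

q = queue.Queue()
threading.Thread(target=produce, args=(q,), daemon=True).start()
print("".join(response_gen(q)))  # prints: I'm doing well!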
Additionally, the ReAct chat agent might be returning an additional inference block instead of the response from the completed ReAct process due to the way the parse method in the ReActOutputParser class is implemented. If the keyword "Thought:" is not found in the output, the method assumes that the agent is directly outputting the answer and returns a ResponseReasoningStep with the thought set to "I can answer without any tools." and the response set to the output. This could be the additional inference block that you are seeing. You can find more details about this in the llama_index/agent/react/output_parser.py file.
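As a rough illustration of the fallback branch described above (the actual parse method in llama_index/agent/react/output_parser.py is more involved; ResponseReasoningStep is replaced here by a stand-in dataclass):

from dataclasses import dataclass

@dataclass
class ResponseReasoningStep:  # stand-in for the real LlamaIndex class
    thought: str
    response: str

def parse(output: str) -> ResponseReasoningStep:
    # No "Thought:" keyword: assume the LLM is answering directly.
    # This is the branch that can surface an extra inference block.
    if "Thought:" not in output:
        return ResponseReasoningStep(
            thought="I can answer without any tools.",
            response=output,
        )
    # The real parser goes on to extract Thought/Action/Action Input here.
    raise NotImplementedError("see llama_index/agent/react/output_parser.py")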
Unfortunately, I couldn't find any changes made to the stream_chat() method in the LlamaIndex codebase after version 0.8.42. It would be helpful if you could provide more details about how you are using the stream_chat() method and how the response generator is being consumed. This could help us identify the exact issue and suggest a more accurate solution.
Best, Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Thanks @gich2009 for submitting this. I would like to triage this by first trying to reproduce the issue that you're running into.
Could you please provide more details on how you're running the ReAct agent, or potentially just share test.py here?
Additionally, have you seen our module guide (notebook) for using ReAct with Query Engine? It's a bit more involved than what you're doing here, as it constructs the ReActAgent from a set of QueryEngineTools.
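For reference, the construction in that guide looks roughly like the sketch below, assuming the 0.8.x import paths; the data directory, tool name, and description here are made up, so see the linked notebook for the authoritative version.

from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.agent import ReActAgent
from llama_index.tools import QueryEngineTool, ToolMetadata

# Build an index, wrap its query engine as a tool, and hand it to the agent.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
tool = QueryEngineTool(
    query_engine=index.as_query_engine(),
    metadata=ToolMetadata(
        name="docs",
        description="Answers questions about the indexed documents.",
    ),
)
agent = ReActAgent.from_tools([tool], verbose=True)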
I'm also running into the same issue. My code is as follows:
def index_chat():
    try:
        if request.method == 'POST':
            cnx = cnxpool.get_connection()
            cur = cnx.cursor()
            logging.info("Received a POST request to /index-chat")
            message: str = request.json['message']
            documentId: str = request.json['documentId']
            userId: str = request.json['userId']
            print(f"Received message: {message}")
            print(f"Received documentId: {documentId}")
            print(f"Received userId: {userId}")
            our_filters = MetadataFilters(filters=[ExactMatchFilter(key="documentId", value=documentId), ExactMatchFilter(key="userId", value=userId)])
            # get previous messages from DB and pass to stream_chat
            sql = "SELECT message, role FROM DocumentChat WHERE documentId = %s"
            val = (documentId, )
            cur.execute(sql, val)
            result = cur.fetchall()
            messages = []
            for row in result:
                if row[1] == "user":
                    messages.append(ChatMessage(role=MessageRole.USER, content=row[0]))
                else:
                    messages.append(ChatMessage(role=MessageRole.ASSISTANT, content=row[0]))
            # save to mysql DB
            sql = "INSERT INTO DocumentChat (message, documentId, role) VALUES (%s, %s, %s)"
            val = (message, documentId, "user")
            cur.execute(sql, val)
            cnx.commit()
            cnx.close()
            llm = OpenAI(temperature=0.1, model="gpt-4", api_key=openai.api_key)
            service_context = ServiceContext.from_defaults(llm=llm, callback_manager=callback_manager)
            vector_store = PineconeVectorStore(api_key=pinecone_api_key, index_name=pinecone_index, environment=pinecone_env, filters=our_filters)
            storage_context = StorageContext.from_defaults(vector_store=vector_store)
            vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store, storage_context=storage_context, service_context=service_context)
            chat_engine = vector_index.as_chat_engine(chat_mode="react", filters=our_filters, verbose=True)
            def event_stream():
                stream_response = chat_engine.stream_chat(message, chat_history=messages)
                for token in stream_response.response_gen:
                    print(f"Sending token: {token}")
                    yield token
                cnx = cnxpool.get_connection()
                cur = cnx.cursor()
                sql = "INSERT INTO DocumentChat (message, documentId, role) VALUES (%s, %s, %s)"
                val = (stream_response.response, documentId, "assistant")
                cur.execute(sql, val)
                cnx.commit()
                cnx.close()
            return Response(event_stream(), mimetype="text/event-stream")
    except Exception as e:
        logging.error(f"An error occurred: {e}")
I added a print statement to verify that the missing token is coming from the stream_response.response_gen method.
Thanks @cwysong85! Taking a closer look today to see what's happening here and how it can be resolved.
Okay, I've been trying to replicate the bug and ran into something, though I don't know for sure if it's what users are experiencing here.
A couple of notes:
- If you want to see the intermediate ReAct steps (i.e., Thought, Action, and Response), then you should pass in verbose=True in the as_chat_engine method call of the VectorStoreIndex.
- I noticed the first token or two missing from stream_chat when getting the final response. This may or may not be related to the issue originally raised here.
Here is the script that I've used to try to replicate the bug experienced by @gich2009 and @cwysong85:
Note: I am running this script in the root directory of the repo. Access to .../examples/data/paul_graham is required.
react_example.py
import argparse

from dotenv import load_dotenv
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.indices.service_context import ServiceContext
from llama_index.llms import OpenAI

parser = argparse.ArgumentParser()
parser.add_argument(
    "-s", "--streaming", help="streaming or regular chat", action="store_true"
)

def main(streaming: bool = False):
    # set the LLM
    load_dotenv()
    llm = OpenAI(temperature=0.1, model="gpt-4")
    service_context = ServiceContext.from_defaults(llm=llm)

    # create a SimpleVectorStore
    documents = SimpleDirectoryReader("../docs/examples/data/paul_graham").load_data()
    vector_index = VectorStoreIndex.from_documents(documents)

    # create ChatEngine
    chat_engine = vector_index.as_chat_engine(chat_mode="react", verbose=True)

    if not streaming:
        # regular chat
        message = "Hi, how are you?"
        response = chat_engine.chat(message=message)
    else:
        # stream chat
        message = "Hi, how are you?"
        response = chat_engine.stream_chat(message=message)
        response.print_response_stream()
        print("\n")

    print(f"final response:\n\n{response.response}")

if __name__ == "__main__":
    args = parser.parse_args()
    main(args.streaming)
To run react_example.py you need a .env file containing:
OPENAI_API_KEY=<fill-in>
You can run the script in either regular or streaming chat mode by using the --streaming option (the default is regular mode). For example, to run the script using stream_chat:
python react_example.py --streaming
This yields
Response: Hello! I'm an AI assistant designed to help with various tasks. How can I assist you today?
'm doing well, thank you! How can I assist you today?
final response:
'm doing well, thank you! How can I assist you today?
This is clearly different from the ReAct Response output. When running in regular mode, the output matches the final response. Executing python react_example.py gives the following output:
Response: Hello! I'm an AI assistant designed to help with various tasks. How can I assist you today?
final response:
Hello! I'm an AI assistant designed to help with various tasks. How can I assist you today?
Alright, I added logging to the script above and observed that the ReActAgent is not terminating in stream_chat mode after a Response step is made. It makes an additional call to OpenAI with this chat context:
[
...
{"role": "user", "content": "Hi, how are you?"},
{"role": "assistant", "content": "Response: Hello! I\'m an AI assistant designed to help with various tasks. How can I assist you today?"}
]
I've actually noticed this bug where the "Response" just kept looping the same content string over and over again to OpenAI until OpenAI rate limited the requests. It looked something like this:
Response: Hello! I'm an AI assistant designed to help with various tasks. How can I assist you today?
Response: Response: Hello! I'm an AI assistant designed to help with various tasks. How can I assist you today?
Response: Response: Response: Hello! I'm an AI assistant designed to help with various tasks. How can I assist you today?
Response: Response: Response: Response: Hello! I'm an AI assistant designed to help with various tasks. How can I assist you today?
etc.
This bug would only occur in ReAct mode, too. Is it possible that these two issues could be related?
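A toy simulation of that runaway loop (not LlamaIndex code): if the completion check never fires, the agent's own "Response: ..." turn is fed back as chat history and echoed behind yet another prefix, compounding on each iteration just like the logs above.

answer = "Hello! I'm an AI assistant designed to help with various tasks."

def fake_llm(history: list[str]) -> str:
    # Stand-in for the LLM call: it echoes the last assistant turn
    # behind a fresh "Response:" prefix, as seen in the logs above.
    return f"Response: {history[-1]}"

history = [answer]
for _ in range(3):  # in the real bug this ran until rate limiting
    history.append(fake_llm(history))
    print(history[-1])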
@cwysong85 Yes, I believe that they are related. After investigating this issue for some time, we found that it occurs due to a faulty check on the stream for when the final response (or reasoning step) is about to be sent. If that check returns a false negative (i.e., the agent has the answer, but we failed to classify it as such), then the ReAct agent will go through another iteration.
What's also happening here is that our desired outcome is to stream only the final reasoning step, not the entire ReAct thought/action/observation output. This is related because we rely on that check of whether the stream is part of the final response to signal the end of the ReAct execution. Ultimately, I believe what was happening here was a result of those two things.
A PR is in now that should make this more consistent.
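To make the failure mode concrete, here is a hypothetical sketch of the kind of completion check involved. This is not the actual fix from the PR above; it only illustrates the idea of terminating on any step that does not plan another tool call, rather than relying on a prefix match that can return a false negative.

def is_final_response(buffered_output: str) -> bool:
    # A ReAct step that wants to keep going contains an "Action:" block;
    # anything else should end the loop and be streamed to the user.
    return "Action:" not in buffered_output

assert is_final_response("Response: Hello! How can I assist you today?")
assert not is_final_response("Thought: I need a tool.\nAction: query_engine_tool")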
Thanks @nerdai for your help in solving this problem. I have been a bit unavailable to assist but I'm glad the issue is resolved. I will test out the fix and report any bugs found.
No problem @gich2009.
Bug Description
The response from the ReAct chat agent is not being streamed properly. The agent seems to return an additional inference block instead of the response from the completed ReAct process.
Version
<=0.8.45.post1
Steps to Reproduce
Run a ReAct chat agent and stream the output.
Relevant Logs/Tracebacks