Closed SlapDrone closed 12 months ago
Definitely would be a good feature.
Originally we had the stream_chat function return a generator of generators for each step, but tbh it made the interface pretty complicated, especially compared to the existing streaming options in the library
Since the OpenAI agent doesn't really have any internal reasoning (all the reasoning happens in the function-calling API), you can actually see the decisions in the chat history (i.e. which tools it called), as well as a list of all tool sources (i.e. tool inputs/outputs) in `response.sources`, which is a list of `ToolOutput` objects containing the raw tool inputs and outputs.
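To make the shape of that data concrete, here is a minimal sketch using stand-in classes that mirror the issue's description of `ToolOutput` and `response.sources`; the class definitions are illustrative stand-ins, not the real llama_index API.

```python
# Illustrative stand-ins for the structures described above: after a chat
# call finishes, the tool inputs/outputs can be read back from
# `response.sources`, a list of ToolOutput objects.
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ToolOutput:
    tool_name: str
    raw_input: dict
    raw_output: Any


@dataclass
class AgentChatResponse:
    response: str
    sources: List[ToolOutput] = field(default_factory=list)


resp = AgentChatResponse(
    response="It's 21 degrees in Berlin.",
    sources=[ToolOutput("weather", {"city": "Berlin"}, {"temp_c": 21})],
)

# Inspect which tools were called, and with what inputs/outputs.
for src in resp.sources:
    print(f"{src.tool_name}: in={src.raw_input} out={src.raw_output}")
```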
However, for the ReAct agent this is a different story; it looks like it wasn't updated to track tool sources yet.
Hey @logan-markewich, thanks for the insight. Yeah I can understand the nested generator UX being a bit janky, I tried something similar this evening and I can totally understand why it should not be the default.
I had a closer look at the code following your comments, and I see now (if I'm understanding correctly) that the `chat_history` is updated in a separate thread running `write_response_to_history`, which itself uses streaming. With the current interface one can retrieve the function/tool call details as you point out, but it seems only after the threads have finished.
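The timing constraint described here can be boiled down to a tiny, self-contained sketch (plain `threading`, no llama_index involved): the writer thread fills the shared history incrementally, and the caller is only guaranteed to see the complete history after joining the thread.

```python
# Toy model of the background-writer pattern: a separate thread streams
# chunks into a shared history; the caller sees the full history only
# after the thread has finished.
import threading
import time

history = []


def write_response_to_history(chunks):
    # Simulates write_response_to_history: deltas arrive into the shared
    # chat history from a background thread while the caller keeps running.
    for chunk in chunks:
        history.append(chunk)
        time.sleep(0.01)


t = threading.Thread(
    target=write_response_to_history,
    args=(["calling", "tool", "done"],),
)
t.start()
t.join()  # only after joining is the full history guaranteed present
print(history)  # -> ['calling', 'tool', 'done']
```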
Assuming `stream_chat` is to be untouched then, off the top of my head some possible options which come to mind are:

1. Implement a separate method alongside `stream_chat` (slight disadvantage for maintainability, although some refactoring could make this and the current `stream_chat` rely on largely the same building blocks).
2. Keep `stream_chat`, but intercept the `chat_history` while the individual function-call messages are being streamed into it from inside threads, and yield from these. This could be done by implementing a callback inside the `StreamingAgentChatResponse`, something like this:
```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# BaseMemory is llama_index's memory interface; `# ...` marks elided code.


@dataclass
class StreamingAgentChatResponse:
    """Streaming chat response to user and writing to chat history."""

    # ...
    callback: Optional[Callable[[Any], None]] = None

    def write_response_to_history(self, memory: BaseMemory) -> None:
        # ...
        for chat in self.chat_stream:
            # ...
            if self.callback is not None:
                self.callback(chat.delta)


def send_to_chatbox(output):
    ...


chat_stream_response = StreamingAgentChatResponse(
    chat_stream=self._llm.stream_chat(all_messages, functions=functions),
    callback=send_to_chatbox,
)
```
(After writing it out, this second idea feels convoluted: the streaming of the final response and the streaming of the intermediate steps are handled completely differently, and the callback would also have to be aware of which `StreamingAgentChatResponse` instance it was called from in order to distinguish different "parent" messages/function calls.)
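One way to tame the callback approach sketched above is to bridge it back into a plain generator: the callback simply enqueues each delta, and a generator drains the queue. The sketch below is a generic pattern under that assumption; `stream_via_callback`, `fake_write`, and the `_DONE` sentinel are all local inventions, not llama_index API.

```python
# Bridge a callback-based producer into a plain generator using a queue:
# the producer runs in a thread and pushes deltas; the caller just iterates.
import queue
import threading

_DONE = object()  # local sentinel marking end-of-stream


def stream_via_callback(produce):
    """`produce` is called with a callback(delta); returns a generator of deltas."""
    q: "queue.Queue" = queue.Queue()

    def worker():
        produce(q.put)  # the callback just enqueues each delta
        q.put(_DONE)    # signal completion

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        yield item


# Toy producer standing in for write_response_to_history's streaming loop.
def fake_write(callback):
    for delta in ["Thought: ", "use the ", "search tool"]:
        callback(delta)


print("".join(stream_via_callback(fake_write)))  # -> "Thought: use the search tool"
```

This keeps the caller-facing interface a simple iterator, so intermediate steps could be consumed the same way the final response stream already is.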
Would be very happy to hear your thoughts on what an elegant solution might look like for you all.
Yea both approaches make some sense, but yea the second one could be complicated ha!
I actually don't mind the first idea though. Although I'm also hesitant to add more functions to the interface, because right now the main four (chat/achat/stream_chat/astream_chat) are all mostly duplicates of each other.
Tbh, if you are interested in contributing here and working on this, there's a few issues/tech debt that I'd love to solve and just haven't had time to look at:
I think if either of those are solved, then adding intermediate responses should be a little more straightforward/less scary 💪🏻
Just checking in to say @Lacico and I have been working on a PR to address the tech debt / refactoring point - useful exercise to learn the nitty gritty of the internals anyway - will share it with you soon once we've tested it a bit more.
Amazing @SlapDrone thanks a ton for tackling some of this!! ❤️🦙
Hi, @SlapDrone! I'm Dosu, and I'm here to help the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, you requested a new feature that would allow the native Agent classes to yield/stream all their steps instead of just the final output. Logan-markewich suggested that the existing functionality can be used to retrieve the function/tool call details, but only after the threads have finished. You proposed two possible options to implement this feature, and Logan-markewich is open to the first idea. Currently, SlapDrone and Lacico are working on a PR to address the tech debt and refactoring points.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your contribution to LlamaIndex!
Feature Description
Hey folks,
What do you think about having the new native Agent classes able to yield/stream all their steps (function calls, thoughts, observations, etc) rather than just the final output? I've been digging through the source code after this was mentioned in passing in the docs, but I don't see this implemented yet.
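As a rough illustration of the requested interface, here is a hypothetical sketch of an agent whose stream yields every intermediate step before the final answer; every name here (`AgentStep`, `stream_steps`, the step kinds) is illustrative, not existing llama_index API.

```python
# Hypothetical interface sketch: a generator that yields each agent step
# (thought, function call, observation) and finally the answer.
from dataclasses import dataclass
from typing import Iterator


@dataclass
class AgentStep:
    kind: str      # "thought" | "function_call" | "observation" | "answer"
    content: str


def stream_steps() -> Iterator[AgentStep]:
    # Hard-coded steps standing in for a real agent's reasoning loop.
    yield AgentStep("thought", "I should look up the weather.")
    yield AgentStep("function_call", 'get_weather(city="Berlin")')
    yield AgentStep("observation", '{"temp_c": 21}')
    yield AgentStep("answer", "It's 21 degrees in Berlin.")


for step in stream_steps():
    print(f"[{step.kind}] {step.content}")
```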
A colleague and I implemented a similar thing in a PR for Langchain, so this is definitely something we could either help the established devs with if it is already in the pipeline here, or take the lead on if not.
Reason
I don't think there's anything stopping this structurally, it just requires modifying some internals.
The places that would need to be adapted are the `stream_chat` method, for example here: https://github.com/jerryjliu/llama_index/blob/be0ded30701f45bd097b30a00fa93d2cdf06a592/llama_index/agent/openai_agent.py#L141

And (potentially?) the `StreamingAgentChatResponse`, here: https://github.com/jerryjliu/llama_index/blob/be0ded30701f45bd097b30a00fa93d2cdf06a592/llama_index/chat_engine/types.py#L27
Value of Feature
This is super valuable to us because it's generally useful to be able to hook into the whole "reasoning" process, both for debugging and for transparency in a chat interface. If only the final answer is revealed and it happens to be wrong, it is harder to understand where the problem lies. When talking to an agent in an interactive setting, we can also see and potentially interrupt and guide its trajectory earlier and with finer detail if we can resolve each step. Imagine a chat where one can elect to see/sort/modify function calls / tool outputs / intermediate "thoughts".