Closed SlapDrone closed 12 months ago
Definitely would be a good feature.
Originally we had the stream_chat function return a generator of generators for each step, but tbh it made the interface pretty complicated, especially compared to the existing streaming options in the library
Since the OpenAI agent doesn't really have any internal reasoning (all the reasoning happens in the function-calling API), you can actually see the decisions in the chat history (i.e. which tools it called), as well as a list of all tool sources (i.e. tool inputs/outputs) in `response.sources`, which is a list of `ToolOutput` objects containing the raw tool inputs and outputs.
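To make the shape of that data concrete, here is a minimal sketch using stand-in classes that mirror the issue's description of `ToolOutput` and `response.sources`; the class definitions are illustrative stand-ins, not the real llama_index API.

```python
# Illustrative stand-ins for the structures described above: after a chat
# call finishes, the tool inputs/outputs can be read back from
# `response.sources`, a list of ToolOutput objects.
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ToolOutput:
    tool_name: str
    raw_input: dict
    raw_output: Any


@dataclass
class AgentChatResponse:
    response: str
    sources: List[ToolOutput] = field(default_factory=list)


resp = AgentChatResponse(
    response="It's 21 degrees in Berlin.",
    sources=[ToolOutput("weather", {"city": "Berlin"}, {"temp_c": 21})],
)

# Inspect which tools were called, and with what inputs/outputs.
for src in resp.sources:
    print(f"{src.tool_name}: in={src.raw_input} out={src.raw_output}")
```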
However, for the ReAct agent this is a different story; it looks like it wasn't updated to track tool sources yet.
Hey @logan-markewich, thanks for the insight. Yeah I can understand the nested generator UX being a bit janky, I tried something similar this evening and I can totally understand why it should not be the default.
I had a closer look at the code following your comments, and I see now (if I'm understanding correctly) that the `chat_history` is updated in a separate thread running `write_response_to_history`, which itself uses streaming. With the current interface one can retrieve the function/tool call details as you point out, but it seems only after the threads have finished.
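The timing constraint described here can be boiled down to a tiny, self-contained sketch (plain `threading`, no llama_index involved): the writer thread fills the shared history incrementally, and the caller is only guaranteed to see the complete history after joining the thread.

```python
# Toy model of the background-writer pattern: a separate thread streams
# chunks into a shared history; the caller sees the full history only
# after the thread has finished.
import threading
import time

history = []


def write_response_to_history(chunks):
    # Simulates write_response_to_history: deltas arrive into the shared
    # chat history from a background thread while the caller keeps running.
    for chunk in chunks:
        history.append(chunk)
        time.sleep(0.01)


t = threading.Thread(
    target=write_response_to_history,
    args=(["calling", "tool", "done"],),
)
t.start()
t.join()  # only after joining is the full history guaranteed present
print(history)  # -> ['calling', 'tool', 'done']
```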
Assuming `stream_chat` is to be untouched then, off the top of my head some possible options which come to mind are:

1. Implement a separate method alongside `stream_chat` (slight disadvantage for maintainability, although some refactoring could make this and the current `stream_chat` rely on largely the same building blocks).
2. Keep `stream_chat`, but intercept the `chat_history` while the individual function-call messages are being streamed into it from inside threads, and yield from these. This could be done by implementing a callback inside the `StreamingAgentChatResponse`, something like this:
```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# BaseMemory is llama_index's memory interface; `# ...` marks elided code.


@dataclass
class StreamingAgentChatResponse:
    """Streaming chat response to user and writing to chat history."""

    # ...
    callback: Optional[Callable[[Any], None]] = None

    def write_response_to_history(self, memory: BaseMemory) -> None:
        # ...
        for chat in self.chat_stream:
            # ...
            if self.callback is not None:
                self.callback(chat.delta)


def send_to_chatbox(output):
    ...


chat_stream_response = StreamingAgentChatResponse(
    chat_stream=self._llm.stream_chat(all_messages, functions=functions),
    callback=send_to_chatbox,
)
```
(After writing it out, this second idea feels convoluted: the streaming of the final response and the streaming of the intermediate steps are handled completely differently, and the callback would also have to be aware of which `StreamingAgentChatResponse` instance it was called from in order to distinguish different "parent" messages/function calls.)
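One way to tame the callback approach sketched above is to bridge it back into a plain generator: the callback simply enqueues each delta, and a generator drains the queue. The sketch below is a generic pattern under that assumption; `stream_via_callback`, `fake_write`, and the `_DONE` sentinel are all local inventions, not llama_index API.

```python
# Bridge a callback-based producer into a plain generator using a queue:
# the producer runs in a thread and pushes deltas; the caller just iterates.
import queue
import threading

_DONE = object()  # local sentinel marking end-of-stream


def stream_via_callback(produce):
    """`produce` is called with a callback(delta); returns a generator of deltas."""
    q: "queue.Queue" = queue.Queue()

    def worker():
        produce(q.put)  # the callback just enqueues each delta
        q.put(_DONE)    # signal completion

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        yield item


# Toy producer standing in for write_response_to_history's streaming loop.
def fake_write(callback):
    for delta in ["Thought: ", "use the ", "search tool"]:
        callback(delta)


print("".join(stream_via_callback(fake_write)))  # -> "Thought: use the search tool"
```

This keeps the caller-facing interface a simple iterator, so intermediate steps could be consumed the same way the final response stream already is.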
Would be very happy to hear your thoughts on what an elegant solution might look like for you all.
Yea both approaches make some sense, but yea the second one could be complicated ha!
I actually don't mind the first idea though. Although I'm also hesitant to add more functions to the interface, because right now the main four (chat/achat/stream_chat/astream_chat) are all mostly duplicates of each other.
Tbh, if you are interested in contributing here and working on this, there's a few issues/tech debt that I'd love to solve and just haven't had time to look at:
I think if either of those are solved, then adding intermediate responses should be a little more straightforward/less scary 💪🏻
Just checking in to say @Lacico and I have been working on a PR to address the tech debt / refactoring point - useful exercise to learn the nitty gritty of the internals anyway - will share it with you soon once we've tested it a bit more.
Amazing @SlapDrone thanks a ton for tackling some of this!! ❤️🦙
Hi, @SlapDrone! I'm Dosu, and I'm here to help the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, you requested a new feature that would allow the native Agent classes to yield/stream all their steps instead of just the final output. Logan-markewich suggested that the existing functionality can be used to retrieve the function/tool call details, but only after the threads have finished. You proposed two possible options to implement this feature, and Logan-markewich is open to the first idea. Currently, SlapDrone and Lacico are working on a PR to address the tech debt and refactoring points.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your contribution to LlamaIndex!
Feature Description
Hey folks,
What do you think about having the new native Agent classes able to yield/stream all their steps (function calls, thoughts, observations, etc) rather than just the final output? I've been digging through the source code after this was mentioned in passing in the docs, but I don't see this implemented yet.
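As a rough illustration of the requested interface, here is a hypothetical sketch of an agent whose stream yields every intermediate step before the final answer; every name here (`AgentStep`, `stream_steps`, the step kinds) is illustrative, not existing llama_index API.

```python
# Hypothetical interface sketch: a generator that yields each agent step
# (thought, function call, observation) and finally the answer.
from dataclasses import dataclass
from typing import Iterator


@dataclass
class AgentStep:
    kind: str      # "thought" | "function_call" | "observation" | "answer"
    content: str


def stream_steps() -> Iterator[AgentStep]:
    # Hard-coded steps standing in for a real agent's reasoning loop.
    yield AgentStep("thought", "I should look up the weather.")
    yield AgentStep("function_call", 'get_weather(city="Berlin")')
    yield AgentStep("observation", '{"temp_c": 21}')
    yield AgentStep("answer", "It's 21 degrees in Berlin.")


for step in stream_steps():
    print(f"[{step.kind}] {step.content}")
```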
A colleague and I implemented a similar thing in a PR for Langchain, so this is definitely something we could either help the established devs with if it is already in the pipeline here, or take the lead on if not.
Reason
I don't think there's anything stopping this structurally, it just requires modifying some internals.
The places that would need to be adapted are the `stream_chat` method, for example here: https://github.com/jerryjliu/llama_index/blob/be0ded30701f45bd097b30a00fa93d2cdf06a592/llama_index/agent/openai_agent.py#L141

And (potentially?) the `StreamingAgentChatResponse`, here: https://github.com/jerryjliu/llama_index/blob/be0ded30701f45bd097b30a00fa93d2cdf06a592/llama_index/chat_engine/types.py#L27
Value of Feature
This is super valuable to us because it's generally useful to be able to hook into the whole "reasoning" process, both for debugging and for transparency in a chat interface. If only the final answer is revealed and it happens to be wrong, it is harder to understand where the problem lies. When talking to an agent in an interactive setting, we can also see and potentially interrupt and guide its trajectory earlier and with finer detail if we can resolve each step. Imagine a chat where one can elect to see/sort/modify function calls / tool outputs / intermediate "thoughts".