microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License
21.8k stars 3.24k forks source link

Python: history is not extended with tool call related messages automatically #9408

Open bbence84 opened 4 days ago

bbence84 commented 4 days ago

Describe the bug

Not exactly sure if this is a bug, but at least a gap. I have moved away from using the Planners and rely on "vanilla" function calling. Yet my tools that I defined can be chained after each other, and the LLM does come up with a plan how to chain them, and can call them after each other pretty well. Now my problem is that my tool functions usually return a JSON which contains i.e. the IDs of certain business objects that were created, and then the followup tool has a parameter that should use this (technical ID). The LLM response (the textual response) based on the function return value usually does not contain this ID (rightly so, as it's indeed a technical detail). The problem is that now followup functions have no clue about the ID, because the ChatHistory does not contain the tool calls.

Rerefencing issue #8071 and #8020.

To Reproduce

I tested this with a simpler example, as per recommendation the following: https://github.com/microsoft/semantic-kernel/blob/main/python/samples/concepts/filtering/auto_function_invoke_filters.py Also here printing the ChatHistory does not contain any tool call or tool call results:

    chat_hist_json = history.serialize()
    print(f"Chat history: {chat_hist_json}")    

Expected behavior

Chat history contains FunctionResultContent and FunctionCallContent automatically, as that is needed if tool functions are to be chained and one tool uses the result of a previous tool call.

Platform

Additional context Add any other context about the problem here.

moonbox3 commented 3 days ago

Hi @bbence84, thanks for continuing the conversation here. As I mentioned in our previous thread in #8071, all of the relevant calls are contained in the metadata's chat history. After each kernel invocation, you have access to the underlying content types (whether FunctionCallContent or FunctionResultContent -- even if not using a filter).

For the caller's ChatHistory, that's their responsibility to manage. The caller shall populate their chat history with the information required to either maintain context (or not).

Can you please help me understand what the gap is?