microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License
New Feature: Function calling w/ Context #9135

Open brandonh-msft opened 1 month ago

brandonh-msft commented 1 month ago

Scenario

Today, it seems the only things I can pass to a function call are JSON-serializable objects. In this scenario, I can send the LLM image data via ImageContent message items, but the LLM can only come back with a reference to that content, such as a filename I attach alongside it, and not the content itself. This means that even if my Agent Function takes BinaryData as an input, the LLM doesn't provide it in its function call request. It can provide something like "Filename: X.png", which I could correlate back to the user's input, but I have no way to get back to that input to use it in the actual call to the agent.

So, some way for me to get back to the original message which triggered the function call request is necessary for this scenario. I'm thinking something along the lines of an auto-injected parameter to the function, if the developer provides it on the signature (a la CancellationToken).

markwallace-microsoft commented 1 month ago

@brandonh-msft you could try creating an Auto Function Invocation Filter. This will give you access to the function that is being called and also the associated Chat History; see https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/Filtering/AutoFunctionInvocationFiltering.cs. You could then pass the ImageContent via KernelArguments for your specific function.

Let me know how this goes.
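
A minimal sketch of that suggestion, assuming a recent Semantic Kernel 1.x package. The filter looks up the ImageContent items on the most recent user message in context.ChatHistory and copies them into context.Arguments. The class name ImageArgumentsFilter, the helper FindLatestUserImages, and the argument key "userImages" are made up for illustration and are not part of SK. Whether non-primitive argument values are safe in every SK code path (logging, for example) is not confirmed here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical filter: copies the images from the most recent user message
// into the arguments of the function that is about to be invoked.
public sealed class ImageArgumentsFilter : IAutoFunctionInvocationFilter
{
    // Hypothetical argument key; this is not an SK convention.
    public const string ImagesKey = "userImages";

    // Returns the ImageContent items of the last user message, or an empty list.
    public static IReadOnlyList<ImageContent> FindLatestUserImages(ChatHistory history)
    {
        ChatMessageContent? lastUserMessage =
            history.LastOrDefault(m => m.Role == AuthorRole.User);

        return lastUserMessage?.Items.OfType<ImageContent>().ToList()
            ?? new List<ImageContent>();
    }

    public Task OnAutoFunctionInvocationAsync(
        AutoFunctionInvocationContext context,
        Func<AutoFunctionInvocationContext, Task> next)
    {
        // Arguments is nullable, so only populate it when it exists.
        if (context.Arguments is not null)
        {
            context.Arguments[ImagesKey] = FindLatestUserImages(context.ChatHistory);
        }

        return next(context);
    }
}
```

The filter would be registered with something like kernel.AutoFunctionInvocationFilters.Add(new ImageArgumentsFilter()); before the chat completion call.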

brandonh-msft commented 1 month ago

I was with you until "pass the ImageContent via KernelArguments for your specific function"

When observing context.ChatHistory, I see the message from the user w/ all the files attached as ImageContent items. So they're already there. This leaves me with a few questions:

  1. Where would I put these same items so they are accessible from the function that is the target of context.Function? If, as you suggest, I copy them over to context.Arguments then...
  2. How would I get to the items from within the Function? As far as I know, the only things I have access to are the parameter inputs to the function which are just primitives.
  3. What does this mean for scale & concurrency? e.g. if I'm adding them to kernel arguments and my Kernel is a singleton in the app, am I corrupting other requests?
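
On question 2, one possibility: functions created from a method can, in current SK 1.x builds, declare a KernelArguments parameter, which SK fills with the invocation's arguments instead of advertising it to the LLM. The sketch below assumes that behavior and uses a hypothetical "imagesByName" key (something a filter would have to populate) to correlate the filename the LLM supplies with the original bytes. The names ImageLookupExample and describe_image are also made up for illustration. It does not answer question 3.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.SemanticKernel;

public static class ImageLookupExample
{
    // Hypothetical key under which a filter would store the user's images.
    public const string ImagesKey = "imagesByName";

    public static KernelFunction CreateFunction()
    {
        // fileName is the only parameter the LLM is expected to supply;
        // arguments is the full argument bag for this invocation.
        Func<string, KernelArguments, string> describe = (fileName, arguments) =>
        {
            if (arguments.TryGetValue(ImagesKey, out object? value)
                && value is IReadOnlyDictionary<string, ReadOnlyMemory<byte>> images
                && images.TryGetValue(fileName, out ReadOnlyMemory<byte> bytes))
            {
                return $"{fileName}: {bytes.Length} bytes";
            }

            return $"{fileName}: not found";
        };

        return KernelFunctionFactory.CreateFromMethod(describe, functionName: "describe_image");
    }
}
```

Invoking the function directly with a KernelArguments instance that contains both "fileName" and "imagesByName" is an easy way to check the binding without involving an LLM.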

It seems to me like we need a pattern akin to Azure Functions' FunctionContext as an always-available parameter on a function call. As it is right now, when SK gets the request from the LLM to make a tool call, it loses context of the conversation as a whole when making the call, leaving only what the LLM deems necessary for the function to execute; this seems to be the gap. Said another way, if I could add the AutoFunctionInvocationContext as a parameter on my Function so that I can get to everything that was in it, that would be great.

Thanks

brandonh-msft commented 1 month ago

Here's what I came up with, would love your thoughts:

Filter:

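using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// Copies the invocation context into the arguments of the function about to be called.
internal class ContextPopulateInvocationFilter : IAutoFunctionInvocationFilter
{
    // Must match the parameter name declared on the target function.
    private const string FunctionContextKey = "__functionContext";

    public Task OnAutoFunctionInvocationAsync(AutoFunctionInvocationContext context, Func<AutoFunctionInvocationContext, Task> next)
    {
        // Arguments is nullable; when it is null there is nowhere to put the context.
        if (context.Arguments is not null)
        {
            context.Arguments[FunctionContextKey] = context;
        }

        return next(context);
    }
}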

Function registration:

_kernel.CreateFunctionFromMethod(async (Guid conversationId, AutoFunctionInvocationContext __functionContext) => {
...
}, parameters: [new ("conversationId") { IsRequired = true, ParameterType = typeof(Guid), Description = "The ID of the Conversation as specified by the initial context to the system" }]);

I have access to the entirety of the invocation context in this manner. Would there be any unintended side effects of this approach?

The biggest caveat is that I am dependent upon .Arguments being non-null, though it is declared as nullable. I cannot set it to a new KernelArguments() because it has an init-only setter. So if it comes in null and the target function needs __functionContext, the function is out of luck.

brandonh-msft commented 1 month ago

The above approach resulted in me getting a serialization error on cancellationToken - so there must be one buried in the context somewhere. Trying to pull just Chat History off the arguments (since that's all I need) resulted in a serialization overflow. Still trying.

brandonh-msft commented 1 month ago

So, adding the whole Chat History resulted in recursive function invocations - I'm guessing because the history included items that requested tool calls. So, instead of passing the whole thing I simply pulled out all assistant and user messages that didn't have tool calls requested. This seems to have done the trick. If I need anything else from the Context, I'll have to manually add it to arguments as well.

If we could get Context to be serializable (not hit the cancellationToken issue) or create a purpose-built context for this, it seems that'd be much more ideal.
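
The filtering described above might look roughly like the following sketch. It assumes tool-call requests show up as FunctionCallContent items on the assistant message, which can depend on the connector and SK version in use; the names PlainHistory and WithoutToolTraffic are hypothetical.

```csharp
using System.Linq;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class PlainHistory
{
    // Returns a new ChatHistory holding only user and assistant messages
    // that do not carry a tool-call request. Tool and system messages are dropped.
    public static ChatHistory WithoutToolTraffic(ChatHistory history)
    {
        var filtered = new ChatHistory();

        foreach (ChatMessageContent message in history)
        {
            bool isConversational =
                message.Role == AuthorRole.User || message.Role == AuthorRole.Assistant;

            // Assumption: a requested tool call appears as a FunctionCallContent item.
            bool requestsToolCall = message.Items.OfType<FunctionCallContent>().Any();

            if (isConversational && !requestsToolCall)
            {
                filtered.Add(message);
            }
        }

        return filtered;
    }
}
```

Inside a filter, the result of PlainHistory.WithoutToolTraffic(context.ChatHistory) could then be stored in context.Arguments in place of the full history.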