microsoft / semantic-kernel

.Net: Planners with Native Functions don't return complex object #3258

Closed ambrose-leung closed 8 months ago

ambrose-leung commented 11 months ago

Describe the bug

When a native function used in a plan returns a custom object, the returned object is lost; only the class name is returned, as a string. (The screenshot section below shows where I think the problem occurs.)

Changes need to be made in Plan.cs to account for complex objects.

This is a follow-up to https://github.com/microsoft/semantic-kernel/issues/2081 with @dmytrostruk

To Reproduce

I wrote a test case and put it in SequentialPlannerTests.cs. I had to add a project reference to Connectors.AI.OpenAI.

    [TypeConverter(typeof(MyCustomTypeConverter2))]
    public class Employee 
    {
        public string Alias { get; set; }
    }

    public class MyCustomTypeConverter2 : TypeConverter
    {
        public override bool CanConvertFrom(ITypeDescriptorContext? context, Type sourceType) =>
            sourceType == typeof(string);
        public override object? ConvertFrom(ITypeDescriptorContext? context, System.Globalization.CultureInfo? culture, object value) =>
            new Employee { Alias = "blah" };
    }

    public class ReportsAndDocsPlugin
    {
        [SKFunction]
        public Employee GetReportOfAlias([Description("The aliases to get direct reports of"), SKName("input")] string alias)
        {
            Console.WriteLine($"GetReportOfAlias called with input: {alias}");
            return new Employee { Alias = "report1" };
        }
    }

    [Fact]
    public async Task PlanWithNativeFuncCanReturnComplexTypeAsync()
    {
        var builder = new KernelBuilder();
        builder.WithAzureChatCompletionService("gpt-35-turbo", "https://xxx.openai.azure.com/", "xxx");
        var kernel = builder.Build();
        kernel.ImportFunctions(new ReportsAndDocsPlugin(), "ReportsAndDocsPlugin");

        var planner = new SequentialPlanner(kernel);
        var ask = "Who are the reports of 'bar'?";

        Plan plan = await planner.CreatePlanAsync(ask);
        var result = await kernel.RunAsync(plan); //the object is not contained in 'result';
        var employee = result.GetValue<Employee>();
        Assert.Equal(expected: "report1", employee.Alias);
    }

Expected behavior

I expect result to contain the Employee (complex) object, but it contains only the literal string Microsoft.SemanticKernel.Planners.Sequential.UnitTests.SequentialPlannerTests+Employee

result.GetValue<Employee>() throws an exception because the value is just a string.

Screenshots

I did some debugging and I think this is the source of the issue. In this screenshot (Plan.cs), result contains the complex object (in result.Value), but only result.Context.Result (the class name as a string) is used to update State.

[image]
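One reading that fits the reported symptom, offered as an assumption and not as a diagnosis of Plan.cs: if a pipeline keeps only the string form of a result, a class that does not override ToString() collapses to its type name. The sketch below uses plain C# with no Semantic Kernel types; `ToStringDemo` and `Flatten` are hypothetical names introduced only for illustration.

```csharp
using System;

public static class ToStringDemo
{
    // Nested on purpose, to mirror the test class in the report.
    public class Employee
    {
        public string Alias { get; set; } = string.Empty;
    }

    // Hypothetical stand-in for any code path that stores only the string form of a result.
    public static string Flatten(object value) => value.ToString() ?? string.Empty;

    public static void Main()
    {
        var employee = new Employee { Alias = "report1" };

        // Employee does not override ToString(), so this prints the type name
        // (nested types are written with '+'), not the alias.
        Console.WriteLine(Flatten(employee));

        // The typed value is only recoverable if the object itself is kept.
        Console.WriteLine(employee.Alias);
    }
}
```

This matches the shape of the string in the report (`...SequentialPlannerTests+Employee`), which is why carrying the object itself forward, and not only its string form, matters for `GetValue<Employee>()`.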

ambrose-leung commented 11 months ago

Just wanted to add my full scenario, in the hope that whatever PR comes out of this can address it.

I want the SequentialPlanner to come up with a Plan that can take the output of one function (List<ComplexObject>) and feed it into a function that takes List<ComplexObject> as input.

For example:

    public class ReportsAndDocsPlugin
    {
        [SKFunction]
        public List<Employee> GetReportsOfAlias([Description("The aliases to get direct reports of"), SKName("input")] string alias)
        {
            // Placeholder: return the list of Employee for the given alias.
            throw new NotImplementedException();
        }

        [SKFunction]
        public List<Document> GetDocumentsOpenedByEmployee([Description("Documents opened by given employees"), SKName("input")] List<Employee> employees)
        {
            // Placeholder: return the list of Document opened by the given employees.
            throw new NotImplementedException();
        }
    }

When asked "What documents did the reports of 'foo' open?", the generated Plan should do the equivalent of:

    List<Employee> reports = GetReportsOfAlias("foo");
    return GetDocumentsOpenedByEmployee(reports);

Thanks for addressing this!

matthewbolanos commented 10 months ago

@alliscode, do you believe this has been fixed with the new handlebars planner?

Cotspheer commented 9 months ago

@matthewbolanos I'm currently using the new Handlebars planner and I think I have a similar problem: all information about the complex object seems to be missing. From my point of view, the receiving plugin does not necessarily have to accept the same complex object, but information about the complex object is crucial for the LLM to create proper steps. In my case I could easily provide another plugin that takes this structure and returns a stringified version, for example, but for that the LLM needs a way to "know" that it can't use the complex output directly.

Besides that, I'm fairly convinced that including the type information of the complex object would improve the planner's quality, because the LLM could then match complex object types and pick the proper plugins. One improvement would be to omit "Object" and print "List". It would also be a performance improvement, as we would no longer have to serialize objects to JSON and deserialize them in the target plugin.

Current Example:

### `SomeApp_Knowledge_LawLookupPlugin-Find`
Description: Find related laws. Useful to find law references and to get legal arguments. Combines similarity and full text search.
Inputs:
    - input: String - Context to find related laws for. This can be quotes (direct citation of a law) or a question or fragments of a context. (required)
    - collection: String - Collection to use for the search. Only set this if instructed to do so. (optional)
    - relevance: Double - The relevance score, from 0.0 to 1.0, where 1.0 means perfect match. Good values are 0.6 to 0.8. (optional)
    - limit: Int32 - The maximum number of relevant results the search should return. Only increase this if more information is requested. Defaults to 10. (optional)
Output: Object

Enhancement:

### `SomeApp_Knowledge_LawLookupPlugin-Find`
Description:
Inputs:
    - input: String
    - collection: String
    - relevance: Double
    - limit: Int32
Output: List<ComplexType>

### `SomeApp_AnotherPlugin-Do`
Description: Example
Inputs:
    - input: List<ComplexType>
Output: String

Awesome:

### `SomeApp_Knowledge_LawLookupPlugin-Find`
Description:
Inputs:
    - input: String
    - collection: String
    - relevance: Double
    - limit: Int32
Output:
    - Type: List<StructuredKnowledge>
        - Type: StructuredKnowledge
            - Description: Preserves the retrieved knowledge in a structured way so that it can be consumed by other APIs or A.I.-Agents.
            - KnowledgeId:
                - Description: The id within the knowledge base. For example the chunk-id within the vector space.
                - Type: String
            - Text:
                - Description: Actual knowledge chunk related to the context provided in the search.
                - Type: String
    - Description: A list of possible related laws to the context. Use the Text-Property to get its text.

### `SomeApp_AnotherPlugin-Do`
Description: Example
Inputs:
    - input: List<ComplexType>
Output: String

The annotated complex object:

[TypeConverter(typeof(StructuredKnowledgeConverter))]
[Description("Preserves the retrieved knowledge in a structured way so that it can be consumed by other APIs or A.I.-Agents.")]
public class StructuredKnowledge
{
    /// <summary>
    /// The id within the knowledge base.
    /// For example the chunk-id within the vector space.
    /// </summary>
    /// 
    [Description("The id within the knowledge base. For example the chunk-id within the vector space.")]
    public string KnowledgeId { get; set; } = string.Empty;

    /// <summary>
    /// Actual knowledge chunk related to the context provided in the search.
    /// </summary>
    [Description("Actual knowledge chunk related to the context provided in the search.")]
    public string Text { get; set; } = string.Empty;

    /// <summary>
    /// Any additional information that is not part of the text chunk but can help to get a context to the text.
    /// For example law articles, law paragraphs, etc. that are not part of the text chunk but can help the A.I. to take further actions or to shape the answer more helpful.
    /// </summary>
    [Description("Any additional information that is not part of the text chunk but can help to get a context to the text. For example law articles, law paragraphs, date and time etc. that are not part of the text chunk but can help to take further actions or to shape the answer more helpful.")]
    public string AdditionalBackgroundKnowledge { get; set; } = string.Empty;

    /// <summary>
    /// This is an identifier that has a meaning to the source of the knowledge. This can help a user to find the reference elsewhere.
    /// Example 82 I 306 => BGE-Leitentscheid reference or Art. 4 ZGB => Zivilgesetzbuch.
    /// </summary>
    [Description("This is an identifier that has a meaning to the source of the knowledge. Can help the user to find the source of the knowledge. Example 82 I 306, or Art. 4 ZGB.")]
    public string SourceId { get; set; } = string.Empty;

    /// <summary>
    /// Reference to the source of the knowledge.
    /// Can be a direct link, a filename or path.
    /// </summary>
    [Description("Reference to the source of the knowledge. Can be a direct URL, filepath or filename. Can help the user to find the source of the knowledge.")]
    public string Source { get; set; } = string.Empty;

    /// <summary>
    /// Additional metadata in the form of a JSON string.
    /// </summary>
    [Description("Additional metadata in the form of a JSON string.")]
    public string AdditionalMetadata { get; set; } = string.Empty;

    /// <summary>
    /// Returns the knowledge chunk without background information. 
    /// </summary>
    [Description("Returns the knowledge chunk without background information.")]
    public override string ToString()
    {
        return this.Text;
    }

    /// <summary>
    /// Implementation of <see cref="TypeConverter"/> for <see cref="StructuredKnowledge"/>.
    /// </summary>
    private sealed class StructuredKnowledgeConverter : TypeConverter
    {
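        public override bool CanConvertFrom(ITypeDescriptorContext? context, Type sourceType)
        {
            // ConvertFrom below casts the value to string, so only advertise string sources.
            return sourceType == typeof(string);
        }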

        /// <summary>
        /// This method is used to convert object from string to actual type. This will allow to pass object to
        /// method function which requires it.
        /// </summary>
        public override object? ConvertFrom(ITypeDescriptorContext? context, CultureInfo? culture, object value)
        {
            return JsonSerializer.Deserialize<StructuredKnowledge>((string)value);
        }

        /// <summary>
        /// This method is used to convert actual type to string representation, so it can be passed to AI
        /// for further processing.
        /// </summary>
        public override object? ConvertTo(ITypeDescriptorContext? context, CultureInfo? culture, object? value, Type destinationType)
        {
            return JsonSerializer.Serialize(value);
        }
    }
}
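The converter above relies on a string round trip: the object is serialized to JSON on the way out and deserialized on the way in. As a minimal sketch of that round trip outside Semantic Kernel, the example below uses only `System.ComponentModel` and `System.Text.Json`. The `Note`, `NoteConverter`, and `RoundTrip` names are hypothetical, and how any planner actually invokes converters is not shown here.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.Text.Json;

[TypeConverter(typeof(NoteConverter))]
public class Note
{
    public string Id { get; set; } = string.Empty;
    public string Text { get; set; } = string.Empty;
}

public sealed class NoteConverter : TypeConverter
{
    // Only string sources are supported, which keeps ConvertFrom free of unchecked casts.
    public override bool CanConvertFrom(ITypeDescriptorContext? context, Type sourceType) =>
        sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);

    // JSON string -> Note; anything else falls back to the base behavior.
    public override object? ConvertFrom(ITypeDescriptorContext? context, CultureInfo? culture, object value) =>
        value is string json
            ? JsonSerializer.Deserialize<Note>(json)
            : base.ConvertFrom(context, culture, value);

    // Note -> JSON string, only when a string is the requested destination.
    public override object? ConvertTo(ITypeDescriptorContext? context, CultureInfo? culture, object? value, Type destinationType) =>
        destinationType == typeof(string) && value is Note note
            ? JsonSerializer.Serialize(note)
            : base.ConvertTo(context, culture, value, destinationType);
}

public static class RoundTrip
{
    // Looks up the converter declared on Note through its [TypeConverter] attribute.
    public static string ToText(Note note) =>
        TypeDescriptor.GetConverter(typeof(Note)).ConvertToInvariantString(note) ?? string.Empty;

    public static Note? FromText(string text) =>
        TypeDescriptor.GetConverter(typeof(Note)).ConvertFromInvariantString(text) as Note;

    public static void Main()
    {
        var text = ToText(new Note { Id = "1", Text = "hello" });
        Console.WriteLine(text);
        Console.WriteLine(FromText(text)?.Text);
    }
}
```

Checking `sourceType` and `destinationType` is a design choice in this sketch: it lets unsupported conversions fail through the base class instead of through an invalid cast.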

The kernel function:


    [KernelFunction, Description(_findFunctionDescription)]
    public async Task<List<StructuredKnowledge>> Find(
        Kernel kernel,
        [Description("Context to find related laws for. This can be quotes (direct citation of a law) or a question or fragments of a context.")] string input,
        [Description("Collection to use for the search. Only set this if instructed to do so."), DefaultValue(DefaultCollection)] string? collection = DefaultCollection,
        [Description("The relevance score, from 0.0 to 1.0, where 1.0 means perfect match. Good values are 0.6 to 0.8."), DefaultValue(DefaultRelevance)] double? relevance = DefaultRelevance,
        [Description($"The maximum number of relevant results the search should return. Only increase this if more information is requested. Defaults to {DefaultLimitForDescription}."), DefaultValue(DefaultLimit)] int? limit = DefaultLimit,
        ILoggerFactory? loggerFactory = default,
        CancellationToken cancellationToken = default
    )
    {
    }

banduki commented 9 months ago

@matthewbolanos Just want to offer a very strong upvote for everything in @Cotspheer's post.

teresaqhoang commented 8 months ago

Hey @Cotspheer,

Thanks for all the detail! Complex types should now work as expected in the Handlebars Planner after this fix went in: https://github.com/microsoft/semantic-kernel/pull/4804

Here's an example of how functions and their corresponding complex types are rendered in the CreatePlan prompt. Complex types and their full JSON schemas are defined first, before the function definitions are listed.

(This is a snippet of the output from RunLocalDictionaryWithComplexTypesSampleAsync in Kernel Syntax Example 65.)

## Complex types
Some helpers require arguments that are complex objects. The JSON schemas for these complex objects are defined below:

### DictionaryEntry:
{
  "type": "Object",
  "properties": {
    "Word": {
      "type": "String",
    },
    "Definition": {
      "type": "String",
    },
  }
}

## Custom helpers
Lastly, you have the following custom helpers to use.

### `ComplexParamsDictionaryPlugin-GetRandomEntry`
Description: Gets a random word from a dictionary of common words and their definitions.
Inputs:
Output: DictionaryEntry

### `ComplexParamsDictionaryPlugin-GetWord`
Description: Gets the word for a given dictionary entry.
Inputs:
    - entry: DictionaryEntry - Word to get definition for. (required)
Output: String
...
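To make the "Complex types" section above more concrete, here is a rough sketch of how a schema-like description could be derived from a class by reflection. It is not the Handlebars planner's implementation (see the linked PR for that); `SchemaSketch` and `Describe` are hypothetical names, and the `DictionaryEntry` class here is a local stand-in with the two properties shown in the prompt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Local stand-in for the type shown in the prompt snippet.
public class DictionaryEntry
{
    public string Word { get; set; } = string.Empty;
    public string Definition { get; set; } = string.Empty;
}

public static class SchemaSketch
{
    // Builds a small schema-like JSON document listing each public property and its CLR type name.
    public static string Describe(Type type)
    {
        var properties = type.GetProperties()
            .ToDictionary(
                p => p.Name,
                p => new Dictionary<string, string> { ["type"] = p.PropertyType.Name });

        var schema = new Dictionary<string, object>
        {
            ["type"] = "Object",
            ["properties"] = properties,
        };

        return JsonSerializer.Serialize(schema, new JsonSerializerOptions { WriteIndented = true });
    }

    public static void Main()
    {
        Console.WriteLine(Describe(typeof(DictionaryEntry)));
    }
}
```

A description like this gives the model the property names and types of a complex object, which is the kind of information requested earlier in the thread.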

Please pull the latest changes from main and re-open this issue if it does not address your scenarios. Thanks!

cc: @banduki, @ambrose-leung