Closed stephentoub closed 11 months ago
Also adding a reference to issue #1298, which the approaches above might resolve.
Removing this for V1 RC1 as this requires prioritization and design
@stephentoub, when we meet later today, we should also discuss the comments you made during the last design review along the lines of... do we even need SKContext if the kernel is the "runtime."
@stephentoub, we'll likely need your help revisiting this issue to see what else is necessary to complete it now that several things have changed since this issue was created.
This task can be closed because everything here has already been completed within the scope of other tasks/issues.
`SKContext` represents ambient information that’s meant to be available to functions, like a cancellation token, memory, and variables, and it serves as the vehicle by which that information is passed from one function to the next in a plan as well as out to any consumer of a function. However, the shape of `SKFunction` forces `SKContext` to be inserted before and after every function invocation, such that functions are limited by what `SKContext` can represent and store. All inputs need to be stored as strings and any output needs to be stored as strings. Further, functions themselves are responsible for interpreting those provided strings, which they may not have the relevant context to do, and are responsible for producing strings that others can understand. Fidelity of type is lost before that loss is actually necessary.
https://github.com/microsoft/semantic-kernel/pull/1195 takes several steps towards improving that, by allowing methods to be defined with arbitrary signatures (e.g. rather than being able to accept at most one parameter of type string, a function can accept any number of arguments of non-string types), pushing any logic for extracting data from the `SKContext` or putting data back into the `SKContext` out of the function's implementation. A function author therefore no longer needs to know anything about an `SKContext`; they write their method signature accepting the arguments they want and returning the data they want, as they would in any .NET application model, and it’s up to the caller to handle that appropriately. In this PR, it’s the SKFunction as the caller that then handles the mapping of inputs from SKContext to function arguments and from function results back to the SKContext, performing string/object conversions via TypeConverter.
But to support the kinds of things developers want and need to do, that is both needed progress and insufficient. Limiting function arguments and results to only what `SKContext`’s variables can store as strings (or be simply translated to/from strings that can be stored) still means that certain desirable shapes can’t be expressed, and forcing all inputs and outputs to go through `SKContext` limits how data can flow as well as how precisely the data can be represented. For example, a function should be definable to stream responses, in which case its return type should really be defined as something like `IAsyncEnumerable<string>`, with a native function able to use async/await/yield to easily implement streaming, and a semantic function able to implicitly stream the result from the LLM. Such a result should not be forced back into a string to be stored into the `SKContext`, as doing so obviates any benefits of streaming: the results would all need to be aggregated into a single string before returning the result to the caller. Or, for example, a caller should be able to populate an SKContext with data from the environment that's available in its original object form (e.g. a reference to a UI control), and a native function should be able to access that object.
I suggest an approach something like the following:
Remove all restrictions on what types functions can accept and return. Importing a skill function or defining a semantic function should work regardless of the number or types of inputs and outputs. Code can invoke a function, whether directly or indirectly via a kernel (if we want to wrap additional pre/post processing around its invocation), and the exact .NET objects passed as input are propagated into the function as its arguments, and the exact .NET object passed as output is propagated out to the caller as that same object. No translation is forced when there's no incompatibility.
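As a rough illustration of what an unrestricted signature could look like, here is a minimal sketch of a native function that takes a non-string argument and streams its result. `TextSkill` and `SplitWordsAsync` are hypothetical names invented for this example, not Semantic Kernel APIs, and `Task.Yield()` merely stands in for real asynchronous work.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical skill class for illustration only; not part of Semantic Kernel.
public static class TextSkill
{
    // Takes a string plus a non-string argument and streams words one at a time.
    // Nothing here refers to SKContext: the signature is ordinary .NET.
    public static async IAsyncEnumerable<string> SplitWordsAsync(
        string text,
        int maxWords,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        int produced = 0;
        foreach (string word in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            if (produced++ >= maxWords)
            {
                yield break; // stop once the requested number of words was produced
            }

            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield(); // placeholder for real async work, e.g. reading from an LLM
            yield return word;
        }
    }
}
```

A caller that invokes the function directly can consume the result with `await foreach (string word in TextSkill.SplitWordsAsync("one two three", 2))` and receives each word as it is produced, without the result ever being aggregated into a single string.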
Change `SKContext`’s variables to support arbitrary `System.Object` values instead of only `System.String` values (`TrustAwareString`, assuming it remains a required concept, either becomes just another object that can be stored, and all objects that are stored without the wrapper are implicitly untrusted, or `TrustAwareString` becomes `TrustAwareObject` so that arbitrary objects can also be marked as trusted, if that’s a desirable concept).
Push all conversion between strings and objects to the orchestrator that requires such conversions, and perform it only when it is actually needed. For example, if the output of function A is an `IAsyncEnumerable<string>`, the input of function B is an `IAsyncEnumerable<string>`, and the plan dictates that function A’s output be piped to function B, the orchestrator can just pass the `IAsyncEnumerable<string>` directly from one to the other (via `SKContext` if desirable, but that’s left up to the orchestrator); it needn’t do any translation. Or if the output of function A is an `Int32`, the input to function B is a `string`, and the plan dictates that the output of function A be piped to function B, the orchestrator can then do a conversion from `Int32` to `string`, using whatever mechanism is agreed upon for performing these translations (https://github.com/microsoft/semantic-kernel/pull/1195 uses TypeConverter, which is the same as what’s used by ASP.NET, for example, for converting between textual query string / form / cookie values and function inputs / outputs… similarly for conversions performed in EF, MAUI, etc.). The orchestrator itself can be parameterized with these conversions, e.g. a planner could be given a `Func<IAsyncEnumerable<string>, Task<string>>` that teaches it how to translate an `IAsyncEnumerable<string>` into a `string`, at which point it would support flowing the output of something that returned an `IAsyncEnumerable<string>` to something accepting a `string`, but without that would still be able to propagate the object when no conversion was required. The notion of what can be connected to what can be fed into the plan, based on all of the types and what’s compatible with what.
This removes artificial barriers. Functions are no longer forced to know anything about `SKContext`, and values maintain full fidelity in both content and type until it’s required by the consumer that a translation happens, at which point it’s up to that consumer to perform the translation (the consumer here being a direct invoker, the orchestrator, etc.)
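To make the orchestrator-side conversion concrete, here is a minimal sketch under stated assumptions. `ValueBridge`, `Register`, and `Adapt` are hypothetical names invented for this example and are not Semantic Kernel APIs. The sketch passes a value through untouched when it already fits the target type, then tries any conversion the orchestrator was parameterized with, and only then falls back to `TypeConverter`.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Globalization;

// Hypothetical helper for illustration only; not part of Semantic Kernel.
public sealed class ValueBridge
{
    // Conversions supplied by whoever configures the orchestrator.
    private readonly List<(Type From, Type To, Func<object, object> Convert)> _converters = new();

    public void Register<TFrom, TTo>(Func<TFrom, TTo> convert)
    {
        Func<object, object> boxed = value => convert((TFrom)value);
        _converters.Add((typeof(TFrom), typeof(TTo), boxed));
    }

    public object Adapt(object value, Type targetType)
    {
        // 1. Compatible types: hand over the very same object, no translation.
        if (targetType.IsInstanceOfType(value))
        {
            return value;
        }

        // 2. A conversion the orchestrator was parameterized with.
        foreach (var entry in _converters)
        {
            if (entry.From.IsInstanceOfType(value) && targetType.IsAssignableFrom(entry.To))
            {
                return entry.Convert(value);
            }
        }

        // 3. Fall back to TypeConverter, first from the source side, then the target side.
        TypeConverter source = TypeDescriptor.GetConverter(value.GetType());
        if (source.CanConvertTo(targetType))
        {
            return source.ConvertTo(null, CultureInfo.InvariantCulture, value, targetType);
        }

        TypeConverter target = TypeDescriptor.GetConverter(targetType);
        if (target.CanConvertFrom(value.GetType()))
        {
            return target.ConvertFrom(null, CultureInfo.InvariantCulture, value);
        }

        throw new InvalidOperationException(
            $"No conversion from {value.GetType()} to {targetType} is available.");
    }
}
```

With this shape, `Adapt(42, typeof(string))` goes through `TypeConverter`, while a call such as `Register<IEnumerable<string>, string>(parts => string.Concat(parts))` teaches the bridge a conversion it would not otherwise know. An asynchronous conversion like the `Func<IAsyncEnumerable<string>, Task<string>>` mentioned above would need an async variant of `Adapt`, which this sketch leaves out.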