microsoft / semantic-kernel


.Net: Filters use cases to be supported before making the feature non-experimental #5436

Closed. matthewbolanos closed this issue 2 months ago.

matthewbolanos commented 6 months ago

Tasks

To make filters non-experimental, the following user stories should be met:

Additionally, thought should be given to making filters easier to apply. These are likely extensions, so they should not block making filters non-experimental:

lavinir commented 6 months ago

For Approving function calls:

Not sure I completely understand the use case for User vs. Developer. In terms of what happens after a 'cancellation', there should be more flexibility. There are two places for filters: pre function invocation and post invocation. If the user decides to cancel, it should also bypass (or at least have the option to bypass) any additional call to the LLM and return the relevant data back to the user:

  1. The updated chat history, including any successful function invocations that were not cancelled.
  2. The function response (if this is a post-invocation filter).
  3. Perhaps some additional context object supplied by the function filter.

@matthewbolanos, does that make sense?
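
To make the "bypass any additional call to the LLM" part concrete, here is a minimal sketch (not a merged implementation) assuming the auto function invocation filter abstraction (`IAutoFunctionInvocationFilter`); the user-approval callback is a hypothetical placeholder:

```csharp
// Sketch of a cancellation-aware filter at the automatic function-calling level.
// Terminating the context stops the auto-invocation loop, so no additional
// request is sent to the LLM and control returns to the caller.
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public sealed class CancellationAwareFilter : IAutoFunctionInvocationFilter
{
    private readonly Func<string, bool> _userApproves; // hypothetical "ask the user" hook

    public CancellationAwareFilter(Func<string, bool> userApproves) => _userApproves = userApproves;

    public async Task OnAutoFunctionInvocationAsync(
        AutoFunctionInvocationContext context,
        Func<AutoFunctionInvocationContext, Task> next)
    {
        // Pre-invocation: ask before the function runs.
        if (!_userApproves($"Allow call to {context.Function.Name}?"))
        {
            // Replace the result so the chat history records why nothing happened...
            context.Result = new FunctionResult(context.Function, "The user cancelled this function call.");
            // ...and stop the loop so no further LLM round-trip is made.
            context.Terminate = true;
            return;
        }

        await next(context);

        // Post-invocation: context.Result now holds the function response and can be
        // inspected or enriched with additional context before it reaches the LLM.
    }
}
```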

matthewbolanos commented 5 months ago

I created a second issue to track the need for filters at the automatic function call level here: https://github.com/microsoft/semantic-kernel/issues/5470

matthewbolanos commented 4 months ago

In terms of prioritization of samples, I would demonstrate filters in the following order:

  1. Approving functions before they're run – if a function is "consequential", it should require the user to first approve the action before it happens. If the function is rejected, the result should be modified so that the LLM knows that the function was rejected (instead of just cancelled). A sketch of this pattern follows this list.
  2. Semantic caching – after a prompt has been invoked, a developer should be able to cache the response by using the original question as the key (i.e., embedding). During subsequent prompt renders, a check should be performed to see if a question has already been answered. If so, no request should be sent to the LLM. Instead, the cached answer should be provided. Ideally this sample highlights Redis cache and/or COSMOS DB, but it should make it easy to swap out another memory connector.
  3. Frugal GPT – The developer should be able to make a request to a cheaper model (e.g., GPT-3.5 turbo) to determine how complex a query is. If the model thinks the request is complex, GPT-4 should be used; otherwise, GPT-3.5 should be used.
  4. Long running memory – After a chat completion has been completed, the developer should be able to cache the previous back-and-forth. Later, during a future prompt rendering, the developer should be able to use RAG to retrieve previous conversations that are relevant to the most recent query/topic and inject them into the chat history.
  5. Using a moderation classification model – The developer should be able to use a classification model to determine if a prompt is not OK. If a prompt is not OK, the response should be updated so that the user is provided a reason why their request was not processed or why the LLM's response was inappropriate. This may require #5279
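
As a rough illustration of the first item, a function approval filter could look like the sketch below, assuming the `IFunctionInvocationFilter` abstraction. The convention used to mark a function as "consequential" and the console-based approval prompt are hypothetical placeholders; the PR linked further down in this thread is the actual sample.

```csharp
// Sketch of a "consequential function" approval filter (item 1 above).
// The consequential check and the approval prompt are hypothetical placeholders.
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public sealed class FunctionApprovalFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        // Hypothetical convention: plugins flag risky functions in their description.
        bool isConsequential = context.Function.Description
            .Contains("consequential", StringComparison.OrdinalIgnoreCase);

        if (isConsequential)
        {
            Console.WriteLine($"Approve invocation of '{context.Function.Name}'? (y/n)");
            if (Console.ReadLine()?.Trim().ToLowerInvariant() != "y")
            {
                // Rejected: return a result that tells the LLM the function was rejected
                // (not merely cancelled) so it can respond accordingly.
                context.Result = new FunctionResult(context.Function,
                    "The user rejected this operation; it was not executed.");
                return;
            }
        }

        await next(context); // approved (or not consequential): run the function
    }
}
```

Such a filter would be registered on the kernel, e.g. `kernel.FunctionInvocationFilters.Add(new FunctionApprovalFilter());`, or through dependency injection.
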
dmytrostruk commented 4 months ago

5. Using a moderation classification model – The developer should be able to use a classification model to determine if a prompt is not OK. If a prompt is not OK, the response should be updated so that the user is provided a reason why their request was not processed or why the LLM's response was inappropriate. This may require Text classification ADR #5279

@matthewbolanos I've already added text moderation together with Prompt Shields in the same PR. But it uses the Azure AI Content Safety service instead of the OpenAI moderation endpoint: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Demos/ContentSafety/Filters/TextModerationFilter.cs

We can extend this demo app later once the OpenAI text moderation connector is implemented. Let me know what you think about it.

matthewbolanos commented 4 months ago

Lowered the priority of the text classification example per your comment.

sophialagerkranspandey commented 4 months ago

Hi @dmytrostruk, here are the top 3 items from customer requests to add to this:

  1. https://hub.guardrailsai.com/validator/aryn/extractive_summary (top request)
  2. https://hub.guardrailsai.com/validator/brainlogic/high_quality_translation
  3. https://hub.guardrailsai.com/validator/guardrails/detect_pii

dmytrostruk commented 4 months ago

  1. Approving functions before they're run – if a function is "consequential", it should require the user to first approve the action before it happens. If the function is rejected, the result should be modified so that the LLM knows that the function was rejected (instead of just cancelled).

https://github.com/microsoft/semantic-kernel/pull/6109

dmytrostruk commented 4 months ago

2. Semantic caching – after a prompt has been invoked, a developer should be able to cache the response by using the original question as the key (i.e., embedding). During subsequent prompt renders, a check should be performed to see if a question has already been answered. If so, no request should be sent to the LLM. Instead, the cached answer should be provided. Ideally this sample highlights Redis cache and/or COSMOS DB, but it should make it easy to swap out another memory connector.

https://github.com/microsoft/semantic-kernel/pull/6151
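
For orientation, that caching pattern amounts to a prompt render filter that short-circuits the LLM call when a cached answer exists. The sketch below assumes the `IPromptRenderFilter` abstraction and substitutes a plain in-memory dictionary keyed on the rendered prompt for the embedding-based lookup and the Redis/Azure Cosmos DB stores used in the sample.

```csharp
// Sketch of semantic caching as a prompt render filter (not the linked sample).
// A real implementation would key the cache on an embedding of the question.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public sealed class SemanticCachingFilter : IPromptRenderFilter
{
    private static readonly ConcurrentDictionary<string, string> Cache = new();

    public async Task OnPromptRenderAsync(
        PromptRenderContext context,
        Func<PromptRenderContext, Task> next)
    {
        await next(context); // render the prompt; context.RenderedPrompt is now available

        string key = context.RenderedPrompt ?? string.Empty;

        if (Cache.TryGetValue(key, out string? cached))
        {
            // Setting the result here means no request is sent to the LLM;
            // the cached answer is returned instead.
            context.Result = new FunctionResult(context.Function, cached);
        }
        // New answers would be written to the cache once the LLM response is
        // available, e.g. in a function invocation filter after next() completes.
    }
}
```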

dmytrostruk commented 4 months ago

Example of PII detection: https://github.com/microsoft/semantic-kernel/pull/6171

dmytrostruk commented 4 months ago

Example of text summarization and translation evaluation: https://github.com/microsoft/semantic-kernel/pull/6262

dmytrostruk commented 2 months ago

Example of FrugalGPT: https://github.com/microsoft/semantic-kernel/pull/6815
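
For readers skimming the thread, the core FrugalGPT idea can be sketched as follows (a rough sketch under assumptions, not the linked sample): ask a cheap model to rate the query's complexity, then route to the more expensive model only when needed. The service IDs "gpt-3.5-turbo" and "gpt-4" and the triage prompt are assumptions, and they presume two chat completion services were registered with those IDs.

```csharp
// Sketch of FrugalGPT-style routing between a cheap and an expensive model.
// Assumes two IChatCompletionService registrations keyed "gpt-3.5-turbo" and "gpt-4".
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class FrugalRouter
{
    public static async Task<string?> AskAsync(Kernel kernel, string userQuery)
    {
        var cheap = kernel.GetRequiredService<IChatCompletionService>("gpt-3.5-turbo");
        var smart = kernel.GetRequiredService<IChatCompletionService>("gpt-4");

        // Step 1: let the cheap model judge complexity (prompt wording is an assumption).
        var triage = new ChatHistory();
        triage.AddUserMessage($"Answer with one word, SIMPLE or COMPLEX, for this request:\n{userQuery}");
        var verdict = await cheap.GetChatMessageContentAsync(triage);

        // Step 2: use GPT-4 only when the query looks complex; otherwise stay cheap.
        var service = verdict.Content?.Contains("COMPLEX", StringComparison.OrdinalIgnoreCase) == true
            ? smart
            : cheap;

        var chat = new ChatHistory();
        chat.AddUserMessage(userQuery);
        var answer = await service.GetChatMessageContentAsync(chat);
        return answer.Content;
    }
}
```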