Regarding approving function calls:
Not sure I completely understand the use case for User vs. Developer. In terms of what happens after a cancellation, there should be more flexibility. There are two places for filters: pre-function invocation and post-invocation. If the user decides to cancel, it should also be possible (or at least optional) to bypass any additional call to the LLM and return the relevant data back to the user.
@matthewbolanos, does that make sense?
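As a rough illustration of the cancellation path being discussed, here is a minimal sketch using Semantic Kernel's .NET auto function invocation filter. The `UserApproved` helper and the console prompt are assumptions for the example; the key point is that setting `Terminate` returns control to the caller without another round-trip to the LLM:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public class FunctionApprovalFilter : IAutoFunctionInvocationFilter
{
    public async Task OnAutoFunctionInvocationAsync(
        AutoFunctionInvocationContext context,
        Func<AutoFunctionInvocationContext, Task> next)
    {
        if (!UserApproved(context.Function.Name))
        {
            // Replace the function result with data the caller can surface to the user.
            context.Result = new FunctionResult(context.Result, "Operation was cancelled by the user.");

            // Terminate stops the auto function-calling loop: no further request is
            // sent to the LLM, and control returns to the caller immediately.
            context.Terminate = true;
            return;
        }

        await next(context);
    }

    // Hypothetical approval prompt; a real app would use its own UI.
    private static bool UserApproved(string functionName)
    {
        Console.Write($"Allow the assistant to run '{functionName}'? (y/n) ");
        return Console.ReadLine()?.Trim().Equals("y", StringComparison.OrdinalIgnoreCase) == true;
    }
}
```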
I created a second issue to track the need for filters at the automatic function call level here: https://github.com/microsoft/semantic-kernel/issues/5470
In terms of prioritization of samples, I would demonstrate filters in the following order:
3. Using moderation classification model – The developer should be able to use a classification model to determine whether a prompt is acceptable. If it is not, the response should be updated so that the user is told why their request was not processed or why the LLM's response was inappropriate. This may require Text classification ADR #5279
@matthewbolanos I've already added text moderation together with Prompt Shields in the same PR, but it uses the Azure AI Content Safety service instead of the OpenAI moderation endpoint: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Demos/ContentSafety/Filters/TextModerationFilter.cs
We can extend this demo app later once an OpenAI text moderation connector is implemented. Let me know what you think about it.
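For context, the general shape of such a filter is a prompt render filter that classifies the rendered prompt before the LLM is called. This sketch stubs out the classifier (`IsHarmfulAsync` is a placeholder assumption; the linked demo calls Azure AI Content Safety instead) and sets a result so that the LLM is skipped and the user gets an explanation:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public class ModerationPromptFilter : IPromptRenderFilter
{
    public async Task OnPromptRenderAsync(
        PromptRenderContext context,
        Func<PromptRenderContext, Task> next)
    {
        await next(context); // render the prompt first

        if (await IsHarmfulAsync(context.RenderedPrompt!))
        {
            // Short-circuit: setting a result here means no request is sent to
            // the LLM; the user receives the explanation instead.
            context.Result = new FunctionResult(context.Function,
                "Your request was not processed because it was flagged by the content moderation policy.");
        }
    }

    // Placeholder classifier; a real filter would call a moderation service.
    private static Task<bool> IsHarmfulAsync(string prompt) =>
        Task.FromResult(prompt.Contains("attack", StringComparison.OrdinalIgnoreCase));
}
```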
Lowered the priority of the text classification example per your comment.
Hi @dmytrostruk, there are 3 top items I have from customer requests to add to this:
1. Approving functions before they're run – if a function is "consequential", it should require the user to approve the action before it happens. If the function is rejected, the result should be modified so that the LLM knows the function was rejected (rather than just cancelled). See the first sketch after this list.
2. Semantic caching – after a prompt has been invoked, a developer should be able to cache the response using the original question as the key (i.e., its embedding). During subsequent prompt renders, a check should be performed to see whether the question has already been answered; if so, no request should be sent to the LLM and the cached answer should be returned instead. Ideally this sample highlights Redis and/or Cosmos DB, but it should make it easy to swap in another memory connector. See the second sketch after this list.
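On item 1, a minimal sketch of what the rejection path could look like with a function invocation filter; the "consequential" check and the approval prompt are assumptions for illustration. The important detail is that the filter sets a result instead of invoking the function, so the LLM sees an explicit rejection rather than a silent cancellation:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public class ApprovalRequiredFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        // "Consequential" is a hypothetical marker for this example; a real app
        // might inspect function metadata or keep an allow-list.
        bool consequential = context.Function.Name.StartsWith("Delete", StringComparison.Ordinal);

        if (consequential && !UserApproved(context.Function.Name))
        {
            // Don't call next(): the function never runs. The result tells the
            // model the user rejected the action, so it can acknowledge the
            // rejection rather than treating it as an error.
            context.Result = new FunctionResult(context.Function,
                "The user rejected this action. Do not retry it; ask for further instructions.");
            return;
        }

        await next(context);
    }

    // Hypothetical approval prompt for the sketch.
    private static bool UserApproved(string name)
    {
        Console.Write($"Approve '{name}'? (y/n) ");
        return Console.ReadLine()?.Trim().Equals("y", StringComparison.OrdinalIgnoreCase) == true;
    }
}
```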
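On item 2, a sketch of the read side of semantic caching as a prompt render filter, with an in-memory list standing in for Redis/Cosmos DB, a hypothetical embedding delegate, and an assumed similarity cutoff. Setting `context.Result` on a cache hit is what prevents the request from reaching the LLM; writing to the cache on a miss would live in a companion function invocation filter that sees the final LLM result:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public class SemanticCacheFilter : IPromptRenderFilter
{
    private readonly Func<string, Task<float[]>> _embed;          // hypothetical embedding generator
    private readonly List<(float[] Key, string Answer)> _cache = new();
    private const double SimilarityThreshold = 0.95;              // assumed cutoff

    public SemanticCacheFilter(Func<string, Task<float[]>> embed) => _embed = embed;

    public async Task OnPromptRenderAsync(
        PromptRenderContext context,
        Func<PromptRenderContext, Task> next)
    {
        await next(context); // render the prompt

        float[] query = await _embed(context.RenderedPrompt!);

        // Linear nearest-neighbor scan; a real app would query a vector store.
        foreach (var (key, answer) in _cache)
        {
            if (CosineSimilarity(query, key) >= SimilarityThreshold)
            {
                // Cache hit: setting a result skips the LLM call entirely.
                context.Result = new FunctionResult(context.Function, answer);
                return;
            }
        }
        // Cache miss: the LLM is invoked as usual; a companion filter (not shown)
        // would store the answer keyed by this embedding.
    }

    private static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }
}
```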
Example of PII detection: https://github.com/microsoft/semantic-kernel/pull/6171
Example of text summarization and translation evaluation: https://github.com/microsoft/semantic-kernel/pull/6262
Example of FrugalGPT: https://github.com/microsoft/semantic-kernel/pull/6815
Tasks
To make filters non-experimental, the following user stories should be met:
Additionally, thought should be given to making filters easier to apply. These are likely extensions, so they should not block making filters non-experimental: