microsoft / semantic-kernel


.Net: Filters use cases to be supported before making the feature non-experimental #5436

Closed. matthewbolanos closed this issue 2 months ago.

matthewbolanos commented 6 months ago

Tasks

To make filters non-experimental, the following user stories should be met:

Additionally, thought should be given to making filters easier to apply. These are likely extensions, so they should not block making filters non-experimental:

lavinir commented 6 months ago

For Approving function calls:

Not sure I completely understand the use case for User vs. Developer. In terms of what happens after a 'cancellation', there should be more flexibility. There are two places for filters: pre function invocation and post invocation. If the user decides to cancel, it should also bypass (or at least have the option to bypass) any additional call to the LLM and return the relevant data back to the user:

  1. The updated chat history, including any successful function invocations that were not cancelled.
  2. The function response (if this is a post-invocation filter).
  3. Perhaps some additional context object supplied by the function filter.

@matthewbolanos, does that make sense?
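
To make the "bypass any additional call to the LLM" part concrete, here is a minimal sketch (not a merged implementation) assuming the auto function invocation filter abstraction (`IAutoFunctionInvocationFilter`); the user-approval callback is a hypothetical placeholder:

```csharp
// Sketch of a cancellation-aware filter at the automatic function-calling level.
// Terminating the context stops the auto-invocation loop, so no additional
// request is sent to the LLM and control returns to the caller.
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public sealed class CancellationAwareFilter : IAutoFunctionInvocationFilter
{
    private readonly Func<string, bool> _userApproves; // hypothetical "ask the user" hook

    public CancellationAwareFilter(Func<string, bool> userApproves) => _userApproves = userApproves;

    public async Task OnAutoFunctionInvocationAsync(
        AutoFunctionInvocationContext context,
        Func<AutoFunctionInvocationContext, Task> next)
    {
        // Pre-invocation: ask before the function runs.
        if (!_userApproves($"Allow call to {context.Function.Name}?"))
        {
            // Replace the result so the chat history records why nothing happened...
            context.Result = new FunctionResult(context.Function, "The user cancelled this function call.");
            // ...and stop the loop so no further LLM round-trip is made.
            context.Terminate = true;
            return;
        }

        await next(context);

        // Post-invocation: context.Result now holds the function response and can be
        // inspected or enriched with additional context before it reaches the LLM.
    }
}
```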

matthewbolanos commented 5 months ago

I created a second issue to track the need for filters at the automatic function call level here: https://github.com/microsoft/semantic-kernel/issues/5470

matthewbolanos commented 4 months ago

In terms of prioritization of samples, I would demonstrate filters in the following order:

  1. Approving functions before they're run – if a function is "consequential", it should require the user to first approve the action before it happens. If the function is rejected, the result should be modified so that the LLM knows that the function was rejected (instead of just cancelled). A sketch of this pattern follows this list.
  2. Semantic caching – after a prompt has been invoked, a developer should be able to cache the response by using the original question as the key (i.e., embedding). During subsequent prompt renders, a check should be performed to see if a question has already been answered. If so, no request should be sent to the LLM. Instead, the cached answer should be provided. Ideally this sample highlights Redis cache and/or COSMOS DB, but it should make it easy to swap out another memory connector.
  3. Frugal GPT – The developer should be able to make a request to a cheaper model (e.g., GPT-3.5 turbo) to determine how complex a query is. If the model thinks the request is complex, GPT-4 should be used; otherwise, GPT-3.5 should be used.
  4. Long running memory – After a chat completion has been completed, the developer should be able to cache the previous back-and-forth. Later, during a future prompt rendering, the developer should be able to use RAG to retrieve previous conversations that are relevant to the most recent query/topic and inject them into the chat history.
  5. Using a moderation classification model – The developer should be able to use a classification model to determine if a prompt is not OK. If a prompt is not OK, the response should be updated so that the user is provided a reason why their request was not processed or why the LLM's response was inappropriate. This may require #5279
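
As a rough illustration of the first item, a function approval filter could look like the sketch below, assuming the `IFunctionInvocationFilter` abstraction. The convention used to mark a function as "consequential" and the console-based approval prompt are hypothetical placeholders; the PR linked further down in this thread is the actual sample.

```csharp
// Sketch of a "consequential function" approval filter (item 1 above).
// The consequential check and the approval prompt are hypothetical placeholders.
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public sealed class FunctionApprovalFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        // Hypothetical convention: plugins flag risky functions in their description.
        bool isConsequential = context.Function.Description
            .Contains("consequential", StringComparison.OrdinalIgnoreCase);

        if (isConsequential)
        {
            Console.WriteLine($"Approve invocation of '{context.Function.Name}'? (y/n)");
            if (Console.ReadLine()?.Trim().ToLowerInvariant() != "y")
            {
                // Rejected: return a result that tells the LLM the function was rejected
                // (not merely cancelled) so it can respond accordingly.
                context.Result = new FunctionResult(context.Function,
                    "The user rejected this operation; it was not executed.");
                return;
            }
        }

        await next(context); // approved (or not consequential): run the function
    }
}
```

Such a filter would be registered on the kernel, e.g. `kernel.FunctionInvocationFilters.Add(new FunctionApprovalFilter());`, or through dependency injection.
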
dmytrostruk commented 4 months ago

5. Using a moderation classification model – The developer should be able to use a classification model to determine if a prompt is not OK. If a prompt is not OK, the response should be updated so that the user is provided a reason why their request was not processed or why the LLM's response was inappropriate. This may require Text classification ADR #5279

@matthewbolanos I've already added text moderation together with Prompt Shields in the same PR. But it uses the Azure AI Content Safety service instead of the OpenAI moderation endpoint: https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Demos/ContentSafety/Filters/TextModerationFilter.cs

We can extend this demo app later once the OpenAI text moderation connector is implemented. Let me know what you think about it.

matthewbolanos commented 4 months ago

Lowered the priority of the text classification example per your comment.

sophialagerkranspandey commented 4 months ago

Hi @dmytrostruk, here are the top 3 items from customer requests to add to this:

  1. https://hub.guardrailsai.com/validator/aryn/extractive_summary (top request)
  2. https://hub.guardrailsai.com/validator/brainlogic/high_quality_translation
  3. https://hub.guardrailsai.com/validator/guardrails/detect_pii

dmytrostruk commented 4 months ago

  1. Approving functions before they're run – if a function is "consequential", it should require the user to first approve the action before it happens. If the function is rejected, the result should be modified so that the LLM knows that the function was rejected (instead of just cancelled).

https://github.com/microsoft/semantic-kernel/pull/6109

dmytrostruk commented 4 months ago

2. Semantic caching – after a prompt has been invoked, a developer should be able to cache the response by using the original question as the key (i.e., embedding). During subsequent prompt renders, a check should be performed to see if a question has already been answered. If so, no request should be sent to the LLM. Instead, the cached answer should be provided. Ideally this sample highlights Redis cache and/or COSMOS DB, but it should make it easy to swap out another memory connector.

https://github.com/microsoft/semantic-kernel/pull/6151
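
For orientation, that caching pattern amounts to a prompt render filter that short-circuits the LLM call when a cached answer exists. The sketch below assumes the `IPromptRenderFilter` abstraction and substitutes a plain in-memory dictionary keyed on the rendered prompt for the embedding-based lookup and the Redis/Azure Cosmos DB stores used in the sample.

```csharp
// Sketch of semantic caching as a prompt render filter (not the linked sample).
// A real implementation would key the cache on an embedding of the question.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public sealed class SemanticCachingFilter : IPromptRenderFilter
{
    private static readonly ConcurrentDictionary<string, string> Cache = new();

    public async Task OnPromptRenderAsync(
        PromptRenderContext context,
        Func<PromptRenderContext, Task> next)
    {
        await next(context); // render the prompt; context.RenderedPrompt is now available

        string key = context.RenderedPrompt ?? string.Empty;

        if (Cache.TryGetValue(key, out string? cached))
        {
            // Setting the result here means no request is sent to the LLM;
            // the cached answer is returned instead.
            context.Result = new FunctionResult(context.Function, cached);
        }
        // New answers would be written to the cache once the LLM response is
        // available, e.g. in a function invocation filter after next() completes.
    }
}
```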

dmytrostruk commented 4 months ago

Example of PII detection: https://github.com/microsoft/semantic-kernel/pull/6171

dmytrostruk commented 4 months ago

Example of text summarization and translation evaluation: https://github.com/microsoft/semantic-kernel/pull/6262

dmytrostruk commented 2 months ago

Example of FrugalGPT: https://github.com/microsoft/semantic-kernel/pull/6815
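
For readers skimming the thread, the core FrugalGPT idea can be sketched as follows (a rough sketch under assumptions, not the linked sample): ask a cheap model to rate the query's complexity, then route to the more expensive model only when needed. The service IDs "gpt-3.5-turbo" and "gpt-4" and the triage prompt are assumptions, and they presume two chat completion services were registered with those IDs.

```csharp
// Sketch of FrugalGPT-style routing between a cheap and an expensive model.
// Assumes two IChatCompletionService registrations keyed "gpt-3.5-turbo" and "gpt-4".
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class FrugalRouter
{
    public static async Task<string?> AskAsync(Kernel kernel, string userQuery)
    {
        var cheap = kernel.GetRequiredService<IChatCompletionService>("gpt-3.5-turbo");
        var smart = kernel.GetRequiredService<IChatCompletionService>("gpt-4");

        // Step 1: let the cheap model judge complexity (prompt wording is an assumption).
        var triage = new ChatHistory();
        triage.AddUserMessage($"Answer with one word, SIMPLE or COMPLEX, for this request:\n{userQuery}");
        var verdict = await cheap.GetChatMessageContentAsync(triage);

        // Step 2: use GPT-4 only when the query looks complex; otherwise stay cheap.
        var service = verdict.Content?.Contains("COMPLEX", StringComparison.OrdinalIgnoreCase) == true
            ? smart
            : cheap;

        var chat = new ChatHistory();
        chat.AddUserMessage(userQuery);
        var answer = await service.GetChatMessageContentAsync(chat);
        return answer.Content;
    }
}
```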