Add Request Context feature: allow to override settings at runtime

Motivation and Context (Why the change? What's the scenario?)

Several settings of the solution are hard-coded by design, while others are configurable but require a service restart to take effect.

This PR introduces a Context object that is passed during ingestion and search/ask. The context object is optional and can contain custom key-value pairs accessible to handlers and search clients.

During web requests, the context is also accessible via dependency injection, for cases where a method signature doesn't support IContext; see RequestContextProvider. During ingestion, the context is accessible through the DataPipeline instance.

The context makes it possible to override the following settings for a single request or a single document upload, without changing the code or the configuration:

custom_partitioning_max_tokens_per_paragraph_int: the max size of paragraphs while partitioning a file during the upload
custom_partitioning_overlapping_tokens_int: overlapping tokens while partitioning a file
custom_rag_empty_answer_str: the answer returned by Ask when no answer can be found
custom_rag_prompt_str and custom_rag_fact_template_str: prompt used for RAG, including how facts are injected
custom_rag_max_tokens_int: max number of tokens to generate with the RAG prompt
custom_rag_temperature_float: temperature used with the RAG prompt
custom_rag_nucleus_sampling_float: nucleus sampling used with the RAG prompt
custom_summary_prompt_str: prompt used to summarize content
custom_summary_target_token_size_int: size of the summary to generate (best effort)
custom_summary_overlapping_tokens_int: overlapping tokens while generating summaries
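
On the consuming side, a handler or search client needs to fall back to the configured value when a key is absent. The following is a minimal sketch of such a lookup, assuming the context arguments behave like a string-keyed dictionary. ContextOverrides and GetOrDefault are hypothetical names used only for illustration; they are not the helpers added by this PR.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: reads a per-request override from a bag of arguments.
public static class ContextOverrides
{
    // Returns the value stored under `key` when it is present and usable as T,
    // otherwise returns the configured default.
    public static T GetOrDefault<T>(IDictionary<string, object?>? args, string key, T defaultValue)
    {
        if (args is null || !args.TryGetValue(key, out object? raw) || raw is null)
        {
            return defaultValue;
        }

        // Fast path: the value already has the requested type.
        if (raw is T typed)
        {
            return typed;
        }

        try
        {
            // Handles values that arrive as another primitive type,
            // e.g. the string "0.2" for a floating point setting.
            return (T)Convert.ChangeType(raw, typeof(T), CultureInfo.InvariantCulture);
        }
        catch (Exception e) when (e is InvalidCastException or FormatException or OverflowException)
        {
            // Unusable override: keep the configured value.
            return defaultValue;
        }
    }
}
```

With this sketch, `ContextOverrides.GetOrDefault(args, "custom_rag_max_tokens_int", 1024)` returns the override when the request supplies one and 1024 otherwise, so the configured defaults stay in effect for requests that pass no context.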

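Other keys can be used too, e.g. when working with custom handlers and custom classes.

The template used to render the facts injected into the RAG prompt via {{$facts}} is now configurable (see custom_rag_fact_template_str) and can include tags and metadata, using the following placeholders: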

{{$content}}: text from memory, i.e. chunk of text extracted from the source
{{$source}}: name of the source file, or URL of the web page, where the content originated
{{$relevance}}: relevance score of the current chunk of text
{{$memoryId}}: ID of the memory record
{{$tags}}: list of tags, excluding reserved/internal ones
{{$tag[X]}}: tag X value(s), replaced with "-" if the value is empty
{{$meta[X]}}: value of memory record payload X field (memory payload is also known as metadata), replaced with "-" if the value is empty
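
To make the substitution rules concrete, here is a rough sketch of how such a template could be rendered. It is not the implementation in this PR: the Fact record and the FactTemplate class are hypothetical, the "__" prefix for reserved tags and the "key=value" formatting of {{$tags}} are assumptions, and the relevance score is printed as a plain number.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical shape of a retrieved chunk, used only for this illustration.
public sealed record Fact(
    string Content,
    string Source,
    double Relevance,
    string MemoryId,
    IReadOnlyDictionary<string, IReadOnlyList<string>> Tags,
    IReadOnlyDictionary<string, object?> Payload);

public static class FactTemplate
{
    // Assumption: reserved/internal tags are recognizable by this prefix.
    private const string ReservedTagPrefix = "__";

    // Matches {{$name}} and {{$name[arg]}}.
    private static readonly Regex s_placeholder =
        new(@"\{\{\$(?<name>\w+)(?:\[(?<arg>[^\]]+)\])?\}\}", RegexOptions.Compiled);

    public static string Render(string template, Fact fact)
    {
        // Single pass: substituted text is never scanned again for placeholders.
        return s_placeholder.Replace(template, match =>
        {
            string name = match.Groups["name"].Value;
            string? arg = match.Groups["arg"].Success ? match.Groups["arg"].Value : null;

            return name switch
            {
                "content" when arg is null => fact.Content,
                "source" when arg is null => fact.Source,
                "relevance" when arg is null => fact.Relevance.ToString(CultureInfo.InvariantCulture),
                "memoryId" when arg is null => fact.MemoryId,
                "tags" when arg is null => FormatTags(fact),
                "tag" when arg is not null => OrDash(TagValues(fact, arg)),
                "meta" when arg is not null => OrDash(MetaValue(fact, arg)),
                _ => match.Value, // unknown placeholder: leave it untouched
            };
        });
    }

    // Lists all tags except the reserved ones, as "key=v1,v2; key2=v3".
    private static string FormatTags(Fact fact) =>
        string.Join("; ", fact.Tags
            .Where(t => !t.Key.StartsWith(ReservedTagPrefix, StringComparison.Ordinal))
            .Select(t => $"{t.Key}={string.Join(",", t.Value)}"));

    private static string? TagValues(Fact fact, string key) =>
        fact.Tags.TryGetValue(key, out var values) ? string.Join(", ", values) : null;

    private static string? MetaValue(Fact fact, string key) =>
        fact.Payload.TryGetValue(key, out var value) ? value?.ToString() : null;

    // Empty or missing values are rendered as "-".
    private static string OrDash(string? value) => string.IsNullOrEmpty(value) ? "-" : value;
}
```

Rendering in a single regular-expression pass is a deliberate choice in this sketch: if a chunk of content happens to contain text such as {{$source}}, it is emitted verbatim rather than being expanded a second time.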

The PR includes a few examples.

Sample syntax:

// Example 1: override the summarization settings for a single document upload
var context = new RequestContext();
context.SetArg("custom_summary_prompt_str", "Summarize this: {{$input}}. Summary: ");
context.SetArg("custom_summary_overlapping_tokens_int", 0);

await memory.ImportDocumentAsync(
    new Document("doc1").AddFile("file4-KM-Readme.pdf"),
    steps: Constants.PipelineOnlySummary,
    context: context);

// Example 2 (separate snippet): override the RAG prompt and the fact template for a single question
var context = new RequestContext();

// Each fact is prefixed with the "last_update" field taken from the memory record payload
context.SetArg("custom_rag_fact_template_str", "=== Last update: {{$meta[last_update]}} ===\n{{$content}}\n");

context.SetArg("custom_rag_prompt_str", """
    Facts:
    {{$facts}}
    ======
    Given only the timestamped facts above, provide a very short answer, include the relevant dates in brackets.
    If you don't have sufficient information, reply with '{{$notFound}}'.
    Question: {{$input}}
    Answer:
    """);

var answer = await s_memory.AskAsync("What's Kernel Memory?", context: context);

microsoft / kernel-memory

Add Request Context feature: allow to override settings at runtime #673