Motivation and Context (Why the change? What's the scenario?)
Several settings of the solution are hard-coded by design, and others are configurable but take effect only after a service restart.
This PR introduces a Context object that is passed during ingestion and search/ask. The context object is optional and can contain custom key-value pairs accessible to handlers and search clients.
During web requests, the context is also available via dependency injection for cases where a method signature doesn't support IContext; see RequestContextProvider. During ingestion, the context is accessible through the DataPipeline instance.
The context makes it possible to override the following settings per request or per document upload, without changing code or configuration:
custom_partitioning_max_tokens_per_paragraph_int: the max size of paragraphs while partitioning a file during the upload
custom_partitioning_overlapping_tokens_int: overlapping tokens while partitioning a file
custom_rag_empty_answer_str: the answer returned by Ask when no answer can be found
custom_rag_prompt_str and custom_rag_fact_template_str: prompt used for RAG, including how facts are injected
custom_rag_max_tokens_int: max number of tokens to generate with the RAG prompt
custom_rag_temperature_float: temperature used with the RAG prompt
custom_rag_nucleus_sampling_float: nucleus sampling (top-p) value used with the RAG prompt
custom_summary_prompt_str: prompt used to summarize content
custom_summary_target_token_size_int: size of the summary to generate (best effort)
custom_summary_overlapping_tokens_int: overlapping tokens while generating summaries
Other keys can be used as well, e.g. when working with custom handlers and custom classes.
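To make the override behaviour concrete, here is a minimal sketch of how a handler might resolve one of the keys above, falling back to the configured value when the request carries no override. `SketchContext`, `GetOrDefault`, `PartitioningSketch` and the default of 1000 tokens are hypothetical names and values invented for illustration; they are not the types or defaults introduced by this PR.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the PR's context object; NOT the real Kernel Memory type.
public sealed class SketchContext
{
    private readonly Dictionary<string, object?> _args = new Dictionary<string, object?>();

    // Stores a custom key-value pair and returns the context to allow chaining.
    public SketchContext SetArg(string key, object? value)
    {
        _args[key] = value;
        return this;
    }

    // Returns the per-request value when the key is present and has the expected type,
    // otherwise the fallback that came from code or configuration.
    public T GetOrDefault<T>(string key, T fallback)
        => _args.TryGetValue(key, out var value) && value is T typed ? typed : fallback;
}

public static class PartitioningSketch
{
    // Assumed default, for illustration only.
    public const int ConfiguredMaxTokensPerParagraph = 1000;

    // A missing context or a missing key both resolve to the configured value.
    public static int ResolveMaxTokens(SketchContext? context)
        => context?.GetOrDefault(
               "custom_partitioning_max_tokens_per_paragraph_int",
               ConfiguredMaxTokensPerParagraph)
           ?? ConfiguredMaxTokensPerParagraph;
}
```

In this sketch a value of the wrong type is ignored and the fallback wins; an actual implementation may prefer to reject such a value instead.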
The RAG {{$facts}} template is now configurable and can include tags and metadata:
{{$content}}: text from memory, i.e. chunk of text extracted from the source
{{$source}}: name of the source file, or URL of the web page, where the content originated.
{{$relevance}}: relevance score of the current chunk of text
{{$memoryId}}: ID of the memory record
{{$tags}}: list of tags, excluding reserved/internal ones
{{$tag[X]}}: tag X value(s), replaced with "-" if the value is empty
{{$meta[X]}}: value of memory record payload X field (memory payload is also known as metadata), replaced with "-" if the value is empty
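The placeholder rules above can be illustrated with a small standalone renderer. This is only a sketch of the substitution behaviour described in the list, not the code in this PR: `FactTemplateSketch`, its parameters, the two-decimal relevance format and the ", " separator for multiple tag values are all assumptions, and only a subset of the placeholders is handled.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Illustrative only: hypothetical renderer for a subset of the fact template placeholders.
public static class FactTemplateSketch
{
    // Matches {{$name}} and {{$name[X]}}; group 1 is the name, group 2 is X (empty if absent).
    private static readonly Regex s_placeholder =
        new Regex(@"\{\{\$(\w+)(?:\[([^\]]+)\])?\}\}");

    public static string Render(
        string template,
        string content,
        string source,
        double relevance,
        IReadOnlyDictionary<string, List<string>> tags,
        IReadOnlyDictionary<string, string> meta)
    {
        return s_placeholder.Replace(template, m =>
        {
            string name = m.Groups[1].Value;
            string arg = m.Groups[2].Value;

            string? value = name switch
            {
                "content" => content,
                "source" => source,
                "relevance" => relevance.ToString("0.00", CultureInfo.InvariantCulture),
                "tag" => tags.TryGetValue(arg, out var list) ? string.Join(", ", list) : "",
                "meta" => meta.TryGetValue(arg, out var text) ? text : "",
                _ => null
            };

            // Placeholders this sketch does not know are left untouched.
            if (value is null) { return m.Value; }

            // Empty tag/meta values are rendered as "-", as described in the list above.
            bool indexed = name == "tag" || name == "meta";
            return indexed && value.Length == 0 ? "-" : value;
        });
    }
}
```

Doing the substitution in a single regex pass means text coming from a memory record is never re-scanned for placeholders, which seemed the safer choice for this sketch.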
The PR includes a few examples.
Sample syntax:
// Example 1: override the summarization prompt during ingestion
var context = new RequestContext();
context.SetArg("custom_summary_prompt_str", "Summarize this: {{$input}}. Summary: ");
context.SetArg("custom_summary_overlapping_tokens_int", 0);
await memory.ImportDocumentAsync(
    new Document("doc1").AddFile("file4-KM-Readme.pdf"),
    steps: Constants.PipelineOnlySummary,
    context: context);

// Example 2 (separate sample): override the RAG fact template and prompt when asking
var context = new RequestContext();
context.SetArg("custom_rag_fact_template_str", "=== Last update: {{$meta[last_update]}} ===\n{{$content}}\n");
context.SetArg("custom_rag_prompt_str", """
    Facts:
    {{$facts}}
    ======
    Given only the timestamped facts above, provide a very short answer, include the relevant dates in brackets.
    If you don't have sufficient information, reply with '{{$notFound}}'.
    Question: {{$input}}
    Answer:
    """);
var answer = await s_memory.AskAsync("What's Kernel Memory?", context: context);