Barshan-Mandal opened 1 month ago
Hi @Barshan-Mandal
To implement Retrieval Augmented Generation (RAG) using the Phi-3 model with Kernel Memory instead of semantic text memory, you can follow a similar approach to the one used with semantic text memory, adapted for offline retrieval. Here's a basic example in C# to get you started:

1. **Set up your environment:** install `Microsoft.SemanticKernel` and other dependencies.
2. **Create the kernel and memory.**
3. **Implement the retrieval and generation logic.**

Here's a simplified code snippet to illustrate this. The `LocalMemoryService` class is a simple implementation that retrieves information based on a key; you can expand it to use a local database or file system. The `CompleteChat` method combines the question with the retrieved information to generate a response. (Note: `LocalModelChatCompletionService` and `IMemoryService` here are illustrative placeholders, not built-in Semantic Kernel types.)

```csharp
using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Memory;

class Program
{
    static void Main(string[] args)
    {
        // Initialize the kernel
        var kernel = Kernel.CreateBuilder()
            .AddChatCompletionService(new LocalModelChatCompletionService("path_to_phi3_model"))
            .AddMemoryService(new LocalMemoryService())
            .Build();

        // Define your question
        string question = "What is the capital of France?";

        // Retrieve relevant information from local memory
        var memoryService = kernel.GetService<IMemoryService>();
        var relevantInfo = memoryService.Retrieve("capital_of_france");

        // Generate a response using the Phi-3 model
        var chatCompletionService = kernel.GetService<IChatCompletionService>();
        var response = chatCompletionService.CompleteChat(question, relevantInfo);

        // Output the response
        Console.WriteLine(response);
    }
}

class LocalMemoryService : IMemoryService
{
    // Implement your local memory retrieval logic here
    public string Retrieve(string key)
    {
        // Example: retrieve data from a local file or database
        if (key == "capital_of_france")
        {
            return "The capital of France is Paris.";
        }
        return string.Empty;
    }
}
```
This is a basic example to get you started.
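If you want the lookup to be genuinely offline rather than hard-coded, the `Retrieve` method could load its key/value pairs from a local JSON file. A minimal sketch, assuming a hypothetical `memory.json` file of the form `{"capital_of_france": "The capital of France is Paris."}` (the file name and class name below are illustrative, not part of the example above):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

public class FileBackedMemoryService
{
    private readonly Dictionary<string, string> _entries;

    // "memory.json" is an illustrative file name; point this at any
    // local key/value store you maintain for offline retrieval.
    public FileBackedMemoryService(string path = "memory.json")
    {
        var json = File.ReadAllText(path);
        _entries = JsonSerializer.Deserialize<Dictionary<string, string>>(json)
                   ?? new Dictionary<string, string>();
    }

    // Same key-based contract as the Retrieve method in the snippet above.
    public string Retrieve(string key) =>
        _entries.TryGetValue(key, out var value) ? value : string.Empty;
}
```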
Please see this blog for a more in-depth walkthrough.
Why isn't this working?
```csharp
using Build5Nines.SharpVector;
using Build5Nines.SharpVector.Data;
using Elastic.Clients.Elasticsearch.Core.Search;
using Microsoft.KernelMemory;
using Microsoft.ML.OnnxRuntimeGenAI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Plugins.Memory;
using System.Collections;

private static async Task Phi3MemoryRAG()
{
    var modelPath = @"F:\HuggingFace Models\Phi-3-Vision\cpu-int4-rtn-block-32-acc-level-4";
    string modelId = "microsoft/Phi-3-vision-128k-instruct";

    // Load the model and services
    var builder = Kernel.CreateBuilder();
    builder.AddOnnxRuntimeGenAIChatCompletion(modelId, modelPath);
    builder.AddLocalTextEmbeddingGeneration();

    // Build Kernel
    var kernel = builder.Build();

    // Create services such as chatCompletionService and embeddingGeneration
    var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
    var embeddingGenerator = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

    // Setup a memory store and create a memory out of it
    var memoryStore = new VolatileMemoryStore();
    var memory = new SemanticTextMemory(memoryStore, embeddingGenerator);

    const string MemoryCollectionName = "fanFacts";
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info1", text: "Gisela's favourite super hero is Batman");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info2", text: "The last super hero movie watched by Gisela was Guardians of the Galaxy Vol 3");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info3", text: "Bruno's favourite super hero is Invincible");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info4", text: "The last super hero movie watched by Bruno was Aquaman II");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info5", text: "Bruno don't like the super hero movie: Eternals");

    // Loading it for Save, Recall and other methods
    kernel.ImportPluginFromObject(new TextMemoryPlugin(memory));

    while (true)
    {
        // Get user input
        Console.ForegroundColor = ConsoleColor.White;
        Console.Write("User > ");
        var question = Console.ReadLine()!;

        // Settings for the Phi-3 execution
        OpenAIPromptExecutionSettings executionSettings = new()
        {
            ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
            MaxTokens = 200
        };

        // Invoke the kernel with the user input
        var response = kernel.InvokePromptStreamingAsync(
            promptTemplate: @"Question: {{$input}}
Answer the question using the memory content: {{Recall}}",
            arguments: new KernelArguments(executionSettings)
            {
                { "input", question },
                { "collection", MemoryCollectionName }
            });

        Console.ForegroundColor = ConsoleColor.Green;
        Console.Write("\nAssistant > ");
        string combinedResponse = string.Empty;
        await foreach (var message in response)
        {
            // Write the response to the console
            Console.Write(message);
            combinedResponse += message;
        }
        Console.WriteLine();
    }
}
```
It looks like you're trying to set up a memory retrieval system using the Semantic Kernel and ONNX Runtime for a chat application.
Here are a few things to check and some suggestions to help troubleshoot the issue:

1. **Dependencies and Imports:** Ensure all the necessary NuGet packages are installed and up to date, and that the namespaces and classes you're using are correctly referenced.
2. **Model Path and ID:** Double-check the `modelPath` and `modelId` to ensure they are correct and accessible.
3. **Kernel and Services Initialization:** Make sure the kernel and services are properly initialized; check for any exceptions or errors during the initialization process.
4. **Memory Store and Plugin:** Ensure the `VolatileMemoryStore` and `TextMemoryPlugin` are correctly set up and integrated with the kernel.
5. **Prompt Execution Settings:** Verify the `OpenAIPromptExecutionSettings` and ensure they are correctly configured for your use case.
6. **Async Method Execution:** Ensure that all asynchronous methods are awaited properly to avoid any runtime issues.

Here's a slightly revised version of your code with some minor adjustments if you want to try this:
```csharp
#pragma warning disable SKEXP0070
#pragma warning disable SKEXP0050
#pragma warning disable SKEXP0001
#pragma warning disable SKEXP0052
#pragma warning disable NU1605

using Build5Nines.SharpVector;
using Build5Nines.SharpVector.Data;
using Elastic.Clients.Elasticsearch.Core.Search;
using Microsoft.KernelMemory;
using Microsoft.ML.OnnxRuntimeGenAI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Plugins.Memory;
using System.Collections;
using System.Threading.Tasks;

private static async Task Phi3MemoryRAG()
{
    var modelPath = @"F:\HuggingFace Models\Phi-3-Vision\cpu-int4-rtn-block-32-acc-level-4";
    string modelId = "microsoft/Phi-3-vision-128k-instruct";

    // Load the model and services
    var builder = Kernel.CreateBuilder();
    builder.AddOnnxRuntimeGenAIChatCompletion(modelId, modelPath);
    builder.AddLocalTextEmbeddingGeneration();

    // Build Kernel
    var kernel = builder.Build();

    // Create services such as chatCompletionService and embeddingGeneration
    var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
    var embeddingGenerator = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

    // Setup a memory store and create a memory out of it
    var memoryStore = new VolatileMemoryStore();
    var memory = new SemanticTextMemory(memoryStore, embeddingGenerator);

    const string MemoryCollectionName = "fanFacts";
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info1", text: "Gisela's favourite super hero is Batman");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info2", text: "The last super hero movie watched by Gisela was Guardians of the Galaxy Vol 3");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info3", text: "Bruno's favourite super hero is Invincible");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info4", text: "The last super hero movie watched by Bruno was Aquaman II");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info5", text: "Bruno doesn't like the super hero movie: Eternals");

    // Loading it for Save, Recall and other methods
    kernel.ImportPluginFromObject(new TextMemoryPlugin(memory));

    while (true)
    {
        // Get user input
        Console.ForegroundColor = ConsoleColor.White;
        Console.Write("User > ");
        var question = Console.ReadLine()!;

        // Settings for the Phi-3 execution
        OpenAIPromptExecutionSettings executionSettings = new()
        {
            ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
            MaxTokens = 200
        };

        // Invoke the kernel with the user input
        var response = kernel.InvokePromptStreamingAsync(
            promptTemplate: @"Question: {{$input}}
Answer the question using the memory content: {{Recall}}",
            arguments: new KernelArguments(executionSettings)
            {
                { "input", question },
                { "collection", MemoryCollectionName }
            });

        Console.ForegroundColor = ConsoleColor.Green;
        Console.Write("\nAssistant > ");
        string combinedResponse = string.Empty;
        await foreach (var message in response)
        {
            // Write the response to the console
            Console.Write(message);
            combinedResponse += message;
        }
        Console.WriteLine();
    }
}
```
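One way to narrow down failures in this setup is to exercise the embedding/memory path on its own, separate from the chat loop. A hedged sketch reusing the `memory` instance and `MemoryCollectionName` from the code above (the query string is illustrative):

```csharp
// Query the semantic memory directly. SearchAsync generates an embedding
// for the query text, so if the local embedding service is misconfigured
// this call will throw before any chat completion is involved.
await foreach (var result in memory.SearchAsync(
    MemoryCollectionName,
    "Which super hero does Bruno like?",
    limit: 2,
    minRelevanceScore: 0.5))
{
    Console.WriteLine($"{result.Metadata.Text} (relevance {result.Relevance:F2})");
}
```

If this isolated search throws the same exception, the problem is in the embedding service rather than in the prompt template or plugin wiring.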
```
System.MissingMethodException
  HResult=0x80131513
  Message=Method not found: 'System.ValueTuple`3<System.ReadOnlyMemory`1<Int64>,System.ReadOnlyMemory`1<Int64>,System.ReadOnlyMemory`1<Int64>> FastBertTokenizer.BertTokenizer.Encode(System.String, Int32, System.Nullable`1<Int32> outputBuffer, Int32 maximumTokens)'
  at SmartComponents.LocalEmbeddings.LocalEmbedder.Embed(String inputText, Int32 maximumTokens)
  at SmartComponents.LocalEmbeddings.SemanticKernel.LocalTextEmbeddingGenerationService.GenerateEmbeddingsAsync(IList`1 data, Kernel kernel, CancellationToken cancellationToken)
  at Microsoft.SemanticKernel.Embeddings.EmbeddingGenerationExtensions.
```
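The `MissingMethodException` above points at a binary incompatibility: the `FastBertTokenizer` assembly loaded at runtime lacks the `Encode` overload that `SmartComponents.LocalEmbeddings` was compiled against, which typically happens when NuGet resolves a different transitive version of the package. One hedged fix, assuming an SDK-style project: pin `FastBertTokenizer` as a direct `PackageReference` so a single compatible version is resolved (the version number below is an illustrative placeholder, not a known-good value):

```xml
<!-- Illustrative .csproj fragment: pin FastBertTokenizer explicitly so the
     runtime loads the version SmartComponents.LocalEmbeddings expects.
     Replace 0.4.5 with the version your SmartComponents build depends on. -->
<ItemGroup>
  <PackageReference Include="FastBertTokenizer" Version="0.4.5" />
</ItemGroup>
```

You can check which version actually resolved with `dotnet list package --include-transitive`.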
How do I use RAG with Kernel Memory and the Semantic Kernel Handlebars Planner with Phi-3 Vision?
The above exception is thrown.
RAG for Phi-3 Vision using Kernel Memory in place of semantic text memory: any example for offline retrieval in C#?
Sorry to bother you. I am new in this arena.