microsoft / Phi-3CookBook

This is a Phi-3 cookbook for getting started with Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
MIT License

RAG for phi 3 vision using kernel memory #138

Open Barshan-Mandal opened 1 month ago

Barshan-Mandal commented 1 month ago

RAG for Phi-3 Vision using Kernel Memory in place of semantic text memory. Is there any example of offline retrieval in C#?

Sorry to bother you. I am new to this arena.

leestott commented 1 month ago

Hi @Barshan-Mandal

To implement Retrieval Augmented Generation (RAG) using the Phi-3 model with kernel memory instead of semantic text memory, you can follow a similar approach to the one used with semantic text memory but adapt it for offline retrieval. Here's a basic example in C# to get you started:

Step-by-Step Example

  1. Set Up Your Environment:

    • Ensure you have the necessary libraries installed. You might need Microsoft.SemanticKernel and other dependencies.
  2. Create the Kernel and Memory:

    • Initialize the kernel and set up the memory for offline retrieval.
  3. Implement the Retrieval and Generation Logic:

    • Use the kernel to retrieve relevant information from your local memory and generate responses using the Phi-3 model.
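For step 1, a project file fragment along these lines pulls in the core SDK (the version number is illustrative only; check NuGet for the current release, and add the connector packages your scenario needs):

```xml
<ItemGroup>
  <!-- Core Semantic Kernel SDK; the version shown is an example, not a pinned requirement -->
  <PackageReference Include="Microsoft.SemanticKernel" Version="1.15.0" />
</ItemGroup>
```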

Here's a simplified snippet to illustrate this. Note that LocalModelChatCompletionService and LocalMemoryService below are illustrative placeholders you would implement yourself, not types shipped with Semantic Kernel:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Memory;

class Program
{
    static void Main(string[] args)
    {
        // Initialize the kernel
        var kernel = Kernel.CreateBuilder()
                           .AddChatCompletionService(new LocalModelChatCompletionService("path_to_phi3_model"))
                           .AddMemoryService(new LocalMemoryService())
                           .Build();

        // Define your question
        string question = "What is the capital of France?";

        // Retrieve relevant information from local memory
        var memoryService = kernel.GetService<IMemoryService>();
        var relevantInfo = memoryService.Retrieve("capital_of_france");

        // Generate a response using the Phi-3 model
        var chatCompletionService = kernel.GetService<IChatCompletionService>();
        var response = chatCompletionService.CompleteChat(question, relevantInfo);

        // Output the response
        Console.WriteLine(response);
    }
}

class LocalMemoryService : IMemoryService
{
    // Implement your local memory retrieval logic here
    public string Retrieve(string key)
    {
        // Example: Retrieve data from a local file or database
        if (key == "capital_of_france")
        {
            return "The capital of France is Paris.";
        }
        return string.Empty;
    }
}

This is a basic example to get you started.

Please see this blog for a more in-depth walkthrough.

Barshan-Mandal commented 4 weeks ago

Why isn't this working?


#pragma warning disable SKEXP0070
#pragma warning disable SKEXP0050
#pragma warning disable SKEXP0001
#pragma warning disable SKEXP0052
#pragma warning disable NU1605

using Build5Nines.SharpVector;
using Build5Nines.SharpVector.Data;
using Elastic.Clients.Elasticsearch.Core.Search;
using Microsoft.KernelMemory;
using Microsoft.ML.OnnxRuntimeGenAI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Plugins.Memory;
using System.Collections;

private static async Task Phi3MemoryRAG()
{
    var modelPath = @"F:\HuggingFace Models\Phi-3-Vision\cpu-int4-rtn-block-32-acc-level-4";
    string modelId = "microsoft/Phi-3-vision-128k-instruct";

    // Load the model and services
    var builder = Kernel.CreateBuilder();
    builder.AddOnnxRuntimeGenAIChatCompletion(modelId, modelPath);
    builder.AddLocalTextEmbeddingGeneration();

// Build Kernel
var kernel = builder.Build();

// Create services such as chatCompletionService and embeddingGeneration
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
var embeddingGenerator = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

// Setup a memory store and create a memory out of it
var memoryStore = new VolatileMemoryStore();
var memory = new SemanticTextMemory(memoryStore, embeddingGenerator);

const string MemoryCollectionName = "fanFacts";

await memory.SaveInformationAsync(MemoryCollectionName, id: "info1", text: "Gisela's favourite super hero is Batman");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info2", text: "The last super hero movie watched by Gisela was Guardians of the Galaxy Vol 3");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info3", text: "Bruno's favourite super hero is Invincible");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info4", text: "The last super hero movie watched by Bruno was Aquaman II");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info5", text: "Bruno don't like the super hero movie: Eternals");

// Loading it for Save, Recall and other methods
kernel.ImportPluginFromObject(new TextMemoryPlugin(memory));

while (true)
{
    // Get user input
    Console.ForegroundColor = ConsoleColor.White;
    Console.Write("User > ");
    var question = Console.ReadLine()!;

    // Settings for the Phi-3 execution
    OpenAIPromptExecutionSettings executionSettings = new()
    {
        ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
        MaxTokens = 200
    };

    // Invoke the kernel with the user input
    var response = kernel.InvokePromptStreamingAsync(
        promptTemplate: @"Question: {{$input}}

Answer the question using the memory content: {{Recall}}",
        arguments: new KernelArguments(executionSettings) { { "input", question }, { "collection", MemoryCollectionName } });

    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("\nAssistant > ");

    string combinedResponse = string.Empty;
    await foreach (var message in response)
    {
        //Write the response to the console
        Console.Write(message);
        combinedResponse += message;
    }

    Console.WriteLine();
}

}

leestott commented 4 weeks ago

It looks like you're trying to set up a memory retrieval system using the Semantic Kernel and ONNX Runtime for a chat application.

Here are a few things to check and some suggestions to help troubleshoot the issue:

  1. Dependencies and Imports:

    • Ensure all the necessary NuGet packages are installed and up-to-date.
    • Verify that the namespaces and classes you're using are correctly referenced.
  2. Model Path and ID:

    • Double-check the modelPath and modelId to ensure they are correct and accessible.
  3. Kernel and Services Initialization:

    • Make sure the kernel and services are properly initialized. Check for any exceptions or errors during the initialization process.
  4. Memory Store and Plugin:

    • Ensure the VolatileMemoryStore and TextMemoryPlugin are correctly set up and integrated with the kernel.
  5. Prompt Execution Settings:

    • Verify the OpenAIPromptExecutionSettings and ensure they are correctly configured for your use case.
  6. Async Method Execution:

    • Ensure that all asynchronous methods are awaited properly to avoid any runtime issues.
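On point 6 specifically: if Phi3MemoryRAG() is called from Main without await, the task's exception can be swallowed or surface somewhere confusing. A minimal sketch of an async entry point (the method body here is just a stub standing in for the real one in this thread):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    // Stub standing in for the Phi3MemoryRAG method shown in this thread.
    static async Task Phi3MemoryRAG()
    {
        await Task.Yield();
        Console.WriteLine("Phi3MemoryRAG ran to completion");
    }

    // An async Main awaits the task, so any exception thrown inside
    // Phi3MemoryRAG propagates here instead of being lost on an
    // un-awaited task.
    static async Task Main()
    {
        await Phi3MemoryRAG();
    }
}
```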

Here's a slightly revised version of your code with some minor adjustments if you want to try this:

#pragma warning disable SKEXP0070
#pragma warning disable SKEXP0050
#pragma warning disable SKEXP0001
#pragma warning disable SKEXP0052
#pragma warning disable NU1605

using Build5Nines.SharpVector;
using Build5Nines.SharpVector.Data;
using Elastic.Clients.Elasticsearch.Core.Search;
using Microsoft.KernelMemory;
using Microsoft.ML.OnnxRuntimeGenAI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Plugins.Memory;
using System.Collections;
using System.Threading.Tasks;

private static async Task Phi3MemoryRAG()
{
    var modelPath = @"F:\HuggingFace Models\Phi-3-Vision\cpu-int4-rtn-block-32-acc-level-4";
    string modelId = "microsoft/Phi-3-vision-128k-instruct";

    // Load the model and services
    var builder = Kernel.CreateBuilder();
    builder.AddOnnxRuntimeGenAIChatCompletion(modelId, modelPath);
    builder.AddLocalTextEmbeddingGeneration();

    // Build Kernel
    var kernel = builder.Build();

    // Create services such as chatCompletionService and embeddingGeneration
    var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
    var embeddingGenerator = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

    // Setup a memory store and create a memory out of it
    var memoryStore = new VolatileMemoryStore();
    var memory = new SemanticTextMemory(memoryStore, embeddingGenerator);

    const string MemoryCollectionName = "fanFacts";

    await memory.SaveInformationAsync(MemoryCollectionName, id: "info1", text: "Gisela's favourite super hero is Batman");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info2", text: "The last super hero movie watched by Gisela was Guardians of the Galaxy Vol 3");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info3", text: "Bruno's favourite super hero is Invincible");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info4", text: "The last super hero movie watched by Bruno was Aquaman II");
    await memory.SaveInformationAsync(MemoryCollectionName, id: "info5", text: "Bruno doesn't like the super hero movie: Eternals");

    // Loading it for Save, Recall and other methods
    kernel.ImportPluginFromObject(new TextMemoryPlugin(memory));

    while (true)
    {
        // Get user input
        Console.ForegroundColor = ConsoleColor.White;
        Console.Write("User > ");
        var question = Console.ReadLine()!;

        // Settings for the Phi-3 execution
        OpenAIPromptExecutionSettings executionSettings = new()
        {
            ToolCallBehavior = ToolCallBehavior.EnableKernelFunctions,
            MaxTokens = 200
        };

        // Invoke the kernel with the user input
        var response = kernel.InvokePromptStreamingAsync(
            promptTemplate: @"Question: {{$input}}
Answer the question using the memory content: {{Recall}}",
            arguments: new KernelArguments(executionSettings)
            {
                { "input", question },
                { "collection", MemoryCollectionName }
            }
        );

        Console.ForegroundColor = ConsoleColor.Green;
        Console.Write("\nAssistant > ");

        string combinedResponse = string.Empty;
        await foreach (var message in response)
        {
            // Write the response to the console
            Console.Write(message);
            combinedResponse += message;
        }

        Console.WriteLine();
    }
}
Barshan-Mandal commented 4 weeks ago


System.MissingMethodException
  HResult=0x80131513
  Message=Method not found: 'System.ValueTuple`3<System.ReadOnlyMemory`1<Int64>,System.ReadOnlyMemory`1<Int64>,System.ReadOnlyMemory`1<Int64>> FastBertTokenizer.BertTokenizer.Encode(System.String, Int32, System.Nullable`1)'.
  Source=SmartComponents.LocalEmbeddings
  StackTrace:
   at SmartComponents.LocalEmbeddings.LocalEmbedder.Embed[TEmbedding](String inputText, Nullable`1 outputBuffer, Int32 maximumTokens)
   at SmartComponents.LocalEmbeddings.LocalEmbedder.Embed(String inputText, Int32 maximumTokens)
   at SmartComponents.LocalEmbeddings.SemanticKernel.LocalTextEmbeddingGenerationService.GenerateEmbeddingsAsync(IList`1 data, Kernel kernel, CancellationToken cancellationToken)
   at Microsoft.SemanticKernel.Embeddings.EmbeddingGenerationExtensions.<>d__0`2.MoveNext()
   at Microsoft.SemanticKernel.Memory.SemanticTextMemory.<>d__3.MoveNext()
   at phi3vision_aiconsole.Program.<Phi3MemoryRAG>d__2.MoveNext() in O:\Windows For Programming\Projects\Visual Studio\Console\Ai\phi3vision_aiconsole\Program.cs:line 88
   at phi3vision_aiconsole.Program.<Main>d__0.MoveNext() in O:\Windows For Programming\Projects\Visual Studio\Console\Ai\phi3vision_aiconsole\Program.cs:line 25

Barshan-Mandal commented 3 weeks ago

How do I use RAG with Kernel Memory and the Semantic Kernel Handlebars Planner with Phi-3 Vision?

leestott commented 2 weeks ago

@Barshan-Mandal See https://techcommunity.microsoft.com/t5/educator-developer-blog/building-intelligent-applications-with-local-rag-in-net-and-phi/ba-p/4175721

Barshan-Mandal commented 2 weeks ago

The above exception is still thrown.