microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
https://microsoft.github.io/kernel-memory
MIT License

[Question] Export imported Documents #522

Closed: SymoHTL closed this issue 4 months ago

SymoHTL commented 4 months ago

Context / Scenario

I am currently using LLamaSharp to run my LLMs. I was inspired by privateGPT and thought I could do the same with my codebases, but it takes a very long time for my budget server to ingest them every time.

Question

How can I export the imported files, e.g. to a vector DB or something similar?

dluc commented 4 months ago

@SymoHTL sorry, I'm not sure I understand the question. Are you already using Kernel Memory? When you say "export", do you plan to migrate from one solution to another? Could you provide more details?

SymoHTL commented 4 months ago

I think I am using Kernel Memory; it is my only and current solution.

OK, I am just starting out. Currently I have this code:

public override async Task PromptMemoryIngestedAi(IAsyncStreamReader<AiRequest> requestStream,
    IServerStreamWriter<AiMemoryReply> responseStream,
    ServerCallContext context) {
    // Configure LLamaSharp inference plus how documents are partitioned and searched.
    var infParams = new InferenceParams { AntiPrompts = ["\n\n"] };
    var lsConfig = new LLamaSharpConfig(ModelPath) { DefaultInferenceParams = infParams };
    var searchClientConfig = new SearchClientConfig { MaxMatchesCount = 1, AnswerTokens = 100 };
    var parseOptions = new TextPartitioningOptions
        { MaxTokensPerParagraph = 300, MaxTokensPerLine = 100, OverlappingTokens = 30 };
    var memory = new KernelMemoryBuilder()
        .WithLLamaSharpDefaults(lsConfig)
        .WithSearchClientConfig(searchClientConfig)
        .With(parseOptions)
        .Build();

    // Ingest documents (format is automatically detected from the filename).
    const string documentFolder = @"D:\ai-ingest";
    var documentPaths = Directory.GetFiles(documentFolder, "*.pdf", SearchOption.AllDirectories);
    await responseStream.WriteAsync(new AiMemoryReply
        { Message = $"Ingesting {documentPaths.Length} documents...\n" });
    foreach (var doc in documentPaths) {
        var docId = await memory.ImportDocumentAsync(doc, steps: Constants.PipelineWithoutSummary);
    }

    await responseStream.WriteAsync(new AiMemoryReply { Message = "Ingestion complete.\n" });

    // Answer incoming prompts against the ingested memory, streaming back citations.
    await foreach (var request in requestStream.ReadAllAsync()) {
        var answer = await memory.AskAsync(request.Prompt);
        await responseStream.WriteAsync(new AiMemoryReply {
            Message = answer.Result, Citations = { answer.RelevantSources.ToMemoryCitations() }
        });
    }
}

My problem now is that the ingestion (importing PDFs) takes a long time, and I was wondering how I can save the imported PDFs somewhere, like in a vector DB or just in a plain file. On the docs page there is a diagram with a Memory DB, but I didn't find anything on how to set up something like that.
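
A minimal sketch of one way to get that persistence, assuming the WithSimpleFileStorage/WithSimpleVectorDb builder extensions from the core Microsoft.KernelMemory package; the directory paths and the file-name-derived document ID are illustrative, and lsConfig is the LLamaSharpConfig from the snippet above:

using Microsoft.KernelMemory;
using Microsoft.KernelMemory.FileSystem.DevTools; // FileSystemTypes

// Persist both the uploaded files and the computed embeddings on disk,
// so a restart does not force re-ingesting every PDF.
var memory = new KernelMemoryBuilder()
    .WithLLamaSharpDefaults(lsConfig)
    .WithSimpleFileStorage(new SimpleFileStorageConfig
        { StorageType = FileSystemTypes.Disk, Directory = @"D:\km-files" })
    .WithSimpleVectorDb(new SimpleVectorDbConfig
        { StorageType = FileSystemTypes.Disk, Directory = @"D:\km-vectors" })
    .Build();

foreach (var doc in documentPaths) {
    // A stable ID derived from the file name lets a later run detect
    // that this PDF was already imported and skip the expensive pipeline.
    var docId = Path.GetFileNameWithoutExtension(doc);
    if (await memory.IsDocumentReadyAsync(docId)) continue; // already ingested
    await memory.ImportDocumentAsync(doc, documentId: docId,
        steps: Constants.PipelineWithoutSummary);
}

With the vectors stored on disk, later runs only pay the embedding cost for documents that are actually new.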

SymoHTL commented 4 months ago

Never mind, I found this: https://www.developerscantina.com/p/kernel-memory/

Sorry for the inconvenience.
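
For anyone landing on this issue later: the fix comes down to swapping the default volatile memory DB for a persistent connector when building the memory. A minimal sketch with the Qdrant connector, assuming the Microsoft.KernelMemory.MemoryDb.Qdrant package and a locally running Qdrant instance (the endpoint below is a placeholder), with lsConfig as in the earlier snippet:

using Microsoft.KernelMemory;

// Store embeddings in Qdrant instead of the default in-process volatile store,
// so the ingested index survives process restarts.
var memory = new KernelMemoryBuilder()
    .WithLLamaSharpDefaults(lsConfig)
    .WithQdrantMemoryDb("http://localhost:6333")
    .Build();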