microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
https://microsoft.github.io/kernel-memory
MIT License
1.6k stars, 308 forks

GenerateEmbeddingsHandler - An item with the same key has already been added #412

Closed: clarity99 closed this issue 7 months ago

clarity99 commented 7 months ago

Context / Scenario

I just want to add a document to the memory store, and I get this error. Code:


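 // Copy the uploaded file into memory, then rewind so the import reads it from the start
 using var ms = new System.IO.MemoryStream();
 await using var upload = uploadForm.File.OpenReadStream(uploadForm.File.Size);
 await upload.CopyToAsync(ms);
 ms.Position = 0;
 var d = await km.ImportDocumentAsync(ms, uploadForm.File.Name);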

KM is set up as follows:

var km = new KernelMemoryBuilder()
       .WithOpenAIDefaults(openAIConfig.APIKey)
       .WithOpenAITextEmbeddingGeneration(openAIConfig)
       .WithSimpleVectorDb(new SimpleVectorDbConfig { Directory = "/tmp/robRezervacije/" })
       .WithSimpleFileStorage(new Microsoft.KernelMemory.ContentStorage.DevTools.SimpleFileStorageConfig { Directory = "/tmp/robRezervacije/" })
       .Build<MemoryServerless>();

What happened?

I got an exception:

System.ArgumentException: An item with the same key has already been added. Key: perls.pdf.partition.0.txt.AI.OpenAI.OpenAITextEmbeddingGenerator.TODO.text_embedding
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
   at Microsoft.KernelMemory.Handlers.GenerateEmbeddingsHandler.InvokeAsync(DataPipeline pipeline, CancellationToken cancellationToken)

Importance

I cannot use Kernel Memory

Platform, Language, Versions

.NET 8, macOS Sonoma 14.4.1, KM 0.36.240415.2, C#

Relevant log output

```
dbug: Microsoft.KernelMemory.Handlers.GenerateEmbeddingsHandler[0]
      Saving embedding file perls.pdf.partition.0.txt.AI.OpenAI.OpenAITextEmbeddingGenerator.TODO.text_embedding
trce: Microsoft.KernelMemory.Handlers.GenerateEmbeddingsHandler[0]
      Generating embeddings using AI.OpenAI.OpenAITextEmbeddingGenerator, file: perls.pdf.partition.0.txt
dbug: Microsoft.KernelMemory.Handlers.GenerateEmbeddingsHandler[0]
      Saving embedding file perls.pdf.partition.0.txt.AI.OpenAI.OpenAITextEmbeddingGenerator.TODO.text_embedding
fail: Microsoft.KernelMemory.Pipeline.BaseOrchestrator[0]
      Pipeline start failed
      System.ArgumentException: An item with the same key has already been added. Key: perls.pdf.partition.0.txt.AI.OpenAI.OpenAITextEmbeddingGenerator.TODO.text_embedding
         at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
         at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
         at Microsoft.KernelMemory.Handlers.GenerateEmbeddingsHandler.InvokeAsync(DataPipeline pipeline, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Pipeline.InProcessPipelineOrchestrator.RunPipelineAsync(DataPipeline pipeline, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Pipeline.BaseOrchestrator.ImportDocumentAsync(String index, DocumentUploadRequest uploadRequest, CancellationToken cancellationToken)
```

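
The trace ends in Dictionary`2.Add, which throws when the key is already present. Below is a minimal C# sketch of that failure mode, assuming (from the key shown in the log) that each embedding file is named after the partition file plus the generator's type name, so two registered generators of the same type produce the same name. EmbeddingFileName and SaveEmbeddings are hypothetical stand-ins for illustration, not Kernel Memory APIs.

```csharp
using System;
using System.Collections.Generic;

const string Partition = "perls.pdf.partition.0.txt";
const string OpenAIGenerator = "AI.OpenAI.OpenAITextEmbeddingGenerator";

// One OpenAI embedding generator registered: one embedding file, no error.
var files = SaveEmbeddings(Partition, new[] { OpenAIGenerator });
Console.WriteLine(files.Count); // 1

// The same generator type registered twice (defaults plus an explicit call):
// both produce the same file name, so the second Dictionary.Add throws.
try
{
    SaveEmbeddings(Partition, new[] { OpenAIGenerator, OpenAIGenerator });
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message); // prints the "same key has already been added" message
}

// Hypothetical naming scheme modeled on the key in the log:
// "<partition file>.<generator type>.TODO.text_embedding".
static string EmbeddingFileName(string partitionFile, string generatorType) =>
    $"{partitionFile}.{generatorType}.TODO.text_embedding";

// Hypothetical stand-in for the handler loop: one generated file per registered
// generator, collected in a Dictionary keyed by file name.
static Dictionary<string, string> SaveEmbeddings(string partitionFile, IEnumerable<string> generatorTypes)
{
    var generatedFiles = new Dictionary<string, string>();
    foreach (var generatorType in generatorTypes)
    {
        // Dictionary.Add throws ArgumentException if the key is already present.
        generatedFiles.Add(EmbeddingFileName(partitionFile, generatorType), generatorType);
    }
    return generatedFiles;
}
```

With one generator per type the names stay unique; registering the same generator type twice is enough to reproduce the duplicate key.
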
dluc commented 7 months ago

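I think you're registering the same embedding generator twice: WithOpenAIDefaults already sets up OpenAI text generation and embedding generation, and WithOpenAITextEmbeddingGeneration then adds a second OpenAI embedding generator.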

Try removing WithOpenAIDefaults, i.e. change this code:

       .WithOpenAIDefaults(openAIConfig.APIKey)
       .WithOpenAITextEmbeddingGeneration(openAIConfig)

to

       .WithOpenAITextGeneration(openAITextConfig)
       .WithOpenAITextEmbeddingGeneration(openAIEmbeddingConfig)
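
The replacement refers to openAITextConfig and openAIEmbeddingConfig, which the thread doesn't define. A minimal sketch of how they might be built, assuming the Microsoft.KernelMemory.Core package at roughly the 0.36 release, its OpenAIConfig properties APIKey, TextModel and EmbeddingModel, an OPENAI_API_KEY environment variable, and example model names; namespaces and model names may differ in other versions or setups.

```csharp
using System;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.ContentStorage.DevTools;
using Microsoft.KernelMemory.MemoryStorage.DevTools;

// Assumption: the API key comes from an environment variable named OPENAI_API_KEY.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
             ?? throw new InvalidOperationException("OPENAI_API_KEY is not set");

// Text generation settings (model name is only an example).
var openAITextConfig = new OpenAIConfig { APIKey = apiKey, TextModel = "gpt-3.5-turbo" };

// Embedding generation settings (model name is only an example).
var openAIEmbeddingConfig = new OpenAIConfig { APIKey = apiKey, EmbeddingModel = "text-embedding-ada-002" };

var km = new KernelMemoryBuilder()
    // Each OpenAI service is registered exactly once; WithOpenAIDefaults is not called.
    .WithOpenAITextGeneration(openAITextConfig)
    .WithOpenAITextEmbeddingGeneration(openAIEmbeddingConfig)
    .WithSimpleVectorDb(new SimpleVectorDbConfig { Directory = "/tmp/robRezervacije/" })
    .WithSimpleFileStorage(new SimpleFileStorageConfig { Directory = "/tmp/robRezervacije/" })
    .Build<MemoryServerless>();
```

Splitting the settings into two objects keeps text and embedding models independently configurable; a single OpenAIConfig with both TextModel and EmbeddingModel set would also work for both calls, as long as no second embedding registration is added.
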
clarity99 commented 7 months ago

Ah, this works, thank you! I wasn't aware that WithOpenAIDefaults sets up both text generation and embeddings.