microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
https://microsoft.github.io/kernel-memory
MIT License
1.62k stars, 313 forks

[Bug] Excel file with 8+++ rows OutOfMemoryException #904

Closed roberAlb closed 6 days ago

roberAlb commented 1 week ago

Context / Scenario

We are trying to import documents into Azure AI Search by calling KM from Azure Functions, and we are struggling with Excel files that have a lot of rows.

What happened?

Basically, it fails with an OutOfMemoryException when we process Excel files. Locally it works, but it takes 40-50 minutes to import into Azure AI Search. This is not a large document: it has about 8,500 rows and is less than 1 MB.

Could anyone advise? I don't know why it takes so long locally. Is there any limitation or known issue with xlsx files?

Importance

a fix would make my life easier

Platform, Language, Versions

Windows 10, C#, Kernel Memory 0.51.240513.2

Relevant log output

Exception of type 'System.OutOfMemoryException' was thrown.
Exception: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Text.Json.JsonReaderHelper.TranscodeHelper(ReadOnlySpan`1 utf8Unescaped)
   at System.Text.Json.JsonSerializer.WriteString[TValue](TValue& value, JsonTypeInfo`1 jsonTypeInfo)
   at System.Text.Json.JsonSerializer.Serialize[TValue](TValue value, JsonSerializerOptions options)
   at Microsoft.KernelMemory.Pipeline.BaseOrchestrator.ToJson(Object data, Boolean indented) in ...\KernelMemory\service\Core\Pipeline\BaseOrchestrator.cs:line 443
   at Microsoft.KernelMemory.Pipeline.BaseOrchestrator.UpdatePipelineStatusAsync(DataPipeline pipeline, CancellationToken cancellationToken) in ....\KernelMemory\service\Core\Pipeline\BaseOrchestrator.cs:line 426

marcominerva commented 1 week ago

You're using a very old version of Kernel Memory. First of all, I suggest updating to the latest version, 0.93.241118.1.

dluc commented 6 days ago

Hi @roberAlb, I tried the decoder against this spreadsheet, and everything worked fine. I would suggest upgrading the solution, as @marcominerva recommended, and retrying.

Test: