microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
https://microsoft.github.io/kernel-memory
MIT License

[Bug] SQL failing to save when too many tags with the same key are present (list/content too big for the key) #527

Closed bossbast1 closed 1 month ago

bossbast1 commented 1 month ago

Context / Scenario

We ingest documents into KM and store them in SQL; the SQL content is then displayed to users in a UI. When a document has too many keywords (that is, too many tags with the same key), SQL fails in the save_records step with a confusing error: JSON text is not properly formatted. Unexpected character '"' is found at position 245. (The reported position is always close to 250.)

If the keywords are split into batches of 10 under distinct keys (e.g. Keyword1, Keyword2, ...), the save works, but splitting them this way is not really a valid option for us.
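
For reference, the batching workaround described above can be sketched roughly as follows. BatchTags is a hypothetical helper written for this illustration and is not part of KM; the batch size of 10 and the Keyword1, Keyword2, ... naming come from the description above, and the "key:value" string format matches the tags form field used in the repro code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a KM API): spreads the values of one tag key
// across numbered keys so that no single key holds more than batchSize values.
static IEnumerable<string> BatchTags(string key, IReadOnlyList<string> values, int batchSize = 10)
{
    for (int i = 0; i < values.Count; i++)
    {
        int batchNumber = i / batchSize + 1; // 1-based: Keyword1, Keyword2, ...
        yield return $"{key}{batchNumber}:{values[i]}";
    }
}

// Same 100 values as in the repro, but spread over Keyword1..Keyword10.
var values = Enumerable.Range(0, 100).Select(i => $"test{i}").ToList();
var tags = BatchTags("Keyword", values).ToList();

foreach (var tag in tags.Take(12))
{
    Console.WriteLine(tag);
}
```

Each resulting string would be added as its own tags form field. The drawback is the one noted above: any code that filters on the tag later has to know about every numbered key.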

Example code that leaves the document stuck:

using System.Net.Http;
using System.Text;

var httpClient = new HttpClient();

var contentTest = new MultipartFormDataContent();

// A tiny text file is enough to reproduce the problem.
var fileContentTest = new ByteArrayContent(Encoding.UTF8.GetBytes("Test text"));
contentTest.Add(fileContentTest, "file", "test.txt");
contentTest.Add(new StringContent("Test001"), "documentId");
contentTest.Add(new StringContent("index002"), "index");

// 100 values under the same tag key ("Keyword") trigger the SQL error.
for (int i = 0; i < 100; i++)
{
    contentTest.Add(new StringContent($"Keyword:test{i}"), "tags");
}

var responseTest = await httpClient.PostAsync("https://<KM server API>/upload", contentTest);
var resTest = await responseTest.Content.ReadAsStringAsync();

In KM, the SQL memory DB is registered through this branch of ConfigureIngestionMemoryDb:

case string x when x.Equals("SqlServer", StringComparison.OrdinalIgnoreCase):
{
    var instance = this.GetServiceInstance<IMemoryDb>(builder,
        s => s.AddSqlServerAsMemoryDb(this.GetServiceConfig<SqlServerConfig>("SqlServer"))
    );
    builder.AddIngestionMemoryDb(instance);
    break;
}

What happened?

SQL should process the record without crashing. (When the SQL memory DB is omitted, KM stores the same tags in the search service without any failure.)
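
As a rough diagnostic, the sketch below compares the JSON length of all 100 values under one key with the length of a single batch of 10. How KM actually serializes tags for SQL is not shown in this thread, so the JSON array shape used here is an assumption, and the result does not confirm the root cause. It only shows that the single-key payload runs well past the position near 250 reported in the error, while a batch of 10 stays below it.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Assumption: the values of one tag key are serialized as a JSON array of strings.
// This shape is illustrative only and may differ from what KM writes to SQL.
var allValues = Enumerable.Range(0, 100).Select(i => $"test{i}").ToArray();
var oneBatch = allValues.Take(10).ToArray();

string allJson = JsonSerializer.Serialize(allValues);  // 100 values under one key
string batchJson = JsonSerializer.Serialize(oneBatch); // 10 values under one key

Console.WriteLine($"single key, 100 values: {allJson.Length} chars");
Console.WriteLine($"one batch, 10 values:   {batchJson.Length} chars");
```

If the lengths line up with the failing and working cases, a size limit or truncation somewhere on the SQL side would be one thing worth checking.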

Importance

a fix would make my life easier

Platform, Language, Versions

C#, Azure SQL Instance, version: Microsoft.KernelMemory.MemoryDb.SQLServer and Core - 0.61.240524.1

Relevant log output

trce: Microsoft.KernelMemory.Handlers.SaveRecordsHandler[0]
      Saving record d=KB30488_Zurich-Switzerland//p=1c8f1382edc343258a85a12e124e9b54 in index 'index002'
warn: Microsoft.KernelMemory.Orchestration.AzureQueues.AzureQueuesPipeline[0]
      Message '01ef2d66-4b15-43c8-83ef-fabc0e1a618c' processing failed with exception, putting message back in the queue with a delay of 11000 msecs
      Microsoft.Data.SqlClient.SqlException (0x80131904): JSON text is not properly formatted. Unexpected character '"' is found at position 245.
         at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
         at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, Boolean callerHasConnectionLock, Boolean asyncClose)
         at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
         at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
         at Microsoft.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader(Boolean isInternal, Boolean forDescribeParameterEncryption)
         at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
         at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
         at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
         at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
      --- End of stack trace from previous location ---
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.BatchUpsertAsync(String index, IEnumerable`1 records, CancellationToken cancellationToken)+MoveNext()
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.BatchUpsertAsync(String index, IEnumerable`1 records, CancellationToken cancellationToken)+MoveNext()
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.BatchUpsertAsync(String index, IEnumerable`1 records, CancellationToken cancellationToken)+MoveNext()
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.UpsertAsync(String index, MemoryRecord record, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.MemoryDb.SQLServer.SqlServerMemory.UpsertAsync(String index, MemoryRecord record, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.SaveRecordAsync(DataPipeline pipeline, IMemoryDb db, MemoryRecord record, HashSet`1 createdIndexes, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.InvokeAsync(DataPipeline pipeline, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Pipeline.DistributedPipelineOrchestrator.RunPipelineStepAsync(DataPipeline pipeline, IPipelineStepHandler handler, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Pipeline.DistributedPipelineOrchestrator.<>c__DisplayClass5_0.<<AddHandlerAsync>b__0>d.MoveNext()
      --- End of stack trace from previous location ---
         at Microsoft.KernelMemory.Orchestration.AzureQueues.AzureQueuesPipeline.<>c__DisplayClass20_0.<<OnDequeue>b__0>d.MoveNext()
      ClientConnectionId:04a49bb0-393e-40f1-9fce-1fe2b9b225a1
      Error Number:13609,State:4,Class:16

dluc commented 1 month ago

FYI @kbeaugrand if you have a chance to look into it - thanks

kbeaugrand commented 1 month ago

Will take a look soon.