microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
https://microsoft.github.io/kernel-memory
MIT License

[Bug] 22021: invalid byte sequence for encoding "UTF8": 0x00 #649

Status: Closed (pyliakm closed this issue 3 months ago)

pyliakm commented 3 months ago

Context / Scenario

I am using PostgreSQL with pgvector, and I get an exception when Kernel Memory saves records to the database.

var kernelMemory = new KernelMemoryBuilder()
    .WithPostgresMemoryDb(new PostgresConfig() { ConnectionString = _supabaseConfig.ConnectionString })
    .WithOpenAIDefaults(_encryptionModelService.Decrypt(organization!.OpenAIAccessToken))
    .WithContentDecoder<CustomImageDecoder>()
    .Build<MemoryServerless>();

What happened?

I expected the import to work for any PDF document. This is the PDF that fails: SynergyOS Design Guide.pdf

Importance

I cannot use Kernel Memory

Platform, Language, Versions

Microsoft.KernelMemory.MemoryDb.Postgres v0.62.240604.1

Relevant log output

Microsoft.KernelMemory.Postgres.PostgresException: 22021: invalid byte sequence for encoding "UTF8": 0x00
 ---> Npgsql.PostgresException (0x80004005): 22021: invalid byte sequence for encoding "UTF8": 0x00
   at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
  Exception data:
    Severity: ERROR
    SqlState: 22021
    MessageText: invalid byte sequence for encoding "UTF8": 0x00
    Where: unnamed portal parameter $4
    File: mbutils.c
    Line: 1665
    Routine: report_invalid_encoding
   --- End of inner exception stack trace ---
   at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Postgres.PostgresMemory.UpsertAsync(String index, MemoryRecord record, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.SaveRecordAsync(DataPipeline pipeline, IMemoryDb db, MemoryRecord record, HashSet`1 createdIndexes, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.InvokeAsync(DataPipeline pipeline, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Pipeline.InProcessPipelineOrchestrator.RunPipelineAsync(DataPipeline pipeline, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Pipeline.BaseOrchestrator.ImportDocumentAsync(String index, DocumentUploadRequest uploadRequest, CancellationToken cancellationToken)

dluc commented 3 months ago

@pyliakm could you provide a PDF that reproduces this error?

pyliakm commented 3 months ago

"@pyliakm could you provide a PDF that reproduces this error?"

I have added it to the issue description.

dluc commented 3 months ago

Thanks for the report! The bug is fixed.
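
For readers who hit the same error on an affected version: PostgreSQL text values cannot contain the NUL character (U+0000), and SqlState 22021 with byte 0x00 indicates that the text extracted from the PDF contained one. The following is a minimal sketch of a workaround, not the fix that was applied in the repository. It assumes you can sanitize the extracted text yourself before it reaches the database, for example inside a custom content decoder; `RemoveNulChars` is a hypothetical helper name and is not part of Kernel Memory.

```csharp
using System;

// Hypothetical helper (not part of Kernel Memory): removes U+0000 characters,
// which PostgreSQL rejects in text values with error 22021.
static string RemoveNulChars(string text) =>
    string.IsNullOrEmpty(text) ? text : text.Replace("\0", string.Empty);

// Example input: extracted text with embedded NUL characters (made-up sample).
string extracted = "Synergy\0OS Design\0 Guide";
string cleaned = RemoveNulChars(extracted);

// Check that no NUL characters remain before the text is stored.
Console.WriteLine(cleaned.Contains('\0')); // prints "False"
```

Removing the character rather than replacing it with a space keeps words intact when the NUL appears mid-word; whether that is the right choice depends on how the PDF extractor emits the stray bytes.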