Closed roldengarm closed 2 weeks ago
Hi @roldengarm, could you provide some more information on the following points?
It seems the VM hosting Postgres might be running out of memory due to the vector index size. Here are a few options to consider:

With text-embedding-3, truncating the vectors can reduce the memory footprint. Keep in mind that this also reduces the precision of relevance scores: test this approach to see whether Search returns incorrect results or Ask produces hallucinations. For more details, see "shortening embeddings" (https://openai.com/index/new-embedding-models-and-api-updates/) and MRL (e.g. https://aniketrege.github.io/blog/2024/mrl/).

Hi @dluc thanks for your reply!
We're using text-embedding-3-large on Azure OpenAI.
Regarding vector size or truncating of vectors: I'm unsure. I've just deployed Kernel Memory as a service with the dotnet setup wizard, so I'm using the default settings, I guess.
Technically, I can increase memory relatively easily as it's on Azure Flexible Postgres, but obviously it comes at a cost. I tried upgrading to 64GB, but the problem only went away after I stopped the ingestion. This is strange, as I run the ingestion with at most 12 documents in parallel.
Search performance is critical as it will be used in a chat interface, so I don't think HNSW is suitable.
Try using text-embedding-3-small; that should cut memory usage in half. The underlying problem is sizing the Postgres infrastructure according to the data being used, including the index size in memory.
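As a rough sanity check on that sizing advice, here is a back-of-envelope estimate (a sketch only: it assumes 4-byte float components, uses the row counts mentioned in this thread, and ignores index and per-row storage overhead, which can be substantial):

```python
# Back-of-envelope memory estimate for stored embedding vectors.
# Assumes float4 (4 bytes) per dimension; real usage is higher once
# the index structure and row overhead are added.
BYTES_PER_FLOAT = 4

def raw_vector_gb(rows: int, dims: int) -> float:
    """Raw vector payload only, in GB (decimal)."""
    return rows * dims * BYTES_PER_FLOAT / 1e9

for model, dims in [("text-embedding-3-large", 3072),
                    ("text-embedding-3-small", 1536)]:
    now = raw_vector_gb(900_000, dims)       # current ingestion
    planned = raw_vector_gb(9_000_000, dims) # planned 9m documents
    print(f"{model} ({dims}d): ~{now:.1f} GB now, ~{planned:.1f} GB at 9m rows")
```

With 3072-dimensional vectors, 900k rows already mean roughly 11 GB of raw vector data before any index overhead, which is consistent with memory pressure on a 16 GB server.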
If cost is a factor, you should test HNSW before discarding the option, to understand the impact on performance and how much you can save in monthly costs.
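For reference, the vector-shortening approach suggested above can be sketched in a few lines. This is plain Python with no dependencies; `truncate_embedding` is a hypothetical helper for illustration, not a Kernel Memory API. (The text-embedding-3 models can also return shortened vectors directly via the embeddings API's `dimensions` request parameter, which avoids doing this client-side.)

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and re-normalize to unit length,
    as in Matryoshka Representation Learning (MRL) truncation."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Example: cut a 3072-d embedding down to 1536-d before indexing.
full = [1.0 / math.sqrt(3072)] * 3072   # stand-in for a unit-norm embedding
short = truncate_embedding(full, 1536)
print(len(short), round(sum(x * x for x in short), 6))  # -> 1536 1.0
```

The re-normalization step matters: cosine similarity assumes unit-length vectors, so truncating without re-normalizing skews relevance scores further.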
Context / Scenario
Running KM as a service, using queues, Azure OpenAI Embedding-3, and PostgresDB as the backend. Ingested about 900k records (~15 GB of data). Consuming it from a console application using the WebClient.
Initially, after a couple of thousand documents, calling SearchAsync / AskAsync worked fine. However, after ~900k records, I'm getting a server error (500) every time. In the logs I can see an Npgsql.NpgsqlException - Timeout.
What happened?
I'm getting a server error every time. The ingestion process is still running, at about 12 documents at a time. I don't see high CPU usage on the App Service or on Postgres (Azure Flexible Server). I tried increasing Postgres to 4 vCores / 16 GB RAM; no difference.
We're planning to ingest a total of 9m documents, so it's concerning that it's already throwing errors at 900k.
Importance
I cannot use Kernel Memory
Platform, Language, Versions
Using C#, KM deployed as a service to Azure App Service, using Azure Postgres Flexible Server, Azure Storage, and Azure OpenAI
This is our second try; we initially tried using Azure AI Search instead of Azure Postgres, but the costs for storing ~9m records were astronomical.
Relevant log output