microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
https://microsoft.github.io/kernel-memory
MIT License
1.52k stars 292 forks source link

[Bug] Connection pool not released after Postgres query #679

Closed xuzeyu91 closed 3 weeks ago

xuzeyu91 commented 3 months ago

Context / Scenario

await memoryDb.GetListAsync(KmsConstantcs.KmsIndex, filters: new List<MemoryFilter>() { new MemoryFilter().ByDocument(fileId) }, limit: int.MaxValue).ToListAsync().ConfigureAwait(false);

I often encounter anomalies such as sorry, too many clients are already ready when querying document slices. I have observed that each query adds a new link

SELECT count(*) FROM pg_stat_activity;

I have found that every query adds new links and they are not dispose. Frequent queries will cause my Postgres connection pool to fill up

What happened?

too many clients are already ready

Importance

I cannot use Kernel Memory

Platform, Language, Versions

I use kernel-memory to connect to Postgres

Relevant log output

too many clients are already ready
xuzeyu91 commented 3 months ago

https://github.com/microsoft/kernel-memory/blob/3d34260ae513af48030da9a56aa50b8e0162c6f8/extensions/Postgres/Postgres/Internals/PostgresDbClient.cs#L529

I'm not sure if there's a problem here

xuzeyu91 commented 3 months ago
 _memoryDbs = memory.Orchestrator.GetMemoryDbs();
 foreach (var memoryDb in _memoryDbs)
 {
     var items = await memoryDb.GetListAsync(KmsConstantcs.KmsIndex, filters: new List<MemoryFilter>() { new MemoryFilter().ByDocument(fileId) }, limit: int.MaxValue).ToListAsync().ConfigureAwait(false);
 }

This is my code, I need to retrieve the slice data from the document, which will cause the number of connections to increase continuously

marcominerva commented 3 months ago

@dluc, could it be related to the fact that PostgresMemory is registered as singleton? PostgresMemory in turns creates a PostgresDbClient. So, even if both PostgresMemory and PostgresDbClient implements IDisposable, the Dispose methods are never called.

In particular, I see that PostgresDbClient contains an NpgsqlDataSource object that is created in the constructor but never disposed: https://github.com/microsoft/kernel-memory/blob/3d34260ae513af48030da9a56aa50b8e0162c6f8/extensions/Postgres/Postgres/Internals/PostgresDbClient.cs#L31

dluc commented 2 months ago

Each connection is opened and released using this pattern:

NpgsqlConnection connection = await this.ConnectAsync(cancellationToken);
await using (connection)
{
    // do some work
    // return result
}

DisposeAsync() calls CloseAsync() so everything should work. I've changed the pattern in https://github.com/microsoft/kernel-memory/pull/720 (see below) to call CloseAsync explicitly, but I'm not convinced this will help.

The problem reported might be caused by a high number of transactions, in which case I would check Postgres configuration to scale accordingly.

NpgsqlConnection connection = await this.ConnectAsync(cancellationToken);
await using (connection)
{
    try 
        // do some work
        // return result
    finally
        // disconnect
}
dluc commented 1 month ago

@xuzeyu91 is the problem still occurring with the latest code?

xuzeyu91 commented 1 month ago

@xuzeyu91 is the problem still occurring with the latest code?

I am using version 0.70.240803.1, but this issue still persists

dluc commented 1 month ago

The fact that the client is a singleton should protect from too many connections. If someone has the time to investigate further that could help. For instance, log every new connection, monitor if connections are actually closed, check PG configuration to see what the limit is, etc.

dluc commented 3 weeks ago

This should be fixed with the latest release, otherwise please let us know.