PgCat should not return and log 'all servers down' when failing to obtain a connection from the pool

smcgivern commented 2 months ago

This is related to https://github.com/postgresml/pgcat/pull/822 - we were seeing this message when trialling PgCat in our production environment. We couldn't see why the Postgres server in question was down, and the answer is that it wasn't 🙂

Instead, we were queueing for longer than connect_timeout. When that happens in PgBouncer, you get this:

linear_production_copy=# SELECT 1;
FATAL:  08P01: query_wait_timeout
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

In PgCat, you get this:

linear_production_copy=# SELECT 1;
FATAL:  58000: could not get connection from the pool - AllServersDown

This is partly right and partly misleading: PgCat really could not get a connection from the pool, but none of the servers was down. I think PgCat should use a more specific error message in this case. I'm happy to create a PR if people agree.
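To make the proposal concrete, here is a minimal Rust sketch (PgCat is written in Rust) of how a pooler might choose which message to return once a checkout gives up. All of the names in it (`ServerHealth`, `CheckoutError`, `CheckoutTimeout`, `classify_checkout_failure`) are hypothetical and are not PgCat's actual code. It only assumes that the pooler knows each server's health-check state and how long the client waited (in the case above, the connect_timeout limit).

```rust
use std::fmt;
use std::time::Duration;

/// Hypothetical: health of one server as seen by the pooler's health checks.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ServerHealth {
    Healthy,
    Banned,
}

/// Hypothetical checkout error that separates "no server is usable"
/// from "servers are fine, but none was free in time".
#[derive(Debug, PartialEq)]
enum CheckoutError {
    /// Every server in the pool has failed its health checks.
    AllServersDown,
    /// At least one server was healthy, but the client queued past the limit.
    CheckoutTimeout { waited: Duration },
}

impl fmt::Display for CheckoutError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CheckoutError::AllServersDown => {
                write!(f, "could not get connection from the pool - all servers down")
            }
            CheckoutError::CheckoutTimeout { waited } => write!(
                f,
                "could not get connection from the pool - no connection became available within {} ms",
                waited.as_millis()
            ),
        }
    }
}

/// Pick the error to report once a checkout has given up.
/// An empty pool is treated as "all servers down".
fn classify_checkout_failure(servers: &[ServerHealth], waited: Duration) -> CheckoutError {
    if servers.iter().all(|s| *s == ServerHealth::Banned) {
        CheckoutError::AllServersDown
    } else {
        CheckoutError::CheckoutTimeout { waited }
    }
}

fn main() {
    // The situation from this report: servers are healthy, the queue is just too long.
    let servers = [ServerHealth::Healthy, ServerHealth::Healthy];
    let err = classify_checkout_failure(&servers, Duration::from_millis(5000));
    println!("FATAL:  {}", err);
}
```

With healthy servers, the sketch reports the queueing timeout instead of AllServersDown. The "all servers down" message is reserved for the case where health checks have actually failed every server.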

drdrsh commented 2 months ago

I certainly agree with that. There are several error messages around checkout and health checks that could be made clearer, but we can start with this one.

omer-topal commented 4 weeks ago

Definitely agree.

postgresml / pgcat #824