postgresml / pgcat

PostgreSQL pooler with sharding, load balancing and failover support.
MIT License

PgCat should not return and log 'all servers down' when failing to obtain a connection from the pool #824

Open smcgivern opened 2 months ago

smcgivern commented 2 months ago

This is related to https://github.com/postgresml/pgcat/pull/822 - we were seeing this message when trialling PgCat in our production environment. We couldn't see why the Postgres server in question was down, and the answer is that it wasn't 🙂

Instead, we were queueing for longer than connect_timeout. When that happens in PgBouncer, you get this:

linear_production_copy=# SELECT 1;
FATAL:  08P01: query_wait_timeout
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

In PgCat, you get this:

linear_production_copy=# SELECT 1;
FATAL:  58000: could not get connection from the pool - AllServersDown

This is partly right and partly misleading: a connection could not be obtained from the pool, but no server was actually down. I think PgCat should use a more specific error message in this case. I'm happy to create a PR if people agree.
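To make the distinction concrete, here is a minimal sketch of how a queue timeout could be reported separately from a real outage. Everything in it is an assumption for illustration: the `CheckoutError` type, the `PoolWaitTimeout` variant, the `classify_checkout_failure` helper and the message wording are hypothetical, not PgCat's actual code or API.

```rust
use std::fmt;
use std::time::Duration;

/// Hypothetical error type; variant names are illustrative, not PgCat's real ones.
#[derive(Debug, PartialEq)]
enum CheckoutError {
    /// No server in the pool is considered healthy.
    AllServersDown,
    /// Servers are healthy, but the client queued longer than connect_timeout.
    PoolWaitTimeout { waited: Duration, limit: Duration },
}

impl fmt::Display for CheckoutError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CheckoutError::AllServersDown => {
                write!(f, "could not get connection from the pool - all servers down")
            }
            CheckoutError::PoolWaitTimeout { waited, limit } => write!(
                f,
                "could not get connection from the pool - waited {}ms, longer than connect_timeout ({}ms)",
                waited.as_millis(),
                limit.as_millis()
            ),
        }
    }
}

/// Hypothetical helper: pick the error to report after a failed checkout.
/// Assumes the caller knows how many servers are currently healthy.
fn classify_checkout_failure(
    healthy_servers: usize,
    waited: Duration,
    connect_timeout: Duration,
) -> CheckoutError {
    if healthy_servers == 0 {
        CheckoutError::AllServersDown
    } else {
        CheckoutError::PoolWaitTimeout {
            waited,
            limit: connect_timeout,
        }
    }
}

fn main() {
    // One healthy server, but the client waited past the timeout.
    let err = classify_checkout_failure(
        1,
        Duration::from_millis(5200),
        Duration::from_millis(5000),
    );
    println!("FATAL: {}", err);
}
```

The point of the sketch is only that the two cases carry different information for an operator: one points at the database, the other at pool sizing or slow queries.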

drdrsh commented 2 months ago

I certainly agree with that. There are several error messages around checkout and health checks that could be made clearer, but we can start with this one.

omer-topal commented 4 weeks ago

Definitely agree.