Deadlocked/hung pool when connected to Postgres through Supavisor

nikhilro commented 1 month ago

Hey @porsager, thanks for the package.

Creating this issue mostly in hopes that someone else has run into this problem.

We use Supabase for our Postgres DB and connect through their pooling service Supavisor in transaction mode; Supavisor is an alternative to pgBouncer (link). On the postgres.js side, we set prepare: false.

What we're seeing is that the postgres.js pool will start swallowing queries at some point. Imagine a function like:

async function userGet(id: string) {
  console.log("trying to get the user");
  // "user" is a reserved word in Postgres, so the table name has to be quoted
  const users = await db<User[]>`SELECT * FROM "user" WHERE id = ${id}`;
  console.log("got the user!");
  return users;
}

After a while, this will print "trying to get the user" but not "got the user!". Does this ring any bells?

P.S. I've been trying to write an isolated script that reproduces this, but no luck so far. The only reproduction is live production traffic.

nikhilro commented 1 month ago

I'm now more confident that this is an issue with Supavisor under load. One thing that would help is a statement timeout in postgres.js itself.

I know we have:

   connection: {
     statement_timeout: 1000 * 60 * 0.5, // 30 seconds, pg expects milliseconds
   },

But that isn't useful when connecting through a transaction-mode pooler.

Is there an easy way to "abort" the query if it's taking too long?
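
One possible workaround, sketched here under assumptions rather than confirmed against Supavisor: a transaction-mode pooler keeps a single transaction on one server connection, so a timeout set with transaction scope should apply to the statements in that transaction. The helper name withStatementTimeout and the minimal Db/Tagged types are hypothetical and exist only to keep the sketch self-contained; in a real project the db argument would be the postgres.js sql instance, whose begin() method runs a callback inside a transaction. set_config(..., true) is the function form of SET LOCAL and, unlike SET, accepts a bind parameter.

```typescript
// Minimal structural types so the sketch is self-contained (assumption:
// the postgres.js `sql` instance is compatible with this shape).
type Row = Record<string, unknown>;
type Tagged = (strings: TemplateStringsArray, ...values: unknown[]) => Promise<Row[]>;
interface Db {
  begin<T>(fn: (tx: Tagged) => Promise<T>): Promise<T>;
}

// Hypothetical helper: run `work` in a transaction that has its own
// statement_timeout, independent of any connection startup parameters.
async function withStatementTimeout<T>(
  db: Db,
  timeoutMs: number,
  work: (tx: Tagged) => Promise<T>,
): Promise<T> {
  return db.begin(async (tx) => {
    // is_local = true scopes the setting to this transaction, like SET LOCAL.
    // A bare number is interpreted by Postgres as milliseconds.
    await tx`SELECT set_config('statement_timeout', ${String(timeoutMs)}, true)`;
    return work(tx);
  });
}
```

Usage would look like await withStatementTimeout(db, 30_000, (tx) => tx`SELECT ...`). If a statement runs past the limit, Postgres cancels it and the promise rejects, which at least turns a silent hang into an error you can log. This only helps when the query actually reaches Postgres; it does nothing for a query stuck inside the pooler.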

lllleonnnn commented 16 hours ago

Please forgive me for not answering your exact question, but we have run into very similar issues, and it boiled down to our queries blocking each other on locks. We've used this query (modify as needed) to track down issues in our Supabase PG instance:

SELECT blocked_locks.pid         AS blocked_pid,
       blocked_activity.usename  AS blocked_user,
       blocking_locks.pid        AS blocking_pid,
       blocking_activity.usename AS blocking_user,
       blocked_activity.query    AS blocked_statement,
       blocking_activity.query   AS current_statement_in_blocking_process
  FROM pg_catalog.pg_locks         blocked_locks
  JOIN pg_catalog.pg_stat_activity blocked_activity
       ON blocked_activity.pid = blocked_locks.pid
  JOIN pg_catalog.pg_locks         blocking_locks
       ON blocking_locks.locktype = blocked_locks.locktype
      AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
      AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
      AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
      AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
      AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
      AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
      AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
      AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
      AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
      AND blocking_locks.pid != blocked_locks.pid
  JOIN pg_catalog.pg_stat_activity blocking_activity
       ON blocking_activity.pid = blocking_locks.pid
 WHERE NOT blocked_locks.granted;

Smashing this query over and over when the problem is happening may give some insight as well:

SELECT
    query,
    avg(now() - query_start) AS average_duration,
    usename,
    pid
FROM
    pg_stat_activity
WHERE
    state = 'active'
    AND query <> '<IDLE>'
    AND query NOT LIKE '%pg_stat_activity%'
GROUP BY
    query,
    usename,
    pid
ORDER BY
    average_duration DESC
LIMIT 10000;

If your logs aren't helpful, ye olde ALTER SYSTEM SET log_lock_waits TO on; (followed by SELECT pg_reload_conf(); so the change takes effect) may yield more helpful info in the Postgres and/or Pooler logging views. Adjusting log_statement_sample_rate and log_min_duration_sample can help too.

Re: your actual question: I did implement something hideous with await Promise.race(... which did not help. We did not have any specific issues with Supavisor, but we switched to running our own pgbouncers.
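
For anyone trying the same thing: a bare Promise.race only stops the caller from waiting. The losing query keeps running and still occupies its connection. A rough sketch of pairing the race with an explicit cancel is below. It assumes the query object exposes a cancel() method, which postgres.js documents for its queries; whether the cancel request makes it through a transaction-mode pooler is something I have not verified. withCancelOnTimeout, Cancellable and QueryTimeoutError are made-up names for illustration.

```typescript
// Anything awaitable that also exposes cancel(). Treat this shape as an
// assumption about the driver's query object, not as its official type.
interface Cancellable<T> extends PromiseLike<T> {
  cancel(): void;
}

class QueryTimeoutError extends Error {}

// Hypothetical helper: reject after `ms` and ask the driver to cancel the query.
async function withCancelOnTimeout<T>(query: Cancellable<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      query.cancel(); // request cancellation instead of just abandoning the promise
      reject(new QueryTimeoutError(`query exceeded ${ms} ms`));
    }, ms);
  });
  try {
    // Awaiting the query inside race() is also what starts a lazily executed query.
    return await Promise.race([query, timeout]);
  } finally {
    clearTimeout(timer); // avoid a stray timer when the query finishes first
  }
}
```

With postgres.js this would be called as await withCancelOnTimeout(sql`SELECT ...`, 30_000). If the hang is inside the pooler rather than in Postgres, the cancel may not free anything, but the caller at least gets a distinct error to count and alert on.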

Likewise, shout out to @porsager for making an amazing library that lets me avoid the hell of ORMs.

porsager / postgres

Deadlocked/hung pool when connected to Postgres through Supavisor #970