jackc / pgx

PostgreSQL driver and toolkit for Go
MIT License
10.91k stars 849 forks source link

pgx hangs when trying to acquire a connection from the pool #1889

Open mpjunior92 opened 10 months ago

mpjunior92 commented 10 months ago

Hi,

I am using pgxpool to manage DB connections to a PostgreSQL instance and have observed a very strange behavior after running a simple query on the DB: pgx is not able to acquire a connection from the pool; it gets stuck while waiting for DB data to be sent (I left running for 2 hours and no connection was acquired). Here is the call stack collected using pprof:

#   0x477d5b    internal/poll.runtime_pollWait+0x3b                                         /usr/local/go/src/runtime/netpoll.go:343
#   0x53d27d    internal/poll.(*pollDesc).wait+0x7d                                         /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
#   0x53d310    internal/poll.(*pollDesc).waitRead+0x30                                         /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
#   0x53edce    internal/poll.(*FD).Read+0x40e                                              /usr/local/go/src/internal/poll/fd_unix.go:164
#   0x72be72    net.(*netFD).Read+0x72                                                  /usr/local/go/src/net/fd_posix.go:55
#   0x74f6fc    net.(*conn).Read+0x9c                                                   /usr/local/go/src/net/net.go:179
#   0x7e1721    crypto/tls.(*atLeastReader).Read+0xc1                                           /usr/local/go/src/crypto/tls/conn.go:805
#   0x57a41b    bytes.(*Buffer).ReadFrom+0x13b                                              /usr/local/go/src/bytes/buffer.go:211
#   0x7e1a48    crypto/tls.(*Conn).readFromUntil+0x168                                          /usr/local/go/src/crypto/tls/conn.go:827
#   0x7df8d8    crypto/tls.(*Conn).readRecordOrCCS+0x258                                        /usr/local/go/src/crypto/tls/conn.go:625
#   0x7df5e4    crypto/tls.(*Conn).readRecord+0x24                                          /usr/local/go/src/crypto/tls/conn.go:587
#   0x7e6629    crypto/tls.(*Conn).Read+0x1e9                                               /usr/local/go/src/crypto/tls/conn.go:1369
#   0xfd158a    github.com/jackc/pgx/v5/pgconn/internal/bgreader.(*BGReader).Read+0x22a                         /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/internal/bgreader/bgreader.go:100
#   0x4eee67    io.ReadAtLeast+0x167                                                    /usr/local/go/src/io/io.go:335
#   0xf978b8    github.com/jackc/pgx/v5/pgproto3.(*chunkReader).Next+0x6f8                              /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgproto3/chunkreader.go:80
#   0xfa2f0a    github.com/jackc/pgx/v5/pgproto3.(*Frontend).Receive+0x8a                               /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgproto3/frontend.go:220
#   0xfe8cf0    github.com/jackc/pgx/v5/pgconn.(*PgConn).peekMessage+0x2d0                              /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:524
#   0xfe8f75    github.com/jackc/pgx/v5/pgconn.(*PgConn).receiveMessage+0x55                                /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:544
#   0xff1319    github.com/jackc/pgx/v5/pgconn.(*MultiResultReader).receiveMessage+0x59                         /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:1413
#   0xff1e06    github.com/jackc/pgx/v5/pgconn.(*MultiResultReader).Close+0x46                              /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:1481
#   0xff4a6e    github.com/jackc/pgx/v5/pgconn.(*PgConn).Ping+0x4e                                  /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgconn/pgconn.go:1779
#   0x11b7a04   github.com/jackc/pgx/v5.(*Conn).Ping+0x44                                       /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/conn.go:421
#   0x11d6e2d   github.com/jackc/pgx/v5/pgxpool.(*Pool).Acquire+0x18d                                   /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgxpool/pool.go:511
#   0x11d7b3a   github.com/jackc/pgx/v5/pgxpool.(*Pool).Query+0x9a                                  /home/eigenlabs/go/pkg/mod/github.com/jackc/pgx/v5@v5.5.2/pgxpool/pool.go:603
#   0x1a47abd   github.com/layr-labs/eigenlayer-backend/common/beacon.(*BeaconDataServiceZettablockImpl).GetEigenpodValidators+0x2fd    /home/eigenlabs/repos/eigenlayer-backend/common/beacon/zettablock.go:308
#   0x1c06ed5   github.com/layr-labs/eigenlayer-backend/webserver/services.(*withdrawalProofServiceImpl).GetRestakeProofs+0x355     /home/eigenlabs/repos/eigenlayer-backend/webserver/services/withdrawal_proof_service.go:173

When debugging this, I assumed there was some kind of leak that prevented a connection to be acquired, but that does not seem to be the case since there are idle resources (see below). I confirmed with the DB provider that the connection is in idle state - info obtained from pg_stat_activity.

Pool config and stats:

MaxConns = 20
MinConns = 10

&pgxpool.Stat{
  s: &puddle.Stat{
    constructingResources: 0,
    acquiredResources:     1,
    idleResources:         9,
    maxResources:          20,
    acquireCount:          12,
    acquireDuration:       1224273515,
    emptyAcquireCount:     1,
    canceledAcquireCount:  0,
  },
  newConnsCount:        31,
  lifetimeDestroyCount: 14,
  idleDestroyCount:     0,
}

General info:

I can reproduce this 100% against the DB my company uses; I haven't tested on a local PostgreSQL yet. Could you help on this? Maybe it is in fact a bug. Thank you.

mpjunior92 commented 10 months ago

DB provider pointed me to this AWS doc:

Idle client connection A connection to an Aurora PostgreSQL DB instance is in idle in transaction state and is waiting for a client to send more data or issue a command. This state can lead to an increase in Client:ClientRead events.

davidspiess commented 7 months ago

Have you found a solution? We experience a similar problem.

xakepp35 commented 4 months ago

Have you found a solution? We experience a similar problem.