lib / pq

Pure Go Postgres driver for database/sql
https://pkg.go.dev/github.com/lib/pq
MIT License
8.89k stars 906 forks source link

Digital Ocean Managed Postgre Primary Failover #1026

Closed bjg2 closed 4 months ago

bjg2 commented 3 years ago

Hey everyone!

In my org, we're using Digital Ocean Managed Postgre, and we're having a database setup of 1 primary and 2 standbys. Digital Ocean does not provide hosts for all 3 nodes, just one host that always points to the primary node (something like XXX.a.db.ondigitalocean.com). Sometimes, by design, primary failover happens, so that primary node becomes secondary and some secondary becomes primary.

Problem is, when that happens, connections seem to remain open with the node that was primary and just became secondary. Those connections now forever start erroring out with write tcp YYY->ZZZ: write: connection timed out, and never recover.

Mitigation I came up with was db.SetConnMaxLifetime(time.Minute), but that's not ideal. Is there any better way around this problem at the moment?

PS: I saw a similar issue here https://github.com/lib/pq/issues/683, but I don't think it applies to our problem, as we do not have multiple hosts provided, just one host string, and that host points to the current primary.

Lekensteyn commented 3 years ago

Possibly a duplicate of #835.

bjg2 commented 2 years ago

We need to find a way to mitigate this better, as we noticed that conn lifetime has really bad perf implications. Should we expect that this issue is fixed by https://github.com/lib/pq/pull/1013 ? It is very hard to test this, as DO does not provide an interface for triggering primary failover...

zak905 commented 4 months ago

any updates ? @bjg2 I am interested to know if you have found new ways to handle this issue ?

bjg2 commented 4 months ago

I think we did not have that issue for a long time, I guess pq patched the issue with linked fix above.