lib / pq

Pure Go Postgres driver for database/sql
https://pkg.go.dev/github.com/lib/pq
MIT License
8.89k stars 906 forks source link

drivers.ErrBadConn returned for DB.QueryRowContext.Scan when context is cancelled #1062

Closed mjl- closed 2 years ago

mjl- commented 2 years ago

I have been seeing unexpected "driver: bad connection" (driver.ErrBadConn) errors in my logging. In my case, these are returned by calls to database/sql's DB.QueryRowContext.Scan (and likely also on Tx's) when their context is cancelled. Cancelling contexts happens for me in practice because I pass contexts from http.Request's on to lib/pq. I believe DB.QueryRowContext.Scan should not be made to return driver.ErrBadConn when contexts are cancelled.

I'll add a small reproducer after I've created the issue.

I looks the following ordered events take place in lib/pq:

I would expect the Scan on the sql Row to return context.Canceled in this case.

Perhaps lib/pq's connections could start to keep track not just whether connections are bad, but also the error to return to future calls on the connection. For expired contexts, the errors would become eg context.Canceled or context.DeadlineExceeded.

lunemec commented 2 years ago

We are facing similar issue, cancelling the context by http server when user cancels the request triggers driver: bad connection to be raised. It seems that in the end it causes sql package to re-connect over and over again, until for some reason the connection pool is exhausted (that is most likely caused by different part of the code, maybe even stdlib or our own).

mjl- commented 2 years ago

FWIW, I haven't seen unexpected connection pool exhaustion or problematic reconnects.

The workaround I put in my code is to check for the database/sql/driver.ErrBadConn explicitly and treating it similarly to how I treat context.Canceled errors: marking it a user error, not a server error. That saves me getting alerted.

fho commented 2 years ago

This issue was solved in release 1.10.4 via via https://github.com/lib/pq/pull/1064 ?

otan commented 2 years ago

Should be!

qburst-sreeraj commented 2 years ago

@otan @mjl- We still have this issue.

mjl- commented 2 years ago

I remember from last looking at the code thinking that lib/pq may not be adhering fully to the requirements set by database/sql about when it returns ErrBadConn. You could investigate in that direction...