lib / pq

Pure Go Postgres driver for database/sql
https://pkg.go.dev/github.com/lib/pq
MIT License
8.98k stars 909 forks source link

Fix query cancellation collateralizing future queries using the same connection #1000

Closed kahuang closed 3 years ago

kahuang commented 3 years ago

This change solves the following issue we were seeing in production:

  1. A context would get cancelled due to timeout
  2. watchCancel would get triggered, and start to send the cancellation
  3. Because the cancellation happens concurrently with the normal query path, and doesn't block the finish() path until its done cancelling the query on the postgres side, this connection could be returned to the pool before getting cancelled
  4. This connection then gets reused, and gets collateralized by the previous query's cancel request

The changes I've made to ensure we don't run into this collateral damage:

  1. When the context is cancelled, we immediately try to send the finished signal to the finished channel to prevent a race. Either we successfully send this, or the query we tried to cancel finished first. In the latter case, we just return, and let a future query (if one happens) trigger the cancellation.
  2. In the former case we set the connection state to bad so it doesn't get reused, and cancel the query.
  3. When the finish() function gets called, the querier goroutine closes the connection.

Note: We probably don't need to change bad to an atomic variable, but I did just to be safe.

kahuang commented 3 years ago

I've updated the tests and squashed the commits

maddyblue commented 3 years ago

Need to remove go 1.13 which doesn't support some of the sync stuff here. Doing that in #1014.

jwatte commented 3 years ago

This change causes a public API change that breaks our app. We start a transaction in a WithCancel() context, run a select, cancel the context, and run another select. The error returned when transacting on a canceled context used to be: "sql: transaction has already been committed or rolled back" It is now instead "driver: bad connection" which is not just a breaking change, but arguably the wrong error.

maddyblue commented 3 years ago

If you submit a PR that reverts this change and adds a test to prevent this regression in the future I'll merge it.

NOMORECOFFEE commented 3 years ago

Hello. It's possible that connections remain in the pool in the ErrBadConn state and a connection drop out in the next request. Should we implement the SessionResetter in this case?

knz commented 2 years ago

This change causes a public API change that breaks our app.

We've just run into the same issue inside CockroachDB. A query cancellation should not drop the connection.

rafiss commented 2 years ago

I think https://github.com/lib/pq/pull/1079 will address the issues with this PR

evanj commented 1 year ago

I have updated #1079 with the code review comments in case it helps.