Closed lrweck closed 3 months ago
You set a timeout of 1ms. That is long enough to send the COMMIT
command to PostgreSQL, but apparently not long a receive the result back from PostgreSQL. So your command is interrupted on the Go side, but PostgreSQL still received the COMMIT
.
that is just reproducer. in production we can see commit taking a few milliseconds sometimes (50, 100 or more, under load). From the clients POV, it looks like we cancel the context but the tx goes through
Regardless of the exact timing, that is still almost certainly what is happening. The COMMIT
is sent, but the command is canceled before the response is received.
Do you suggest not using a context with cancellation/deadline for tx.Commit
? I understand that this might not be a bug, but it can cause problems...
on pgx's side, what happens while commit is running and it receives a cancellation via context? how does it abort an in-progress commit?
on pgx's side, what happens while commit is running and it receives a cancellation via context? how does it abort an in-progress commit?
It sets a deadline on the net.Conn
. This will immediately cause any in-progress network read or writes to fail. pgx will respond the the error by closing the net.Conn
.
But again, if the COMMIT
has already been transmitted over the network there is no way to stop it. (Technically, there is an out of band query cancellation system, and pgx does use it, but the chances of that stopping a COMMIT
are almost zero.)
You can customize the cancellation behavior with pgconn.BuildContextWatcherHandler
, DeadlineContextWatcherHandler
and CancelRequestContextWatcherHandler
. But there is a fundamental issue of once the message is sent it can't be recalled.
You either wait for the response from the server or you cancel without knowledge of whether the command reached the server or not.
even if the commits takes hundreds of milliseconds? anyway, I'll remove the deadline from commit for the time being. This was a fun bug I had to find the hard way
even if the commits takes hundreds of milliseconds?
pgx does send the cancel signal. But again, it is very timing dependent and I don't know how likely successfully cancelling a commit
is at the PostgreSQL level.
You could probably experiment with a couple psql
sessions and pg_cancel_backend
to get more of a feel for how cancellation works.
But ultimately, it's not possible to know for certain what happened on the server without waiting to receive the response from the server. And context cancellation immediately cancels the query without waiting for a response.
Your options are:
Should this example commit, then? this still commits the changes even though I think that it should not?
tx, err := pool.Begin(context.Background())
if err != nil {
slogFatal("failed to start tx", "error", err)
}
_, err = tx.Exec(context.Background(), "INSERT INTO mytable (id) VALUES (1);")
if err != nil {
slogFatal("failed to exec INSERT", "error", err)
}
ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond*100)
defer cancel()
_, err = tx.Exec(ctx, "select pg_sleep(1);commit")
//err = tx.Commit(ctx)
I've read the code carefully and now that I understand more or less what and how it does it, you're right, theres no way to actually guarantee a cancellation in case of context deadline/cancel. for now, making sure the data is consistent is more important to me than performance so I'll remind myself to use context.Background() for commit/rollback operations.
Thanks for your quick replies. pgx is awesome!
Describe the bug
BeginTx, Exec("INSERT sql"), Commit(ctx).
I seeing a behavior that commit still commits work done even though it returns "deadline exceeded".
To Reproduce Steps to reproduce the behavior:
If possible, please provide runnable example such as:
Please run your example with the race detector enabled. For example,
go run -race main.go
orgo test -race
.Expected behavior execute rollback on failed commit, no changes persisted
Actual behavior changes done in transaction are persisted
Version