jackc / pgx

PostgreSQL driver and toolkit for Go
MIT License
10.86k stars 846 forks source link

Transaction is committed when ctx times out #2110

Closed lrweck closed 3 months ago

lrweck commented 3 months ago

Describe the bug

BeginTx, Exec("INSERT sql"), Commit(ctx).

I seeing a behavior that commit still commits work done even though it returns "deadline exceeded".

To Reproduce Steps to reproduce the behavior:

If possible, please provide runnable example such as:

package main

import (
    "context"
    "log"
    "os"

    "github.com/jackc/pgx/v5"
)

func main() {
    conn, err := pgx.Connect(context.Background(), os.Getenv("DATABASE_URL"))
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close(context.Background())

    tx, _ := conn.BeginTx(context.Background())

        _ = tx.Exec(context.Background,"insert into mytable(id) values ($1);", 1)

        ctx,cancel := context.WithTimeout(context.Background(), time.Millisecond)
        defer cancel()

        // returns Deadline exceeded, but insert is persisted
        err := tx.Commit(ctx)

}

Please run your example with the race detector enabled. For example, go run -race main.go or go test -race.

Expected behavior execute rollback on failed commit, no changes persisted

Actual behavior changes done in transaction are persisted

Version

jackc commented 3 months ago

You set a timeout of 1ms. That is long enough to send the COMMIT command to PostgreSQL, but apparently not long a receive the result back from PostgreSQL. So your command is interrupted on the Go side, but PostgreSQL still received the COMMIT.

lrweck commented 3 months ago

that is just reproducer. in production we can see commit taking a few milliseconds sometimes (50, 100 or more, under load). From the clients POV, it looks like we cancel the context but the tx goes through

jackc commented 3 months ago

Regardless of the exact timing, that is still almost certainly what is happening. The COMMIT is sent, but the command is canceled before the response is received.

lrweck commented 3 months ago

Do you suggest not using a context with cancellation/deadline for tx.Commit ? I understand that this might not be a bug, but it can cause problems...

lrweck commented 3 months ago

on pgx's side, what happens while commit is running and it receives a cancellation via context? how does it abort an in-progress commit?

jackc commented 3 months ago

on pgx's side, what happens while commit is running and it receives a cancellation via context? how does it abort an in-progress commit?

It sets a deadline on the net.Conn. This will immediately cause any in-progress network read or writes to fail. pgx will respond the the error by closing the net.Conn.

But again, if the COMMIT has already been transmitted over the network there is no way to stop it. (Technically, there is an out of band query cancellation system, and pgx does use it, but the chances of that stopping a COMMIT are almost zero.)

jackc commented 3 months ago

You can customize the cancellation behavior with pgconn.BuildContextWatcherHandler, DeadlineContextWatcherHandler and CancelRequestContextWatcherHandler. But there is a fundamental issue of once the message is sent it can't be recalled.

You either wait for the response from the server or you cancel without knowledge of whether the command reached the server or not.

lrweck commented 3 months ago

even if the commits takes hundreds of milliseconds? anyway, I'll remove the deadline from commit for the time being. This was a fun bug I had to find the hard way

jackc commented 3 months ago

even if the commits takes hundreds of milliseconds?

pgx does send the cancel signal. But again, it is very timing dependent and I don't know how likely successfully cancelling a commit is at the PostgreSQL level.

You could probably experiment with a couple psql sessions and pg_cancel_backend to get more of a feel for how cancellation works.

But ultimately, it's not possible to know for certain what happened on the server without waiting to receive the response from the server. And context cancellation immediately cancels the query without waiting for a response.

Your options are:

  1. Remove the deadline altogether
  2. Customize the context cancellation behavior such that it requests the cancellation, but waits a while before severing the connection.
  3. Query the database after the interrupted commit to see if the change was applied.
lrweck commented 3 months ago

Should this example commit, then? this still commits the changes even though I think that it should not?

    tx, err := pool.Begin(context.Background())
    if err != nil {
        slogFatal("failed to start tx", "error", err)
    }

    _, err = tx.Exec(context.Background(), "INSERT INTO mytable (id) VALUES (1);")
    if err != nil {
        slogFatal("failed to exec INSERT", "error", err)
    }

    ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond*100)
    defer cancel()

    _, err = tx.Exec(ctx, "select pg_sleep(1);commit")
    //err = tx.Commit(ctx)
lrweck commented 3 months ago

I've read the code carefully and now that I understand more or less what and how it does it, you're right, theres no way to actually guarantee a cancellation in case of context deadline/cancel. for now, making sure the data is consistent is more important to me than performance so I'll remind myself to use context.Background() for commit/rollback operations.

Thanks for your quick replies. pgx is awesome!