mattn / go-sqlite3

sqlite3 driver for go using database/sql
http://mattn.github.io/go-sqlite3
MIT License
8.02k stars 1.11k forks source link

Error "cannot start a transaction within a transaction" on cached connection #764

Open azavorotnii opened 4 years ago

azavorotnii commented 4 years ago

Assuming we run this piece of code:

            ctx, cancel := context.WithCancel(ctx)
            tx, err := db.BeginTx(ctx, nil)

it is possible to get at the same time context.Cancelled error and started transaction. This will happen only if we cancel context very fast.

This happens because context cancellation is inherently racy as performed not in same go-routine where statement is executed.

Proposed fix #765

rittneje commented 4 years ago

@azavorotnii If I understand your bug report correctly, this is not a bug with BeginTx specifically, but rather some faulty logic in (*SQLiteStmt).exec. https://github.com/mattn/go-sqlite3/blob/590d44c02bca83987d23f6eab75e6d0ddf95f644/sqlite3.go#L1893-L1905

The existing code erroneously concludes that if we called sqlite3_interrupt, then the pending operation will definitely fail so it doesn't bother looking at the result. This is flawed (1) because of the inherent raciness, and (2) because the SQLite docs themselves say that sqlite3_interrupt will not necessarily interrupt a pending operation. "If an SQL operation is very nearly finished at the time when sqlite3_interrupt() is called, then it might not have an opportunity to be interrupted and might continue to completion." https://www.sqlite.org/c3ref/interrupt.html

I believe the correct fix would be to not just discard what gets pulled from resultCh and instead only return ctx.Err() if we receive SQLITE_INTERRUPT from the actual operation.

I have also sent a question to the SQLite mailing list about the behavior when BEGIN gets interrupted to see if we need to explicitly roll back in that case or not.

ETA: The SQLite maintainers say that if you interrupt a call to BEGIN, the transaction is aborted. So calling ROLLBACK in this case is unnecessary (though harmless).

azavorotnii commented 4 years ago

I believe the correct fix would be to not just discard what gets pulled from resultCh and instead only return ctx.Err() if we receive SQLITE_INTERRUPT from the actual operation.

@rittneje that was my initial thought, but for some reason instead of SQLITE_INTERRUPT I got SQLITE_DONE. That could be race in my test setup though (not same test as in this PR). Let me check again.

azavorotnii commented 4 years ago

@rittneje done as you proposed. I did mistake in my initial test, now it should be stable.