google / trillian

A transparent, highly scalable and cryptographically verifiable data store.
Apache License 2.0
3.55k stars 379 forks source link

TX rollback error: sql: transaction has already been committed or rolled back #2998

Open hectorj2f opened 1 year ago

hectorj2f commented 1 year ago

We're seeing many errors in our Trillian instance with calls to GetEntryAndProof and GetLatestSignedLogRoot. Does anyone have an idea why we're seeing these warnings ? We found these warnings during a period of time under high load.

W0511 21:49:14.520050       1 tree_storage.go:381] Rollback error on Close(): invalid connection
W0511 21:49:14.520057       1 log_rpc_server.go:541] 2605736670972794746: Close failed for GetEntryAndProof: invalid connection
W0511 21:49:14.520125       1 tree_storage.go:367] TX rollback error: invalid connection, stack:
goroutine 97837345 [running]:
runtime/debug.Stack()
    runtime/debug/stack.go:24 +0x65
github.com/google/trillian/storage/mysql.(*treeTX).rollbackInternal(0xf?)
    github.com/google/trillian@v1.4.1/storage/mysql/tree_storage.go:367 +0x3f
github.com/google/trillian/storage/mysql.(*treeTX).Close(0xc799bc?)
    github.com/google/trillian@v1.4.1/storage/mysql/tree_storage.go:379 +0xae
github.com/google/trillian/server.(*TrillianLogRPCServer).closeAndLog(0x118d0e0?, {0xc0012d33e0?, 0xf67ac8?}, 0x1e?, {0x118f300?, 0xc00cadc580?}, {0xf92737, 0x10})
    github.com/google/trillian@v1.4.1/server/log_rpc_server.go:539 +0x54
github.com/google/trillian/server.(*TrillianLogRPCServer).GetEntryAndProof(0xc000260320, {0x118d0e0?, 0xc0012d32f0?}, 0xc003807c20)
    github.com/google/trillian@v1.4.1/server/log_rpc_server.go:509 +0x6b9
github.com/google/trillian._TrillianLog_GetEntryAndProof_Handler.func1({0x118d0e0, 0xc0012d32f0}, {0xefe3c0?, 0xc003807c20})
    github.com/google/trillian@v1.4.1/trillian_log_api_grpc.pb.go:356 +0x78
github.com/google/trillian/server/interceptor.(*TrillianInterceptor).UnaryInterceptor(0xc0002d0930, {0x118d0e0, 0xc00037ac00}, {0xefe3c0, 0xc003807c20}, 0xc0114a2520, 0xc005828798)
    github.com/google/trillian@v1.4.1/server/interceptor/interceptor.go:138 +0x10c
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc00037ac00?}, {0xefe3c0?, 0xc003807c20?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/google/trillian/server/interceptor.ErrorWrapper({0x118d0e0?, 0xc00037a5a0?}, {0xefe3c0, 0xc003807c20}, 0xc0008e18e8?, 0xc0114a2540)
    github.com/google/trillian@v1.4.1/server/interceptor/interceptor.go:473 +0x8f
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc00037a5a0?}, {0xefe3c0?, 0xc003807c20?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/google/trillian/monitoring.(*RPCStatsInterceptor).Interceptor.func1({0x118d0e0, 0xc003bd3fb0}, {0xefe3c0, 0xc003807c20}, 0xc0114a2520, 0xc0114a2560)
    github.com/google/trillian@v1.4.1/monitoring/rpc_stats_interceptor.go:92 +0x215
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc003bd3fb0?}, {0xefe3c0?, 0xc003807c20?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x118d0e0, 0xc003bd3fb0}, {0xefe3c0, 0xc003807c20}, 0xc0008e1af0?, 0xe38b80?)
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0xbf
github.com/google/trillian._TrillianLog_GetEntryAndProof_Handler({0xf49160?, 0xc000260320}, {0x118d0e0, 0xc003bd3fb0}, 0xc0011ebb00, 0xc0002d0960)
    github.com/google/trillian@v1.4.1/trillian_log_api_grpc.pb.go:358 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003a81c0, {0x11913a0, 0xc0009964e0}, 0xc0007b17a0, 0xc0002d0c90, 0x180fbf8, 0x0)
    google.golang.org/grpc@v1.46.0/server.go:1283 +0xcfd
google.golang.org/grpc.(*Server).handleStream(0xc0003a81c0, {0x11913a0, 0xc0009964e0}, 0xc0007b17a0, 0x0)
    google.golang.org/grpc@v1.46.0/server.go:1620 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
    google.golang.org/grpc@v1.46.0/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
    google.golang.org/grpc@v1.46.0/server.go:920 +0x28a
W0511 21:49:14.520139       1 tree_storage.go:381] Rollback error on Close(): invalid connection
W0511 21:49:14.520145       1 log_rpc_server.go:541] 2605736670972794746: Close failed for GetEntryAndProof: invalid connection
W0511 21:49:14.520548       1 tree_storage.go:358] TX commit error: sql: transaction has already been committed or rolled back, stack:
goroutine 97837634 [running]:
runtime/debug.Stack()
    runtime/debug/stack.go:24 +0x65
github.com/google/trillian/storage/mysql.(*treeTX).Commit(0xc00c5e8b00, {0x118d0e0, 0xc011e0d980})
    github.com/google/trillian@v1.4.1/storage/mysql/tree_storage.go:358 +0x27f
github.com/google/trillian/server.(*TrillianLogRPCServer).GetEntryAndProof(0xc000260320, {0x118d0e0?, 0xc011e0d890?}, 0xc003289db0)
    github.com/google/trillian@v1.4.1/server/log_rpc_server.go:523 +0x56d
github.com/google/trillian._TrillianLog_GetEntryAndProof_Handler.func1({0x118d0e0, 0xc011e0d890}, {0xefe3c0?, 0xc003289db0})
    github.com/google/trillian@v1.4.1/trillian_log_api_grpc.pb.go:356 +0x78
github.com/google/trillian/server/interceptor.(*TrillianInterceptor).UnaryInterceptor(0xc0002d0930, {0x118d0e0, 0xc011e0d530}, {0xefe3c0, 0xc003289db0}, 0xc0046e3e80, 0xc000a249f0)
    github.com/google/trillian@v1.4.1/server/interceptor/interceptor.go:138 +0x10c
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc011e0d530?}, {0xefe3c0?, 0xc003289db0?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/google/trillian/server/interceptor.ErrorWrapper({0x118d0e0?, 0xc011e0d4d0?}, {0xefe3c0, 0xc003289db0}, 0xc0008e38e8?, 0xc0046e3ea0)
    github.com/google/trillian@v1.4.1/server/interceptor/interceptor.go:473 +0x8f
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc011e0d4d0?}, {0xefe3c0?, 0xc003289db0?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/google/trillian/monitoring.(*RPCStatsInterceptor).Interceptor.func1({0x118d0e0, 0xc011e0d4a0}, {0xefe3c0, 0xc003289db0}, 0xc0046e3e80, 0xc0046e3ec0)
    github.com/google/trillian@v1.4.1/monitoring/rpc_stats_interceptor.go:92 +0x215
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc011e0d4a0?}, {0xefe3c0?, 0xc003289db0?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x118d0e0, 0xc011e0d4a0}, {0xefe3c0, 0xc003289db0}, 0xc0008e3af0?, 0xe38b80?)
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0xbf
github.com/google/trillian._TrillianLog_GetEntryAndProof_Handler({0xf49160?, 0xc000260320}, {0x118d0e0, 0xc011e0d4a0}, 0xc0006ba900, 0xc0002d0960)
    github.com/google/trillian@v1.4.1/trillian_log_api_grpc.pb.go:358 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003a81c0, {0x11913a0, 0xc000996340}, 0xc000848d80, 0xc0002d0c90, 0x180fbf8, 0x0)
    google.golang.org/grpc@v1.46.0/server.go:1283 +0xcfd
google.golang.org/grpc.(*Server).handleStream(0xc0003a81c0, {0x11913a0, 0xc000996340}, 0xc000848d80, 0x0)
    google.golang.org/grpc@v1.46.0/server.go:1620 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
    google.golang.org/grpc@v1.46.0/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
    google.golang.org/grpc@v1.46.0/server.go:920 +0x28a
W0511 21:49:14.525477       1 tree_storage.go:358] TX commit error: sql: transaction has already been committed or rolled back, stack:
goroutine 97837374 [running]:
runtime/debug.Stack()
    runtime/debug/stack.go:24 +0x65
github.com/google/trillian/storage/mysql.(*treeTX).Commit(0xc00c5e8bb0, {0x118d0e0, 0xc004702db0})
    github.com/google/trillian@v1.4.1/storage/mysql/tree_storage.go:358 +0x27f
github.com/google/trillian/server.(*TrillianLogRPCServer).commitAndLog(0xc004f03650?, {0x118d0e0?, 0xc004702db0?}, 0x40?, {0x118f300?, 0xc00c5e8bb0?}, {0xf992a5, 0x16})
    github.com/google/trillian@v1.4.1/server/log_rpc_server.go:531 +0x64
github.com/google/trillian/server.(*TrillianLogRPCServer).GetLatestSignedLogRoot(0xc000260320, {0x118d0e0?, 0xc004702cc0?}, 0xc002786080)
    github.com/google/trillian@v1.4.1/server/log_rpc_server.go:374 +0x3e6
github.com/google/trillian._TrillianLog_GetLatestSignedLogRoot_Handler.func1({0x118d0e0, 0xc004702cc0}, {0xef3f40?, 0xc002786080})
    github.com/google/trillian@v1.4.1/trillian_log_api_grpc.pb.go:338 +0x78
github.com/google/trillian/server/interceptor.(*TrillianInterceptor).UnaryInterceptor(0xc0002d0930, {0x118d0e0, 0xc00159bad0}, {0xef3f40, 0xc002786080}, 0xc0046e31a0, 0xc003c78588)
    github.com/google/trillian@v1.4.1/server/interceptor/interceptor.go:138 +0x10c
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc00159bad0?}, {0xef3f40?, 0xc002786080?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/google/trillian/server/interceptor.ErrorWrapper({0x118d0e0?, 0xc00159b9e0?}, {0xef3f40, 0xc002786080}, 0xc0008e38e8?, 0xc0046e31c0)
    github.com/google/trillian@v1.4.1/server/interceptor/interceptor.go:473 +0x8f
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc00159b9e0?}, {0xef3f40?, 0xc002786080?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/google/trillian/monitoring.(*RPCStatsInterceptor).Interceptor.func1({0x118d0e0, 0xc00159b980}, {0xef3f40, 0xc002786080}, 0xc0046e31a0, 0xc0046e31e0)
    github.com/google/trillian@v1.4.1/monitoring/rpc_stats_interceptor.go:92 +0x215
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc00159b980?}, {0xef3f40?, 0xc002786080?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x118d0e0, 0xc00159b980}, {0xef3f40, 0xc002786080}, 0xc0008e3af0?, 0xe38b80?)
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0xbf
github.com/google/trillian._TrillianLog_GetLatestSignedLogRoot_Handler({0xf49160?, 0xc000260320}, {0x118d0e0, 0xc00159b980}, 0xc0163707e0, 0xc0002d0960)
    github.com/google/trillian@v1.4.1/trillian_log_api_grpc.pb.go:340 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003a81c0, {0x11913a0, 0xc000996340}, 0xc0010bb0e0, 0xc0002d0c90, 0x180fbe0, 0x0)
    google.golang.org/grpc@v1.46.0/server.go:1283 +0xcfd
google.golang.org/grpc.(*Server).handleStream(0xc0003a81c0, {0x11913a0, 0xc000996340}, 0xc0010bb0e0, 0x0)
    google.golang.org/grpc@v1.46.0/server.go:1620 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
    google.golang.org/grpc@v1.46.0/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
    google.golang.org/grpc@v1.46.0/server.go:920 +0x28a
W0511 21:49:14.525503       1 log_rpc_server.go:533] 2605736670972794746: Commit failed for GetLatestSignedLogRoot: sql: transaction has already been committed or rolled back
W0511 21:49:14.537568       1 tree_storage.go:203] Failed to get merkle subtrees: context canceled
W0511 21:49:14.537760       1 tree_storage.go:367] TX rollback error: sql: transaction has already been committed or rolled back, stack:
goroutine 97837276 [running]:
runtime/debug.Stack()
    runtime/debug/stack.go:24 +0x65
github.com/google/trillian/storage/mysql.(*treeTX).rollbackInternal(0x20?)
    github.com/google/trillian@v1.4.1/storage/mysql/tree_storage.go:367 +0x3f
github.com/google/trillian/storage/mysql.(*treeTX).Close(0xc799bc?)
    github.com/google/trillian@v1.4.1/storage/mysql/tree_storage.go:379 +0xae
github.com/google/trillian/server.(*TrillianLogRPCServer).closeAndLog(0x118d348?, {0xc0002b4c08?, 0xc011aabb10?}, 0x20?, {0x118f300?, 0xc011512b00?}, {0xf92737, 0x10})
    github.com/google/trillian@v1.4.1/server/log_rpc_server.go:539 +0x54
github.com/google/trillian/server.(*TrillianLogRPCServer).GetEntryAndProof(0xc000260320, {0x118d0e0?, 0xc000b0a240?}, 0xc0034d7f90)
    github.com/google/trillian@v1.4.1/server/log_rpc_server.go:503 +0x705
github.com/google/trillian._TrillianLog_GetEntryAndProof_Handler.func1({0x118d0e0, 0xc000b0a240}, {0xefe3c0?, 0xc0034d7f90})
    github.com/google/trillian@v1.4.1/trillian_log_api_grpc.pb.go:356 +0x78
github.com/google/trillian/server/interceptor.(*TrillianInterceptor).UnaryInterceptor(0xc0002d0930, {0x118d0e0, 0xc0015ee480}, {0xefe3c0, 0xc0034d7f90}, 0xc0036e8780, 0xc002b0e060)
    github.com/google/trillian@v1.4.1/server/interceptor/interceptor.go:138 +0x10c
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc0015ee480?}, {0xefe3c0?, 0xc0034d7f90?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/google/trillian/server/interceptor.ErrorWrapper({0x118d0e0?, 0xc0015ee420?}, {0xefe3c0, 0xc0034d7f90}, 0xc0002d98e8?, 0xc0036e87a0)
    github.com/google/trillian@v1.4.1/server/interceptor/interceptor.go:473 +0x8f
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc0015ee420?}, {0xefe3c0?, 0xc0034d7f90?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/google/trillian/monitoring.(*RPCStatsInterceptor).Interceptor.func1({0x118d0e0, 0xc0015ee390}, {0xefe3c0, 0xc0034d7f90}, 0xc0036e8780, 0xc0036e87c0)
    github.com/google/trillian@v1.4.1/monitoring/rpc_stats_interceptor.go:92 +0x215
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x118d0e0?, 0xc0015ee390?}, {0xefe3c0?, 0xc0034d7f90?})
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x118d0e0, 0xc0015ee390}, {0xefe3c0, 0xc0034d7f90}, 0xc0002d9af0?, 0xe38b80?)
    github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0xbf
github.com/google/trillian._TrillianLog_GetEntryAndProof_Handler({0xf49160?, 0xc000260320}, {0x118d0e0, 0xc0015ee390}, 0xc001085380, 0xc0002d0960)
    github.com/google/trillian@v1.4.1/trillian_log_api_grpc.pb.go:358 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003a81c0, {0x11913a0, 0xc000996340}, 0xc0022dcd80, 0xc0002d0c90, 0x180fbf8, 0x0)
    google.golang.org/grpc@v1.46.0/server.go:1283 +0xcfd
google.golang.org/grpc.(*Server).handleStream(0xc0003a81c0, {0x11913a0, 0xc000996340}, 0xc0022dcd80, 0x0)
    google.golang.org/grpc@v1.46.0/server.go:1620 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
    google.golang.org/grpc@v1.46.0/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
    google.golang.org/grpc@v1.46.0/server.go:920 +0x28a
W0511 21:49:14.537779       1 tree_storage.go:381] Rollback error on Close(): sql: transaction has already been committed or rolled back
W0511 21:49:14.537787       1 log_rpc_server.go:541] 2605736670972794746: Close failed for GetEntryAndProof: sql: transaction has already been committed or rolled back
W0511 21:49:14.627855       1 log_storage.go:601] Failed to get leaves by range: context canceled
W0511 21:49:14.628380       1 tree_storage.go:358] TX commit error: sql: transaction has already been committed or rolled back, stack:
hectorj2f commented 1 year ago

Is there any way to increase the verbosity of these errors ? So we can troubleshoot them ourselves. I checked our deployment and saw that --tracing=true is enabled.

mhutchinson commented 10 months ago

Hi @hectorj2f I'm triaging bugs and came across this. Apologies that nobody has replied to you earlier!

Are you still seeing this? Looking at the call paths, I think #3201 will make a big difference here. Removing the inner join in the SQL query in this codepath makes serving proofs significantly cheaper. If it is an option for you, I recommend migrating to a new trillian tree that is configured to be revisionless. We will cut a release next week which will include these changes, and will create trees as revisionless by default. In the meantime, you can test this by using the @master version.