Closed Lily2025 closed 3 weeks ago
/type bug /severity major /assign lhy1024
After #8744, we support dynamic check tso service, which make startTSOJobsIfNeeded
happen after stopTSOJobsIfNeeded
[2024/11/04 10:32:44.337 +08:00] [WARN] [tso.go:371] ["save timestamp failed"] [] [dc-location=global] [timestamp-path=timestamp] [error="context deadline exceeded"]
[2024/11/04 10:32:44.337 +08:00] [WARN] [allocator_manager.go:781] ["failed to update allocator's timestamp"] [] [dc-location=global] [name=tc-pd-2] [error="context deadline exceeded"]
[2024/11/04 10:32:44.337 +08:00] [INFO] [tso.go:437] ["reset the timestamp in memory"] []
stopTSOJobsIfNeeded
[2024/11/04 10:32:44.339 +08:00] [WARN] [lease.go:185] ["lease keep alive failed"] [purpose="leader election"] [start=2024/11/04 10:32:43.758 +08:00] [error="context canceled"]
[2024/11/04 10:32:44.339 +08:00] [INFO] [lease.go:202] ["stop lease keep alive worker"] [purpose="leader election"]
[2024/11/04 10:32:44.377 +08:00] [INFO] [server.go:1777] ["no longer a leader because lease has expired, pd leader will step down"]
checkTSOService
call startTSOJobsIfNeeded
, although it was not leader
[2024/11/04 10:32:49.350 +08:00] [INFO] [cluster.go:476] ["initializing the global TSO allocator"]
[2024/11/04 10:32:49.352 +08:00] [INFO] [tso.go:161] ["start to sync timestamp"] []
[2024/11/04 10:32:49.356 +08:00] [INFO] [tso.go:221] ["sync and save timestamp"] [] [last=2024/11/04 10:32:37.334 +08:00] [last-saved=0001/01/01 00:00:00.000 +00:00] [save=2024/11/04 10:32:52.355 +08:00] [next=2024/11/04 10:32:49.355 +08:00]
[2024/11/04 10:32:49.359 +08:00] [INFO] [cluster.go:459] ["service check job is stopped"]
[2024/11/04 10:33:19.423 +08:00] [INFO] [cluster.go:862] ["raft cluster is stopped"]
[2024/11/04 10:53:00.806 +08:00] [INFO] [server.go:1768] ["PD leader is ready to serve"] [leader-name=tc-pd-2]
....
[2024/11/04 10:53:00.834 +08:00] [WARN] [tso.go:335] ["clock offset"] [] [jet-lag=5m0.968119947s] [prev-physical=2024/11/04 10:47:59.866 +08:00] [now=2024/11/04 10:53:00.834 +08:00] [update-physical-interval=50ms]
[2024/11/04 10:53:00.821 +08:00] [PANIC] [tso_dispatcher.go:647] ["[tso] timestamp fallback"] [dc-location=global] [keyspace=4294967295] [last-ts="(1730688755816, 13414)"] [cur-ts="(1730688479866, 1)"] [last-tso-server=tc-pd-1.tc-pd-peer.endless-ha-test-add-index-tps-7651740-1-143.svc:2379] [cur-tso-server=tc-pd-2.tc-pd-peer.endless-ha-test-add-index-tps-7651740-1-143.svc:2379] [last-keyspace-group-in-request=0] [cur-keyspace-group-in-request=0] [last-keyspace-group-in-response=0] [cur-keyspace-group-in-response=0] [last-response-received-at=2024/11/04 10:52:40.175 +08:00] [cur-response-received-at=2024/11/04 10:53:00.821 +08:00] [last-stream-id=tc-pd-1.tc-pd-peer.endless-ha-test-add-index-tps-7651740-1-143.svc:2379-274] [cur-stream-id=tc-pd-2.tc-pd-peer.endless-ha-test-add-index-tps-7651740-1-143.svc:2379-522]
[2024/11/04 10:53:00.821 +08:00] [FATAL] [tso_stream.go:330] ["tsoStream.recvLoop internal panic"] [stacktrace="github.com/tikv/pd/client.(*tsoStream).recvLoop.func1\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_stream.go:330\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:785\ngo.uber.org/zap/zapcore.CheckWriteAction.OnWrite\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:196\ngo.uber.org/zap/zapcore.(*CheckedEntry).Write\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:262\ngo.uber.org/zap.(*Logger).Panic\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/logger.go:285\ngithub.com/pingcap/log.Panic\n\t/root/go/pkg/mod/github.com/pingcap/log@v1.1.1-0.20240314023424-862ccc32f18d/global.go:54\ngithub.com/tikv/pd/client.(*tsoDispatcher).checkMonotonicity\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_dispatcher.go:647\ngithub.com/tikv/pd/client.(*tsoDispatcher).processRequests.func2\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_dispatcher.go:597\ngithub.com/tikv/pd/client.(*tsoStream).recvLoop\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_stream.go:444"] [panicMessage="[tso] timestamp fallback"] [stack="github.com/tikv/pd/client.(*tsoStream).recvLoop.func1\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_stream.go:330\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:785\ngo.uber.org/zap/zapcore.CheckWriteAction.OnWrite\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:196\ngo.uber.org/zap/zapcore.(*CheckedEntry).Write\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:262\ngo.uber.org/zap.(*Logger).Panic\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/logger.go:285\ngithub.com/pingcap/log.Panic\n\t/root/go/pkg/mod/github.com/pingcap/log@v1.1.1-0.20240314023424-862ccc32f18d/global.go:54\ngithub.com/tikv/pd/client.(*tsoDispatcher).checkMonotonicity\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_dispatcher.go:647\ngithub.com/tikv/pd/client.(*tsoDispatcher).processRequests.func2\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_dispatcher.go:597\ngithub.com/tikv/pd/client.(*tsoStream).recvLoop\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_stream.go:444"]
last-ts 1730688479866 is 2024-11-04 10:52:35.816 and cur-ts 1730688479866 is 2024-11-04 10:47:59.866, which is the same with pd-server log
Supplementary code screenshots for better understanding.
checkTSOService
, at this point ctx is not yet Done
.
RaftCluster
called Stop
and canceled the context.
Stop
called StopTSOJob
.checkTSOService
called StartTSOJob
,Caused TSO to not shut down properly.
Bug Report
What did you do?
1、run sysbench 2、add index for one table 3、injection pd leader io delay tidb-2024-11-04T11-37-04.520.log.zip
What did you expect to see?
no panic
What did you see instead?
tidb abnormally restart due to internal panic [2024/11/04 10:53:00.821 +08:00] [FATAL] [tso_stream.go:330] ["tsoStream.recvLoop internal panic"] [stacktrace="github.com/tikv/pd/client.(tsoStream).recvLoop.func1\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_stream.go:330\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:785\ngo.uber.org/zap/zapcore.CheckWriteAction.OnWrite\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:196\ngo.uber.org/zap/zapcore.(CheckedEntry).Write\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:262\ngo.uber.org/zap.(Logger).Panic\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/logger.go:285\ngithub.com/pingcap/log.Panic\n\t/root/go/pkg/mod/github.com/pingcap/log@v1.1.1-0.20240314023424-862ccc32f18d/global.go:54\ngithub.com/tikv/pd/client.(tsoDispatcher).checkMonotonicity\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_dispatcher.go:647\ngithub.com/tikv/pd/client.(tsoDispatcher).processRequests.func2\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_dispatcher.go:597\ngithub.com/tikv/pd/client.(tsoStream).recvLoop\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_stream.go:444"] [panicMessage="[tso] timestamp fallback"] [stack="github.com/tikv/pd/client.(tsoStream).recvLoop.func1\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_stream.go:330\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:785\ngo.uber.org/zap/zapcore.CheckWriteAction.OnWrite\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:196\ngo.uber.org/zap/zapcore.(CheckedEntry).Write\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:262\ngo.uber.org/zap.(Logger).Panic\n\t/root/go/pkg/mod/go.uber.org/zap@v1.27.0/logger.go:285\ngithub.com/pingcap/log.Panic\n\t/root/go/pkg/mod/github.com/pingcap/log@v1.1.1-0.20240314023424-862ccc32f18d/global.go:54\ngithub.com/tikv/pd/client.(tsoDispatcher).checkMonotonicity\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_dispatcher.go:647\ngithub.com/tikv/pd/client.(tsoDispatcher).processRequests.func2\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_dispatcher.go:597\ngithub.com/tikv/pd/client.(tsoStream).recvLoop\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20241016064947-b70107ec31e6/tso_stream.go:444"] [2024/11/04 10:53:03.033 +08:00] [INFO] [meminfo.go:179] ["use cgroup memory hook because TiDB is in the container"] [2024/11/04 10:53:03.034 +08:00] [INFO] [printer.go:52] ["Welcome to TiDB."] ["Release Version"=v8.5.0-alpha-31-g241c10294b] [Edition=Community] ["Git Commit Hash"=241c10294b9c0305db590576700c00de306be573] ["Git Branch"=HEAD] ["UTC Build Time"="2024-11-01 16:29:12"] [GoVersion=go1.23.2] ["Race Enabled"=false] ["Check Table Before Drop"=false]
What version of PD are you using (
pd-server -V
)?./pd-server -V Release Version: v8.5.0-alpha-3-g60b960a4 Edition: Community Git Commit Hash: 60b960a42341092038147c27dcf1f800b21a9a0d Git Branch: HEAD UTC Build Time: 2024-11-01 10:24:22 2024-11-04T10:31:24.388+0800