matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: recreating TN Pod causes all CNs to panic #20289

Closed: aylei closed this issue 7 hours ago

aylei commented 7 hours ago

Is there an existing issue for the same bug?

Branch Name

v2.0.1

Commit ID

6676ec463

Other Environment Information

MOC dev

Actual Behavior

Restarting the TN causes every CN to panic. CN log:

{"level":"INFO","time":"2024/11/22 05:51:11.565485 +0000","name":"cn-service","caller":"cnservice/server_metadata.go:52","msg":"local CNStore loaded","uuid":"63346662-3865-6133-3861-386437396662","metadata":"63346662-3865-6133-3861-386437396662/TP"}
{"level":"INFO","time":"2024/11/22 05:51:11.565511 +0000","caller":"cnservice/server.go:460","msg":"Shutdown The Server With Ctrl+C | Ctrl+\\."}
{"level":"INFO","time":"2024/11/22 05:51:11.565519 +0000","caller":"cnservice/server.go:467","msg":"Initialize the engine ..."}
{"level":"INFO","time":"2024/11/22 05:51:11.598215 +0000","caller":"disttae/engine.go:152","msg":"INIT-ENGINE-CONFIG","InsertEntryMaxCount":5000,"WorkspaceThreshold":1048576,"CNTransferTxnLifespanThreshold":"5s"}
{"level":"ERROR","time":"2024/11/22 05:51:11.598682 +0000","caller":"disttae/logtail_consumer.go:1538","msg":"error: internal error: no TN store found","span":{"trace_id":"a1cc5322-f333-6e69-f8f4-78638b4b0911","span_id":"885cd780784417b3"}}
panic: internal error: no TN store found

goroutine 449 [running]:
main.startCNService.func1({0x57e9718, 0xc001312540})
    /go/src/github.com/matrixorigin/matrixone/cmd/mo-service/main.go:272 +0x89a
github.com/matrixorigin/matrixone/pkg/common/stopper.(*Stopper).doRunCancelableTask.func1()
    /go/src/github.com/matrixorigin/matrixone/pkg/common/stopper/stopper.go:277 +0x5b
created by github.com/matrixorigin/matrixone/pkg/common/stopper.(*Stopper).doRunCancelableTask in goroutine 1
    /go/src/github.com/matrixorigin/matrixone/pkg/common/stopper/stopper.go:272 +0xb0

Expected Behavior

Restarting the TN should not cause any CN to panic.

Steps to Reproduce

1. Launch a distributed MO cluster.
2. Recreate the TN Pod.

Additional information

No response

aylei commented 7 hours ago

Duplicate of https://github.com/matrixorigin/matrixone/issues/20200