matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: [date 9.29] tke regression: mo reported lots of "cannot commit a orphan transaction" errors and the sysbench test cannot be stopped #19130

Open heni02 opened 6 days ago

heni02 commented 6 days ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

206f549087532728cba62753c430a0df5f32bdbb

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/11093493322

企业微信截图_255026af-56aa-4ab5-9e6a-9acdaa0034a0

Logs at the time of the error: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22FCf%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-206f54908-20240929%5C%22%7D%20%7C%3D%20%60cannot%20commit%20a%20orphan%20transaction%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221727667623857%22,%22to%22:%221727667815857%22%7D%7D%7D&schemaVersion=1&orgId=1
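The Grafana Explore link above is hard to read because the Loki query and time range are percent-encoded JSON inside the `panes` parameter. The following is a minimal sketch, using only the Python standard library, that recovers the LogQL expression and the UTC time window from such a link. The function name `decode_explore_url` is hypothetical, and the layout of the `panes` JSON is assumed from this one URL rather than taken from Grafana documentation.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# The Explore URL from this report, split across lines for readability.
URL = (
    "https://grafana.ci.matrixorigin.cn/explore?panes="
    "%7B%22FCf%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,"
    "%22expr%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-206f54908-20240929%5C%22%7D"
    "%20%7C%3D%20%60cannot%20commit%20a%20orphan%20transaction%60%22,"
    "%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,"
    "%22editorMode%22:%22builder%22%7D%5D,"
    "%22range%22:%7B%22from%22:%221727667623857%22,%22to%22:%221727667815857%22%7D%7D%7D"
    "&schemaVersion=1&orgId=1"
)


def _ms_to_utc(ms_text):
    """Convert an epoch-milliseconds string to an aware UTC datetime."""
    ms = int(ms_text)
    base = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    return base + timedelta(milliseconds=ms % 1000)


def decode_explore_url(url):
    """Return a list of (expr, start_utc, end_utc), one per query in the URL.

    Assumes the `panes` parameter holds JSON shaped like
    {pane_id: {"queries": [{"expr": ...}], "range": {"from": ms, "to": ms}}}.
    """
    params = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes
    panes = json.loads(params["panes"][0])
    results = []
    for pane in panes.values():
        start = _ms_to_utc(pane["range"]["from"])
        end = _ms_to_utc(pane["range"]["to"])
        for query in pane["queries"]:
            results.append((query["expr"], start, end))
    return results


for expr, start, end in decode_explore_url(URL):
    print(expr)
    print(start.isoformat(), "to", end.isoformat())
```

The same function should work on the other Explore links in this thread, provided they follow the same `panes` layout.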

Expected Behavior

No response

Steps to Reproduce

tke regression sysbench test

Additional information

No response

heni02 commented 6 days ago

Findings so far:

企业微信截图_c4c7e2b9-339c-483d-be5e-3cde696c72f4 企业微信截图_1338bf5c-91b1-4813-8fcc-c1c14bdfcad3

heni02 commented 6 days ago

@volgariver6 Findings so far: {"level":"ERROR","time":"2024/09/30 03:33:11.891247 +0000","name":"cn-service","caller":"cnservice/server_heartbeat.go:102","msg":"failed to send cn heartbeat","uuid":"30336334-6135-3639-3936-633863323334","error":"rpc timeout"} This CN timed out while sending its heartbeat and was eventually removed by HAKeeper.
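To check whether other CNs hit the same heartbeat timeout, the log line above can be matched mechanically. The following is a minimal sketch in Python that parses one JSON log line and returns the CN uuid, timestamp, and error when the message is `failed to send cn heartbeat`. The function name `parse_heartbeat_failure` is hypothetical, and the field names and timestamp format are assumed from this single log line only.

```python
import json
from datetime import datetime

# The log line quoted in this comment, unchanged.
LOG_LINE = (
    '{"level":"ERROR","time":"2024/09/30 03:33:11.891247 +0000",'
    '"name":"cn-service","caller":"cnservice/server_heartbeat.go:102",'
    '"msg":"failed to send cn heartbeat",'
    '"uuid":"30336334-6135-3639-3936-633863323334","error":"rpc timeout"}'
)

HEARTBEAT_MSG = "failed to send cn heartbeat"


def parse_heartbeat_failure(line):
    """Return uuid, time and error for a CN heartbeat failure, else None.

    Assumes one JSON object per line with the fields seen in LOG_LINE.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # not a JSON log line
    if record.get("msg") != HEARTBEAT_MSG:
        return None
    # Timestamp format assumed from the sample: "2024/09/30 03:33:11.891247 +0000"
    when = datetime.strptime(record["time"], "%Y/%m/%d %H:%M:%S.%f %z")
    return {"uuid": record.get("uuid"), "time": when, "error": record.get("error")}


failure = parse_heartbeat_failure(LOG_LINE)
if failure is not None:
    print(failure["uuid"], failure["time"].isoformat(), failure["error"])
```

Running this over the exported logs of a namespace and grouping by `uuid` would show which CNs missed heartbeats and when, which can then be compared with the first `cannot commit a orphan transaction` timestamp.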

heni02 commented 6 days ago

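The date 9.30 regression hit the same problem. We also need to find out why sysbench cannot disconnect (the environment can still be logged in to, so it is not hung). job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/11108340207/job/30890151720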

企业微信截图_2c70e083-756a-4163-8622-4e1c8bf01c84

The most recent mo logs are full of orphan transaction errors:

企业微信截图_16a7196e-9065-45a3-95ac-efe00c0a0c0c

https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22xPJ%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-75580f71f-20240930%5C%22%7D%20%7C%3D%20%60cannot%20commit%20a%20orphan%20transaction%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221727699690647%22,%22to%22:%221727786090647%22%7D%7D%7D&schemaVersion=1&orgId=1

Earliest time the error was reported: 11:31:30

heni02 commented 5 days ago

date 10.1 job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/11123634459/job/30943462783

1. An error was reported during the tpch 1T load

企业微信截图_d5e2cfda-41d3-4a44-81b1-c7f0731337c0

Memory was checked and is not full.

image

2. sysbench also reports the same error as yesterday

企业微信截图_7b366c6f-417a-491d-9081-5b243126a437 image

mo log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22Wjp%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-601796480-20241001%5C%22%7D%20%7C%3D%20%60cannot%20commit%20a%20orphan%20transaction%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221727776615562%22,%22to%22:%221727863015562%22%7D%7D%7D&schemaVersion=1&orgId=1