matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: [1104 main tke regression] tpcc 500-1000 report lots of 'Communications link failure'. #19762

Closed Ariznawlll closed 2 weeks ago

Ariznawlll commented 3 weeks ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

7d5f3b3c7f06280e1e69267f449e4e62b9283aa7

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

job url: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/11652323735/job/32447954186

Before the 'Communications link failure' errors, a 'cannot commit a orphan transaction' error was reported. It is unclear whether the two are related; this needs to be investigated.

After the errors above, both the load data test and the sysbench test hung.

Logs from the TPCC 500-1000 test period (UTC): https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22o3e%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-7d5f3b3c7-20241103%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221730662892000%22,%22to%22:%221730665151000%22%7D%7D%7D&schemaVersion=1&orgId=1
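For anyone who cannot open the Grafana link, the query and time window are encoded in the URL itself. The following is a minimal sketch for reading them out, assuming the `panes` query parameter is URL-encoded JSON with one pane holding `queries` and `range` (epoch milliseconds), as it appears in the link above. The helper name `explore_window` is made up for illustration and is not part of any Grafana tooling.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# The Explore link from this issue, split across lines for readability.
GRAFANA_URL = (
    "https://grafana.ci.matrixorigin.cn/explore?panes="
    "%7B%22o3e%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B"
    "%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22"
    "mo-main-nightly-7d5f3b3c7-20241103%5C%22%7D%20%7C%3D%20%60%60%22,"
    "%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,"
    "%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,"
    "%22range%22:%7B%22from%22:%221730662892000%22,%22to%22:%221730665151000%22"
    "%7D%7D%7D&schemaVersion=1&orgId=1"
)


def _ms_to_utc(ms):
    """Convert an epoch-milliseconds string to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


def explore_window(url):
    """Return (query expression, start, end) for the first pane of an Explore URL.

    Hypothetical helper: it assumes the layout seen in this one link.
    """
    params = parse_qs(urlsplit(url).query)   # parse_qs also percent-decodes
    panes = json.loads(params["panes"][0])
    pane = next(iter(panes.values()))        # the link has a single pane
    time_range = pane["range"]
    return (
        pane["queries"][0]["expr"],
        _ms_to_utc(time_range["from"]),
        _ms_to_utc(time_range["to"]),
    )


if __name__ == "__main__":
    expr, start, end = explore_window(GRAFANA_URL)
    print(expr)
    print(start.isoformat(), "to", end.isoformat())
```

For this link the window comes out as 2024-11-03 19:41:32 to 20:19:11 UTC, filtered on the `mo-main-nightly-7d5f3b3c7-20241103` namespace.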

goroutine: CN_61623035-3433-3539-3066-303062396161_leakcheck_routine_0192f3b8-dccb-7ef2-b7e0-1357056eb661.gz

Expected Behavior

No response

Steps to Reproduce

Trigger the TKE daily regression test.

Additional information

No response

sukki37 commented 3 weeks ago
2024/11/03 20:30:31.628259 +0000 ERROR cn-service found long running txn {"uuid": "", "txn-id": "49232f5b9d2bfca118047df12d526633", "create-at": "2024/11/03 20:07:06.045294 +0000", "options": "Features:1 CN:\"61623035-3433-3539-3066-303062396161\" SessionID:\"0192f38b-eff0-713e-9fca-41e598388867\" ConnectionID:16503 UserName:\"dump\" counter:\"commit: enter:0, exit:0 rollback: enter:0, exit:0 runSql: enter:3, exit:2 incrStmt: enter:4, exit:4 rollbackStmt: enter:1, exit:1 footPrints: [0: enter:2, exit:2] [1: enter:2, exit:2] [2: enter:2, exit:2] [4: enter:2, exit:2] [6: enter:3, exit:2] [7: enter:3, exit:2] [8: enter:3, exit:2] [11: enter:2, exit:2] [12: enter:2, exit:2] [13: enter:3, exit:2] [14: enter:2, exit:2] [15: enter:3, exit:2] [16: enter:3, exit:2] [85: enter:2, exit:2] [86: enter:2, exit:2] [87: enter:2, exit:2] [99: enter:1, exit:1] [101: enter:1, exit:1] [102: enter:1, exit:1] [107: enter:1, exit:1] [110: enter:1, exit:1] \" sessionInfo:\"connectionId 16503|10.143.19.42:58718|account sys:dump|goRoutineId 2788509|migrate-goRoutineId 0|0192f38b-eff0-713e-9fca-41e598388867\" inRunSql:true ", "profile": "ETL:/profile/CN_61623035-3433-3539-3066-303062396161_leakcheck_routine_0192f3b8-dccb-7ef2-b7e0-1357056eb661.gz"}

The lock held by 807adddf8845b4b418047df157b3d3b is never released, and the log keeps reporting the error “txn failed to unlock table on remote”.

https://grafana.ci.matrixorigin.cn/goto/MHij-CZNR?orgId=1
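The `counter` field in the long running txn log entry above is easier to read once the enter/exit pairs that do not match are pulled out. Below is a rough sketch that does this with a regular expression. The counter text is copied from the log entry. The helper name `unbalanced` and the reading that a larger `enter` than `exit` means the call has not returned yet are my assumptions, since the thread does not document what each footPrints index stands for.

```python
import re

# Counter text copied from the "options" field of the log entry above.
COUNTERS = (
    "commit: enter:0, exit:0 rollback: enter:0, exit:0 runSql: enter:3, exit:2 "
    "incrStmt: enter:4, exit:4 rollbackStmt: enter:1, exit:1 footPrints: "
    "[0: enter:2, exit:2] [1: enter:2, exit:2] [2: enter:2, exit:2] "
    "[4: enter:2, exit:2] [6: enter:3, exit:2] [7: enter:3, exit:2] "
    "[8: enter:3, exit:2] [11: enter:2, exit:2] [12: enter:2, exit:2] "
    "[13: enter:3, exit:2] [14: enter:2, exit:2] [15: enter:3, exit:2] "
    "[16: enter:3, exit:2] [85: enter:2, exit:2] [86: enter:2, exit:2] "
    "[87: enter:2, exit:2] [99: enter:1, exit:1] [101: enter:1, exit:1] "
    "[102: enter:1, exit:1] [107: enter:1, exit:1] [110: enter:1, exit:1] "
)

# Matches both "runSql: enter:3, exit:2" and "[6: enter:3, exit:2]".
PAIR = re.compile(r"\[?(\w+): enter:(\d+), exit:(\d+)\]?")


def unbalanced(counters):
    """Return (name, enter, exit) for every counter whose enter and exit differ."""
    result = []
    for name, enter, exit_ in PAIR.findall(counters):
        if int(enter) != int(exit_):
            result.append((name, int(enter), int(exit_)))
    return result


if __name__ == "__main__":
    for name, enter, exit_ in unbalanced(COUNTERS):
        print(f"{name}: enter={enter} exit={exit_}")
```

On this entry the unbalanced counters are `runSql` and footPrints 6, 7, 8, 13, 15 and 16, each with enter 3 and exit 2, which matches `inRunSql:true` in the same log line.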

zhangxu19830126 commented 3 weeks ago

fixed by #19772

Ariznawlll commented 3 weeks ago

Last night's test hit a panic. I will check again once that is fixed.

Ariznawlll commented 2 weeks ago

fixed