matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: lots of long-running queries and leaks cause MO to hang during stability test in distributed mode #16663

Closed: aressu1985 closed this 2 months ago

aressu1985 commented 4 months ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

fa460ab

Other Environment Information

- Hardware parameters:
3*CN: 16C 64G
1*DN: 16C 64G
3*LOG: 4C 16G
2*PROXY: 3C 6G
- OS type:
- Others:

Actual Behavior

During the stability test in distributed mode, after nearly 4 hours, long-running transactions started appearing and eventually caused MO to hang.


MO log link (Loki query filtered on `long running`): https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22Nnv%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-fa460ab-20240604232037%5C%22%7D%20%7C%3D%20%60long%20running%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-12h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1

The first long-running transaction appeared at 2024/06/05 03:33:22.733993.
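The Loki links in this thread hide their actual LogQL filter inside Grafana Explore's URL-encoded `panes` parameter. A minimal Go sketch for recovering that filter is shown below, assuming the `panes` JSON layout visible in these links (pane ID mapping to a `queries` array whose items carry an `expr` field). The helper `logQLExprs` and the constant `longRunningLink` are hypothetical names for illustration, not part of MatrixOne or Grafana.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"sort"
)

// longRunningLink is the "long running" log link from this issue (hypothetical constant name).
const longRunningLink = "https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22Nnv%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-fa460ab-20240604232037%5C%22%7D%20%7C%3D%20%60long%20running%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-12h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1"

// lokiQuery keeps only the field we need from each Explore query;
// other fields (refId, datasource, ...) are ignored by encoding/json.
type lokiQuery struct {
	Expr string `json:"expr"`
}

// explorePane mirrors the subset of the "panes" JSON seen in these links (an assumption).
type explorePane struct {
	Queries []lokiQuery `json:"queries"`
}

// logQLExprs (hypothetical helper) returns the LogQL expressions in a Grafana Explore URL.
func logQLExprs(rawURL string) ([]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	// Query() percent-decodes the value, leaving plain JSON.
	panesJSON := u.Query().Get("panes")
	var panes map[string]explorePane
	if err := json.Unmarshal([]byte(panesJSON), &panes); err != nil {
		return nil, err
	}
	// Sort pane IDs so the output order is deterministic.
	ids := make([]string, 0, len(panes))
	for id := range panes {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	var exprs []string
	for _, id := range ids {
		for _, q := range panes[id].Queries {
			exprs = append(exprs, q.Expr)
		}
	}
	return exprs, nil
}

func main() {
	exprs, err := logQLExprs(longRunningLink)
	if err != nil {
		panic(err)
	}
	for _, e := range exprs {
		fmt.Println(e)
	}
}
```

For this link, the decoded filter selects the `mo-nightly-fa460ab-20240604232037` namespace and keeps lines containing `long running`. Passing the other links from the comments through the same helper reveals their `leak` and `mo_logger` filters in the same way.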


routine_018fe675-5d43-70e4-8fef-95b644f42e40.gz

Expected Behavior

No response

Steps to Reproduce

1. Run an MO cluster with the configuration in this issue.
2. Run TPC-H 10G loop test processes in a separate, independent tenant.
3. Run TPC-C long-running test processes (10 warehouses, 10 terminals) in a separate, independent tenant, prepare mode.
4. Run sysbench mixed-case (insert/delete/update/select) long-running test processes with 75 terminals in a separate, independent tenant, non-prepare mode.
5. Run another sysbench mixed-case (insert/delete/update/select) long-running test process with 75 terminals in a separate, independent tenant, non-prepare mode.

Additional information

No response

zhangxu19830126 commented 4 months ago

The leaks caused the long-running transactions.

zhangxu19830126 commented 4 months ago

Many transactions leaked.

zhangxu19830126 commented 4 months ago

https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22Nnv%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-fa460ab-20240604232037%5C%22%7D%20%7C%3D%20%60leak%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-12h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1

daviszhen commented 4 months ago

The leaks come from two users.

mo_logger:

https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22Nnv%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-fa460ab-20240604232037%5C%22%7D%20%7C%3D%20%60leak%60%20%7C%3D%20%60mo_logger%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221717516800000%22,%22to%22:%221717603199000%22%7D%7D%7D&schemaVersion=1&orgId=1

Special user:

https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22Nnv%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-fa460ab-20240604232037%5C%22%7D%20%7C%3D%20%60leak%60%20%7C%3D%20%606bcf64f9-f0fd-4dc9-a24b-ff86e42978d7%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221717516800000%22,%22to%22:%221717603199000%22%7D%7D%7D&schemaVersion=1&orgId=1

daviszhen commented 4 months ago

At first glance, the mo_logger task appears to be stuck. However, the goroutines are gone, so it cannot be investigated further for now.

daviszhen commented 3 months ago

After the txn trace fix, the issue has not reproduced.

aressu1985 commented 2 months ago

Fixed.