matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: there are still lots of w-w during tpcc longrunning test on distributed test #15843

Open aressu1985 opened 6 months ago

aressu1985 commented 6 months ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

6e94513897f3a739602429e5a61d7c9127fd9a2a

Other Environment Information

- Hardware parameters:
3*CN: 16C 64G
1*DN: 16C 64G
3*LOG: 4C 16G
2*PROXY: 3C 6G
- OS type:
- Others:

Actual Behavior

During the stability test on the distributed cluster, many w-w (write-write conflict) and dup (duplicate entry) errors occur during the tpcc benchmark test:

(two screenshots attached)

tpcc tool logs:

tpcc-test.tar.gz

mo-log: https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%223sB%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-6e94513-20240503%5C%22,%20pod%3D%5C%22stability-regression-dis-dn-0%5C%22%7D%20%7C%3D%20%60w-w%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-2d%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

1. Run a MO cluster with the configuration listed in this issue.
2. Run tpch 10G loop test processes in one independent tenant.
3. Run tpcc long-running test processes with 10 warehouses and 10 terminals in one independent tenant, prepare mode.
4. Run sysbench mixed-case (insert/delete/update/select) long-running test processes with 75 terminals in one independent tenant, non-prepare mode.
5. Run another sysbench mixed-case (insert/delete/update/select) long-running test process with 75 terminals in one independent tenant, non-prepare mode.

Additional information

No response

triump2020 commented 6 months ago

https://github.com/matrixorigin/matrixone/pull/15948

triump2020 commented 5 months ago

https://github.com/matrixorigin/matrixone/pull/15992 fixed dup/ww bug.

triump2020 commented 5 months ago

Waiting for @ouyuanning's PR.

triump2020 commented 5 months ago

In validation testing.

triump2020 commented 5 months ago

In testing

ouyuanning commented 5 months ago

Luo Fei helped add logging today. The validation run has not been done yet.

ouyuanning commented 5 months ago

Yesterday's run still reproduced the problem. More logging needs to be added for further analysis.

triump2020 commented 4 months ago

(Screenshot attached: 企业微信截图_17180727009693)

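From the log: the statement DELETE FROM sbtest3 WHERE id=50005 ran without acquiring a lock, so it did not find the record with pk = 50005 and therefore generated no deletes. The same transaction then ran an INSERT with pk = 50005, which caused the dup error.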
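The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MatrixOne code: the classes ToyStore and ToyTxn, the function run, and the idea that taking the lock refreshes the transaction's snapshot to the latest committed state are all hypothetical and are not confirmed anywhere in this thread. The sketch only shows how a DELETE that skips locking can miss an existing row, produce no deletes, and leave a following INSERT of the same pk to fail as a duplicate.

```python
class ToyStore:
    """Committed rows as {pk: commit_ts}. A toy model, not MatrixOne's engine."""

    def __init__(self):
        self.rows = {}
        self.clock = 0

    def commit_insert(self, pk):
        if pk in self.rows:
            raise KeyError(f"duplicate entry for pk={pk}")
        self.clock += 1
        self.rows[pk] = self.clock

    def commit_delete(self, pk):
        self.clock += 1
        del self.rows[pk]

    def visible(self, pk, snapshot_ts):
        ts = self.rows.get(pk)
        return ts is not None and ts <= snapshot_ts


class ToyTxn:
    """Hypothetical transaction that reads from a snapshot timestamp."""

    def __init__(self, store):
        self.store = store
        self.snapshot_ts = store.clock
        self.deletes = []
        self.inserts = []

    def delete(self, pk, take_lock):
        if take_lock:
            # Assumption: acquiring the lock advances the snapshot to the
            # latest committed state, so the row becomes visible.
            self.snapshot_ts = self.store.clock
        if self.store.visible(pk, self.snapshot_ts):
            self.deletes.append(pk)
        # Without the lock the row is not found and no delete is recorded.

    def insert(self, pk):
        self.inserts.append(pk)

    def commit(self):
        for pk in self.deletes:
            self.store.commit_delete(pk)
        for pk in self.inserts:
            self.store.commit_insert(pk)


def run(take_lock):
    store = ToyStore()
    txn = ToyTxn(store)          # snapshot taken before the row exists
    store.commit_insert(50005)   # another transaction commits pk=50005
    txn.delete(50005, take_lock)
    txn.insert(50005)
    try:
        txn.commit()
        return "ok"
    except KeyError:
        return "dup"
```

Calling run(False) models the logged behavior (the DELETE skips the lock), and run(True) models the expected behavior in which the DELETE locks the row first.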

ouyuanning commented 4 months ago

Analysis to be continued.

ouyuanning commented 4 months ago

Analysis to be continued.

ouyuanning commented 4 months ago

Working on other s-1 issues.

ouyuanning commented 4 months ago

Working on other s-1 issues.

ouyuanning commented 4 months ago

Working on other s-1 issues.

ouyuanning commented 3 months ago

Still need to rerun sysbench and regression.

triump2020 commented 3 months ago

@aressu1985 Use this issue to track the dup/ww problem caused by delete locking.

ouyuanning commented 3 months ago

Regression still shows the problem. Still investigating.

ouyuanning commented 5 days ago

Will be handled in the DML refactoring version.

ouyuanning commented 16 hours ago

Will be handled after the DML refactoring version is merged.