matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: dup and w-w during sysbench write_only case and tpcc test #14880

Closed aressu1985 closed 4 weeks ago

aressu1985 commented 8 months ago

Is there an existing issue for the same bug?

Branch Name

1.1-dev

Commit ID

01fa016

Other Environment Information

- Hardware parameters:
  3*CN: 16C 64G
  1*DN: 16C 64G
  3*LOG: 4C 16G
- OS type:
- Others:

Actual Behavior

job link: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/8210572569/job/22458565541


mo-log: http://175.178.192.213:30088/explore?panes=%7B%22DWZ%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22branch-reg-01fa016%5C%22%7D%20%7C%3D%20%60Duplicate%20entry%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221709964129691%22,%22to%22:%221710005409691%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

1. Run the sysbench write_only case on 1.1-dev.

Additional information

No response

triump2020 commented 8 months ago

The dup produced by CN has been fixed.

triump2020 commented 7 months ago

Reproduced again on the latest main branch; a fix is in progress.

triump2020 commented 7 months ago

So far this has only appeared in the long-running distributed stability test.

triump2020 commented 7 months ago

https://github.com/matrixorigin/matrixone/pull/15235

triump2020 commented 7 months ago

Not working on this

triump2020 commented 7 months ago

https://github.com/matrixorigin/matrixone/pull/15545

triump2020 commented 6 months ago

Not working on this

triump2020 commented 6 months ago

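The cause is as follows:

1. txn1 inserts a PK on CN1 and commits.
2. txn2 on CN2 starts running delete pk before this PK has been synced into the partition state. (The snapshot ts of the delete statement must be less than txn1's commit ts; otherwise CN2 would wait for the watermark to pass txn1's commit ts.) At this point the rowid of the PK cannot be found, so the delete finishes with affected rows = 0, which means the delete has no effect. txn2 then runs insert pk. By the time deduplication runs, the PK committed by txn1 has been synced, so the same PK is found in the partition state, which causes the dup.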
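The race described in the comment above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MatrixOne code: `PartitionState`, `apply_pending`, `delete_then_insert`, and `run_scenario` are hypothetical names, timestamps are plain integers, and the "sync" step stands in for whatever mechanism delivers txn1's committed row to CN2.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionState:
    """Toy stand-in for CN2's local view of committed rows (hypothetical)."""
    rows: dict = field(default_factory=dict)     # pk -> commit ts, already applied on CN2
    pending: list = field(default_factory=list)  # (pk, commit ts) committed on CN1, not yet applied

    def apply_pending(self) -> None:
        # Simulates CN2 catching up with rows committed elsewhere.
        for pk, commit_ts in self.pending:
            self.rows[pk] = commit_ts
        self.pending.clear()


def delete_then_insert(state: PartitionState, pk: int, snapshot_ts: int):
    """Run `delete pk; insert pk` as txn2 and return (affected_rows, insert_result)."""
    deleted_by_txn = set()

    # DELETE: the row is only found if it is applied locally and visible at snapshot_ts.
    commit_ts = state.rows.get(pk)
    if commit_ts is not None and commit_ts <= snapshot_ts:
        deleted_by_txn.add(pk)
    affected_rows = len(deleted_by_txn)

    # txn1's row reaches CN2 between the DELETE and the INSERT.
    state.apply_pending()

    # INSERT dedup: any applied row with the same pk that txn2 did not delete is a conflict.
    if pk in state.rows and pk not in deleted_by_txn:
        return affected_rows, "Duplicate entry"
    return affected_rows, "ok"


def run_scenario(synced_before_delete: bool):
    # txn1 committed pk=1 at ts=10 on CN1; CN2 has not applied it yet.
    state = PartitionState(pending=[(1, 10)])
    if synced_before_delete:
        state.apply_pending()
        return delete_then_insert(state, pk=1, snapshot_ts=11)
    # Racy case: txn2's snapshot ts (5) is below txn1's commit ts (10).
    return delete_then_insert(state, pk=1, snapshot_ts=5)
```

In this model, `run_scenario(False)` reproduces the reported pattern (delete affects 0 rows, then the insert hits a duplicate), while `run_scenario(True)` shows the same statements succeeding once the row is visible to the delete.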

triump2020 commented 6 months ago

https://github.com/matrixorigin/matrixone/pull/15948

triump2020 commented 6 months ago

https://github.com/matrixorigin/matrixone/pull/15992 fixed dup/ww bug.

triump2020 commented 6 months ago

Waiting for @ouyuanning's PR.

triump2020 commented 5 months ago

Waiting for @ouyuanning's PR.

triump2020 commented 5 months ago

Waiting for @ouyuanning's PR.

triump2020 commented 5 months ago

In testing

triump2020 commented 5 months ago

Dup still occurs with @ouyuanning's PR; further investigation is needed.

triump2020 commented 5 months ago

blocked

triump2020 commented 5 months ago

Adding logs to investigate the tke-dup problem.

triump2020 commented 4 months ago

The approximate cause of the ww/dup on standalone deployments has been located.

triump2020 commented 4 months ago

Waiting for the PR to be merged.

triump2020 commented 3 months ago

@aressu1985 All known dup/ww issues have been fixed; please test and observe. The dup/ww caused by Yuanning's delete locking problem is tracked in a separate issue.

aressu1985 commented 3 months ago

testing

ouyuanning commented 3 months ago

The branch regression test would run into problems; no time to handle it yet.

ouyuanning commented 3 months ago

Not working on this

ouyuanning commented 2 months ago

Not working on this

ouyuanning commented 2 months ago

Not working on this

ouyuanning commented 2 months ago

Not working on this

ouyuanning commented 2 months ago

Not working on this

ouyuanning commented 2 months ago

Not working on this

ouyuanning commented 1 month ago

Not working on this

ouyuanning commented 1 month ago

Not working on this

ouyuanning commented 1 month ago

Not working on this

ouyuanning commented 1 month ago

Handling the DML refactoring first.

ouyuanning commented 1 month ago

Handling the DML refactoring first.