matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: UT TestLockNeedUpgrade fail #19920

Open YANGGMM opened 3 weeks ago

YANGGMM commented 3 weeks ago

Is there an existing issue for the same bug?

Branch Name

2.0-dev

Commit ID

latest

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

(image not preserved)

Expected Behavior

No response

Steps to Reproduce

https://github.com/matrixorigin/matrixone/actions/runs/11770570627/job/32783195098?pr=19919

Additional information

No response

jensenojs commented 3 weeks ago

The normal execution flow is:

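  1. remoteLockTable sends the lock request
  2. localLockTable acquires the lock
  3. The number of row locks is found to be too large
  4. An attempt is made to upgrade the row locks to a table lock
  5. The upgrade succeeds and the test finishes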
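To make the expected five-step flow concrete, here is a minimal Go sketch. It is a toy model under stated assumptions, not MatrixOne's lock service. The names localLockTable, remoteLock, lockMode, maxRows and errLockNeedUpgrade are all hypothetical. Only the error text is taken from the CI log in this thread. A row-level request is rejected once it covers more than maxRows rows, and the caller then retries once with a table-level lock.

```go
package main

import (
	"errors"
	"fmt"
)

// errLockNeedUpgrade reuses the error text from the CI log; the variable name
// is hypothetical, not MatrixOne's actual identifier.
var errLockNeedUpgrade = errors.New("row level lock is too large that need upgrade to table level lock")

// lockMode is a hypothetical type distinguishing row locks from table locks.
type lockMode int

const (
	rowLock lockMode = iota
	tableLock
)

// localLockTable is a toy stand-in for the local lock table (step 2). It
// rejects a row-level request covering more than maxRows rows (step 3).
type localLockTable struct {
	maxRows int
	held    map[string]lockMode // txn ID -> granted lock mode
}

func (t *localLockTable) lock(txn string, rows []string, mode lockMode) error {
	if mode == rowLock && len(rows) > t.maxRows {
		return errLockNeedUpgrade
	}
	t.held[txn] = mode
	return nil
}

// remoteLock models the requesting side: it sends a row-level request (step 1)
// and, if told to upgrade, retries once with a table-level lock (steps 4-5).
func remoteLock(tbl *localLockTable, txn string, rows []string) (lockMode, error) {
	err := tbl.lock(txn, rows, rowLock)
	if !errors.Is(err, errLockNeedUpgrade) {
		return rowLock, err // row lock granted (err == nil) or an unrelated error
	}
	// Step 4: upgrade to a table-level lock and try again.
	if err := tbl.lock(txn, rows, tableLock); err != nil {
		return rowLock, err
	}
	return tableLock, nil // step 5: upgrade succeeded
}

func main() {
	tbl := &localLockTable{maxRows: 2, held: map[string]lockMode{}}
	mode, err := remoteLock(tbl, "txn-1", []string{"r1", "r2", "r3"})
	fmt.Println(mode == tableLock, err) // true <nil>
}
```

In the failing CI runs, the test instead received errLockNeedUpgrade's message after the upgrade attempt had already been logged. In terms of this model, the retry in step 4 did not end in step 5.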

The logs show that execution reaches step 4, i.e. "Trying to upgrade lock level due". However, acquiring the lock after the attempted upgrade to a table lock fails.

...
a164e0a,3a164e0b,3a164e0c,3a164e0d,3a164e0e,3a164e0f,3a164e10,3a164e11,3a164e12,3a164e13,3a164e14,3a164e15,3a164e16,3a164e17,3a164e18,3a164e19,3a164e1a,3a164e1b,3a164e1c,3a164e1d,3a164e1e,3a164e1f,3a164e20]", "opts": "Exclusive-Row-Wait", "remote": "0-272596(272596)-17312904024755964890-cn-11001-1731290398211523887", "error": "row level lock is too large that need upgrade to table level lock"}
2024-11-11T02:07:46.8341109Z 2024/11/11 02:04:08.623269 +0000 INFO lockop/lock_op.go:645 Trying to upgrade lock level due to too many row level locks for txn <binary txn ID, not printable>
2024-11-11T02:07:46.8342625Z 2024/11/11 02:04:09.005022 +0000 INFO cn-service.MetricStorageUsage mometric/cron_task.go:221 start next round {"service": "2-cn-11001", "uuid": "2-cn-11001", "span": {}}
2024-11-11T02:07:46.8343846Z     issue_test.go:595: 
2024-11-11T02:07:46.8344612Z            Error Trace:    /home/runner/work/matrixone/matrixone/pkg/tests/issues/issue_test.go:595
2024-11-11T02:07:46.8345851Z                                        /home/runner/work/matrixone/matrixone/pkg/sql/compile/sql_executor.go:138
2024-11-11T02:07:46.8347036Z                                        /home/runner/work/matrixone/matrixone/pkg/tests/issues/issue_test.go:588
2024-11-11T02:07:46.8348101Z                                        /home/runner/work/matrixone/matrixone/pkg/embed/testing.go:76
2024-11-11T02:07:46.8349221Z                                        /home/runner/work/matrixone/matrixone/pkg/tests/issues/issue_test.go:484
2024-11-11T02:07:46.8349868Z            Error:          Received unexpected error:
2024-11-11T02:07:46.8350701Z                            row level lock is too large that need upgrade to table level lock
2024-11-11T02:07:46.8351277Z            Test:           TestLockNeedUpgrade
2024-11-11T02:07:46.8351811Z --- FAIL: TestLockNeedUpgrade (12.80s)
2024-11-11T02:07:46.8379171Z ##[group]Run find ./ut-report -name top.txt -exec cat {} \;
2024-11-11T02:07:46.8379728Z find ./ut-report -name top.txt -exec cat {} \;
2024-11-11T02:07:46.8436207Z shell: /usr/bin/bash -e {0}
...
jensenojs commented 2 weeks ago

I've added the relevant logging; will keep observing.

sukki37 commented 1 week ago

repro: https://github.com/matrixorigin/matrixone/actions/runs/11980044139/job/33415502583?pr=20286

jensenojs commented 1 week ago

This reproduction is on the 2.0-dev branch, which does not have the added logging yet, so we need to keep observing.

jensenojs commented 1 week ago

Also, in both reproductions the failed lock upgrade was initiated from the remote side. I'm not sure whether that is related.

ouyuanning commented 1 week ago

Junhong, could you take a look at this?

iamlinjunhong commented 2 days ago

Working on it.