matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: [date 7.3]standalone regression: sysbench1000w delete test reported fatal error: concurrent map read and map write #17314

Open heni02 opened 2 days ago

heni02 commented 2 days ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

ae8f052fe90c85c90463a766db2fabe3c3bbd8ba

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9779716772/job/26999862135

fatal error: concurrent map read and map write

goroutine 2489 [running]:
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/tables/updates.(ObjectMVCCHandle).TryGetDeleteChain(...)
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/tables/updates/mvcc.go:384
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/tables/updates.(ObjectMVCCHandle).GetLatestDeltaloc(0xc19d84a460?, 0x7af?)
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/tables/updates/mvcc.go:548 +0x2d
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/db.(MergeTaskBuilder).onTable(0xc0090ac3f0, 0xd5e9180180)
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/db/scannerop.go:168 +0x557
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/catalog.(LoopProcessor).OnTable(...)
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/catalog/processor.go:59
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/db.(dbScanner).onTable(0xc00541e550, 0xd5e9180180)
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/db/scanner.go:172 +0x202
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/catalog.(LoopProcessor).OnTable(...)
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/catalog/processor.go:59
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/catalog.(DBEntry).RecurLoop(0xc868918b40?, {0x548cb60, 0xc00541e550})
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/catalog/database.go:547 +0x63
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/catalog.(Catalog).RecurLoop(0xe7521f1000000000?, {0x548cb60, 0xc00541e550})
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/catalog/catalog.go:374 +0xc5
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/db.(dbScanner).OnExec(0xc00541e550)
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/db/scanner.go:72 +0x225
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/tasks/worker.(heartbeater).Start.func1()
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/tasks/worker/heartbeater.go:76 +0x82
created by github.com/matrixorigin/matrixone/pkg/vm/engine/tae/tasks/worker.(*heartbeater).Start in goroutine 1507
    /data1/runners/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/tae/tasks/worker/heartbeater.go:67 +0x8f

mo log: https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bhost%3D%5C%2210-222-1-128%5C%22,%20filename%3D%5C%22%2Fdata1%2Frunners%2Faction-runner%2F_work%2Fmo-nightly-regression%2Fmo-nightly-regression%2Fhead%2Fmo-service-ae8f052-20240703-222812.log%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221720031201692%22,%22to%22:%221720031364711%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 oltp_delete.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=100 --time=300 --report-interval=10 --range_selects=off --point_selects=1 prepare

sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 oltp_delete.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=100 --time=300 --report-interval=10 --range_selects=off --point_selects=1 run

Additional information

No response

m-schen commented 2 days ago

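Methods such as TryGetDeleteChain and GetOrCreateDeleteChainLocked operate on the map directly without taking a lock on it. Calling them concurrently will cause a panic.

I see that some methods of this struct do take the lock, for example GetDeltaLocAndCommitTS.

Please check which methods can be called concurrently, so that a map write happens at the same time as another map operation, and therefore need locking. @jiangxinmeng1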

heni02 commented 2 days ago

The tpcc test also reported a fatal error:

企业微信截图_72e0dc97-eccf-45c2-acfa-ade97aaebb3a 企业微信截图_21f332fe-9f91-4bea-9764-cf0621b0e88a

w-zr commented 1 day ago

PR merged.