matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: pipeline mpool leak, mo-service panic #17521

Open LeftHandCold opened 1 month ago

LeftHandCold commented 1 month ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

6d0d59587a6dbcc162a8530191441ee6ea6e3d7e

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

: 1099513179088\n high water mark : 1099513179128\n"}
{"level":"ERROR","time":"2024/07/14 09:36:52.044809 +0800","caller":"mpool/mpool.go:578","msg":"error: error: out of memory"}
{"level":"ERROR","time":"2024/07/14 09:36:52.044809 +0800","caller":"mpool/mpool.go:578","msg":"error: error: out of memory"}
{"level":"ERROR","time":"2024/07/14 09:36:52.044816 +0800","caller":"mpool/mpool.go:578","msg":"error: error: out of memory"}
{"level":"ERROR","time":"2024/07/14 09:36:52.044830 +0800","caller":"mpool/mpool.go:578","msg":"error: error: out of memory"}
panic: error: out of memory

goroutine 667 [running]:
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/containers.(*vectorWrapper).CloneWindow(0xc0787499a0, 0x0, 0xc2456acf00?, {0xc106031af0?, 0xc1b6373600?, 0xc0c0220c30?})
	/home/mo/matrixone/pkg/vm/engine/tae/containers/vector.go:398 +0x117
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/tables/updates.(*DeleteNode).RangeDeleteLocked(0xc1642c8600, 0x1b2c, 0x0?, {0x56d6c38, 0xc0787499a0}, 0xc0000fec40)
	/home/mo/matrixone/pkg/vm/engine/tae/tables/updates/delete.go:193 +0x114
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/tables.(*baseObject).RangeDelete(0xc0bfe3c730, {0x56f30e0, 0xc2feed2690}, 0x63, 0x1b2c, 0x1b2c, {0x56d6c38, 0xc0787499a0}, 0x0)
	/home/mo/matrixone/pkg/vm/engine/tae/tables/base.go:786 +0x1b7
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*txnTable).RangeDelete(0xc064c1eab0, 0xc1d4705530, 0x1b2c, 0x1b2c, {0x56d6c38, 0xc0787499a0}, 0x0)
	/home/mo/matrixone/pkg/vm/engine/tae/txn/txnimpl/table.go:805 +0x56e
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*txnDB).RangeDelete(0xc0a7311970?, 0xc1d4705530, 0x1b2c, 0x1b2c, {0x56d6c38, 0xc0787499a0}, 0x0)
	/home/mo/matrixone/pkg/vm/engine/tae/txn/txnimpl/txndb.go:153 +0x7d
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*txnStore).RangeDelete(0xc078749980?, 0xc1d4705530, 0x1b2c, 0x1b2c, {0x56d6c38, 0xc0787499a0}, 0x0)
	/home/mo/matrixone/pkg/vm/engine/tae/txn/txnimpl/store.go:321 +0x68
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*txnRelation).DeleteByPhyAddrKeys.func1({0x1, 0x90, 0xae, 0xc4, 0x36, 0x36, 0x72, 0x26, 0xb6, 0x6c, ...}, ...)
	/home/mo/matrixone/pkg/vm/engine/tae/txn/txnimpl/relation.go:279 +0x17a
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/containers.ForeachWindowFixed[...](0xc10c63fc00, 0x0, 0x1, 0xc13eed9508, 0x0, 0x0)
	/home/mo/matrixone/pkg/vm/engine/tae/containers/utils.go:703 +0x50d
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/containers.ForeachVectorWindow({0x56d6c38, 0xc078749960}, 0x0, 0x1, {0x42dbc80, 0xc2456ad508}, 0x0, 0x0)
	/home/mo/matrixone/pkg/vm/engine/tae/containers/utils.go:596 +0x9d8
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/txn/txnimpl.(*txnRelation).DeleteByPhyAddrKeys(0xc06dad7a10, {0x56d6c38, 0xc078749960}, {0x56d6c38, 0xc078749980})
	/home/mo/matrixone/pkg/vm/engine/tae/txn/txnimpl/relation.go:272 +0x153
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc.(*Handle).HandleWrite(0xc00ed7b520, {0x560fcd8, 0xc08d377140}, {0x56f30e0, 0xc2feed2690}, 0xc1fa2b48f0)
	/home/mo/matrixone/pkg/vm/engine/tae/rpc/handle.go:706 +0xb1f
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc.(*Handle).handleRequests(0xc00ed7b520, {0x560fcd8, 0xc08d377140}, {0x56f30e0, 0xc2feed2690}, 0xc1060315b8?)
	/home/mo/matrixone/pkg/vm/engine/tae/rpc/handle.go:214 +0x2aa
github.com/matrixorigin/matrixone/pkg/vm/engine/tae/rpc.(*Handle).HandleCommit(_, {_, _}, {{0xc12fe61ad0, 0x10, 0x10}, 0x0, {0x17e1f08ae87100d3, 0x1, 0x0, ...}, ...})
	/home/mo/matrixone/pkg/vm/engine/tae/rpc/handle.go:319 +0x314
github.com/matrixorigin/matrixone/pkg/txn/storage/tae.(*taeStorage).Commit(_, {_, _}, {{0xc12fe61ad0, 0x10, 0x10}, 0x0, {0x17e1f08ae87100d3, 0x1, 0x0, ...}, ...})
	/home/mo/matrixone/pkg/txn/storage/tae/storage.go:84 +0x75
github.com/matrixorigin/matrixone/pkg/txn/service.(*service).Commit(0xc007439d40, {0x560fcd8, 0xc08d377140}, 0xc0cba0b380, 0xc1be2ed480)
	/home/mo/matrixone/pkg/txn/service/service_cn_handler.go:263 +0x728
github.com/matrixorigin/matrixone/pkg/tnservice.(*store).handleCommit(0xc0058d71e0, {0x560fcd8, 0xc08d377140}, 0xc0cba0b380, 0xc1be2ed480)
	/home/mo/matrixone/pkg/tnservice/store_rpc_handler.go:123 +0x212
github.com/matrixorigin/matrixone/pkg/txn/rpc.executor.exec({{0xc19ce8cd02996e52, 0x1df1aa55ea1b, 0x8023620}, {0x560fcd8, 0xc08d377140}, 0xc08459fe40, 0xc0cba0b380, {0x565c8c8, 0xc01adc40c0}, 0xc00722dee0, ...})
	/home/mo/matrixone/pkg/txn/rpc/server.go:291 +0x102
github.com/matrixorigin/matrixone/pkg/txn/rpc.(*server).handleTxnRequest(0xc0048ce420, {0x560fcd8, 0xc004780840})
	/home/mo/matrixone/pkg/txn/rpc/server.go:266 +0x125
github.com/matrixorigin/matrixone/pkg/common/stopper.(*Stopper).doRunCancelableTask.func1()
	/home/mo/matrixone/pkg/common/stopper/stopper.go:277 +0x6e
created by github.com/matrixorigin/matrixone/pkg/common/stopper.(*Stopper).doRunCancelableTask in goroutine 507
	/home/mo/matrixone/pkg/common/stopper/stopper.go:272 +0xb0

9883430:{"level":"INFO","time":"2024/07/14 13:19:40.534192 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-50bc-70bf-8239-914ef7c3d51a new high watermark\n allocations : 23616535\n frees : 22565178\n alloc bytes : 6436844360\n free bytes : 5363040992\n current bytes : 1073803368\n high water mark : 1073803368\n"}
9883464:{"level":"INFO","time":"2024/07/14 13:19:41.162951 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-5079-7933-80e1-3dc4afd4fbfd new high watermark\n allocations : 23740405\n frees : 22680374\n alloc bytes : 6429148692\n free bytes : 5355375780\n current bytes : 1073772912\n high water mark : 1073772912\n"}
9883568:{"level":"INFO","time":"2024/07/14 13:19:43.498061 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-50fb-735a-b776-4b9b53838d71 new high watermark\n allocations : 23644126\n frees : 22586605\n alloc bytes : 6576441500\n free bytes : 5502572700\n current bytes : 1073868800\n high water mark : 1073868912\n"}
9883629:{"level":"INFO","time":"2024/07/14 13:19:44.538271 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-50cb-7d8a-a7f4-97694932dd79 new high watermark\n allocations : 23758867\n frees : 22704248\n alloc bytes : 6516507804\n free bytes : 5442724740\n current bytes : 1073783064\n high water mark : 1073783064\n"}
9883850:{"level":"INFO","time":"2024/07/14 13:19:49.948406 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-510a-7002-a6c5-6dd80755a3fc new high watermark\n allocations : 24464974\n frees : 23382056\n alloc bytes : 6446083512\n free bytes : 5372272936\n current bytes : 1073810576\n high water mark : 1073810576\n"}
9883859:{"level":"INFO","time":"2024/07/14 13:19:50.137734 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-50db-74b3-8d94-094f560178be new high watermark\n allocations : 23816939\n frees : 22765332\n alloc bytes : 6595429476\n free bytes : 5521658916\n current bytes : 1073770560\n high water mark : 1073770560\n"}
9888060:{"level":"INFO","time":"2024/07/14 13:19:57.861677 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-5127-77cf-bdb8-1792f366f11c new high watermark\n allocations : 24394117\n frees : 23302390\n alloc bytes : 6543421756\n free bytes : 5469671892\n current bytes : 1073749864\n high water mark : 1073749864\n"}
9888308:{"level":"INFO","time":"2024/07/14 13:20:03.551476 +0800","caller":"mpool/mpool.go:84","msg":"MPool global new high watermark\n allocations : 287542724\n frees : 262464527\n alloc bytes : 85124931479\n free bytes : 73313759839\n current bytes : 11811171616\n high water mark : 11811171640\n"}
9890585:{"level":"INFO","time":"2024/07/14 13:20:17.285741 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-5119-73cc-8652-95091e7ee9ff new high watermark\n allocations : 24436252\n frees : 23343658\n alloc bytes : 6703329664\n free bytes : 5629489376\n current bytes : 1073840288\n high water mark : 1073840288\n"}
9890826:{"level":"INFO","time":"2024/07/14 13:20:21.008283 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-50eb-737b-bba6-498f97aac6bd new high watermark\n allocations : 25317753\n frees : 24176949\n alloc bytes : 6651110356\n free bytes : 5577362916\n current bytes : 1073747440\n high water mark : 1073747440\n"}
9890930:{"level":"INFO","time":"2024/07/14 13:20:22.618798 +0800","caller":"mpool/mpool.go:84","msg":"MPool pipeline-0190afa9-50aa-7dd9-b060-1484f5bd535e new high watermark\n allocations : 25213945\n frees : 24078291\n alloc bytes : 6672479800\n free bytes : 5598593440\n current bytes : 1073886360\n high water mark : 1073886360\n"}
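
To compare pools quickly, the counters in these "new high watermark" messages can be pulled out and differenced. Below is a minimal Go sketch, not part of MatrixOne: `PoolStats`, `ParsePoolStats`, and the regular expressions are hypothetical helpers written for this issue, and they assume the decoded `msg` text has the `name : value` layout shown in the log above. The sample message is the first pipeline entry from the log.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// PoolStats is a hypothetical struct holding the counters printed in an
// "MPool <name> new high watermark" log message.
type PoolStats struct {
	Name          string
	Allocations   int64
	Frees         int64
	AllocBytes    int64
	FreeBytes     int64
	CurrentBytes  int64
	HighWaterMark int64
}

var (
	nameRe  = regexp.MustCompile(`MPool (\S+) new high watermark`)
	fieldRe = regexp.MustCompile(`(allocations|frees|alloc bytes|free bytes|current bytes|high water mark)\s*:\s*(\d+)`)
)

// ParsePoolStats extracts the counters from the decoded "msg" field.
// The layout is assumed from the log excerpt in this issue.
func ParsePoolStats(msg string) (PoolStats, error) {
	var s PoolStats
	m := nameRe.FindStringSubmatch(msg)
	if m == nil {
		return s, fmt.Errorf("not an mpool watermark message")
	}
	s.Name = m[1]
	for _, f := range fieldRe.FindAllStringSubmatch(msg, -1) {
		v, err := strconv.ParseInt(f[2], 10, 64)
		if err != nil {
			return s, err
		}
		switch f[1] {
		case "allocations":
			s.Allocations = v
		case "frees":
			s.Frees = v
		case "alloc bytes":
			s.AllocBytes = v
		case "free bytes":
			s.FreeBytes = v
		case "current bytes":
			s.CurrentBytes = v
		case "high water mark":
			s.HighWaterMark = v
		}
	}
	return s, nil
}

// OutstandingAllocs is the number of allocations not yet freed.
func (s PoolStats) OutstandingAllocs() int64 { return s.Allocations - s.Frees }

// OutstandingBytes is the number of allocated bytes not yet freed.
func (s PoolStats) OutstandingBytes() int64 { return s.AllocBytes - s.FreeBytes }

// sampleMsg is the first pipeline entry from the log above, JSON-decoded.
const sampleMsg = "MPool pipeline-0190afa9-50bc-70bf-8239-914ef7c3d51a new high watermark\n allocations : 23616535\n frees : 22565178\n alloc bytes : 6436844360\n free bytes : 5363040992\n current bytes : 1073803368\n high water mark : 1073803368\n"

func main() {
	s, err := ParsePoolStats(sampleMsg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: outstanding allocs=%d, outstanding bytes=%d, current bytes=%d\n",
		s.Name, s.OutstandingAllocs(), s.OutstandingBytes(), s.CurrentBytes)
}
```

For the sample entry, the difference between alloc bytes and free bytes equals the reported current bytes (1073803368), just above 1 GiB (1073741824 bytes). Every pipeline pool in the excerpt reports a current bytes value slightly above that figure, with roughly a million allocations not yet freed in each.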

Expected Behavior

No response

Steps to Reproduce

Run the TPC-C 10-10 workload. The panic reproduces reliably after about 15 hours.

Additional information

No response

m-schen commented 1 month ago

...

m-schen commented 1 month ago

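Pushing this back for now; this is a longer-term problem.

Next week I will start working on the problem that pipeline batch lifecycles are hard to manage. (Though I am fairly sure that has nothing to do with this issue.)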

m-schen commented 2 weeks ago

I can't put time into this issue for the next few days; I'm working on the pipeline.

m-schen commented 1 week ago

On leave.

m-schen commented 5 hours ago

Wrapping up the pipeline spool work.