matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0
1.79k stars, 277 forks

[Bug]: [1121 big data regression] CN OOM in create index #20236

Open · Ariznawlll opened this issue 16 hours ago

Ariznawlll commented 16 hours ago

Is there an existing issue for the same bug?

Branch Name

2.0-dev

Commit ID

a6b92d6e10e6d8016389e1ffa48c4a5ceadfd04e

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Job URL: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/11932164320/job/33281117600


Log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22y5g%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-nightly-a6b92d6-20241120%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221732137765000%22,%22to%22:%221732141496000%22%7D%7D%7D&schemaVersion=1&orgId=1

Heap profile: heap_profile1121.zip

Malloc profile: malloc_profile1121.zip

Expected Behavior

No response

Steps to Reproduce

Big data regression

Data volume: 30 billion

Additional information

No response

ouyuanning commented 15 hours ago

At the time of the panic, in-heap memory usage was small (under 4 GB).

The direct cause of the panic was that the memory requested by the hashtable exceeded the limit: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22y5g%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-nightly-a6b92d6-20241120%5C%22%7D%20%7C%3D%20%60panic%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221732137765000%22,%22to%22:%221732138436000%22%7D%7D%7D&schemaVersion=1&orgId=1


ouyuanning commented 15 hours ago

@badboynt1 Could you take a look? Please check whether anything changed recently that affects the memory usage of shufflebuild here.

badboynt1 commented 15 hours ago

https://github.com/matrixorigin/matrixone/pull/20167 needs to be cherry-picked to 2.0. @aunjgr

ouyuanning commented 9 hours ago

This may be the same problem as https://github.com/matrixorigin/matrixone/issues/20213.