matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: [0605 big data regression] insert into select oom. #16650

Closed Ariznawlll closed 2 months ago

Ariznawlll commented 3 months ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

43ebd753a8342ea4ff2d64362e1420ae96e725e9

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Job URL: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9366247347/job/25799726334 (load and insert test -> insert into select)


Log: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22vSV%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240604%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-12h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1

Profiles: 2024-06-04_16_56_47.zip, 2024-06-04_16_55_08.zip

Expected Behavior

No response

Steps to Reproduce

Trigger the big data test on TKE.
If you need more test details, please contact me.

Additional information

No response

YANGGMM commented 3 months ago

@ouyuanning Please take a look.

ouyuanning commented 3 months ago

Jinsai (锦赛), could you please look into these together?

jensenojs commented 3 months ago

No progress.

jensenojs commented 3 months ago

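The profile folders attached to the issue message above contain no pprof data for the node that hit the OOM. Wei Lu (魏璐) helped track down the Grafana data captured for a run from a few days ago. Taking the timestamp Fri, 14 Jun 2024 22:19:17 GMT from https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9520234425/job/26245691714, use Grafana to look at the inuse space and alloc space for the 10 s before that time: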

https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22cg0%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240614%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22profileTypeId%22:%22memory:inuse_space:bytes:space:bytes%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D%7D%5D,%22range%22:%7B%22from%22:%221718403547000%22,%22to%22:%221718403557000%22%7D%7D%7D&schemaVersion=1&orgId=1
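The `from` and `to` values in the link above are epoch milliseconds. As a minimal sketch of how the 10 s window can be derived from the job timestamp, the snippet below parses the HTTP-style date and subtracts ten seconds. The helper name `window_before` is made up for illustration and is not part of any MatrixOne or Grafana tooling.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def window_before(http_date: str, seconds: int = 10) -> tuple[int, int]:
    """Return (from_ms, to_ms) covering `seconds` before the given date.

    `http_date` is an RFC 1123 style date such as the one shown in the
    GitHub Actions job. Hypothetical helper, for illustration only.
    """
    end = parsedate_to_datetime(http_date)    # timezone-aware (GMT)
    start = end - timedelta(seconds=seconds)
    # Whole seconds are converted to milliseconds with integer arithmetic.
    return int(start.timestamp()) * 1000, int(end.timestamp()) * 1000


from_ms, to_ms = window_before("Fri, 14 Jun 2024 22:19:17 GMT")
print(from_ms, to_ms)  # 1718403547000 1718403557000
```

The printed pair matches the `from` and `to` values of the `range` in the Pyroscope link.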


jensenojs commented 3 months ago

https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22Bjf%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240614%5C%22,%20pod%3D%5C%22nightly-regression-dis-tp-cn-vhhfr%5C%22%7D%20%7C~%20%60gc%20.%2Aglobal%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221718403317000%22,%22to%22:%221718403557000%22%7D%7D%7D&schemaVersion=1&orgId=1

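Filtering the logs with the regular expression "gc .*global" shows that the pod that hit the OOM is nightly-regression-dis-tp-cn-vhhfr. In-use memory during this period is fairly high.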
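For readers who cannot open the Grafana link, the query it carries can be recovered from the URL itself. The sketch below decodes the `panes` parameter of the Loki link above using only the Python standard library. The function name `decode_panes` is an illustrative assumption, and the JSON layout is simply what this particular URL contains, not a documented Grafana contract.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The Loki explore link from this comment, split across lines for readability.
URL = (
    "https://grafana.ci.matrixorigin.cn/explore?panes="
    "%7B%22Bjf%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B"
    "%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22"
    "mo-big-data-20240614%5C%22,%20pod%3D%5C%22"
    "nightly-regression-dis-tp-cn-vhhfr%5C%22%7D%20%7C~%20%60gc%20.%2Aglobal"
    "%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:"
    "%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D"
    "%5D,%22range%22:%7B%22from%22:%221718403317000%22,%22to%22:"
    "%221718403557000%22%7D%7D%7D&schemaVersion=1&orgId=1"
)


def decode_panes(url: str) -> dict:
    """Percent-decode the `panes` query parameter and parse it as JSON."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["panes"][0])


pane = next(iter(decode_panes(URL).values()))  # the single pane in this URL
expr = pane["queries"][0]["expr"]
span_s = (int(pane["range"]["to"]) - int(pane["range"]["from"])) // 1000

print(expr)    # the LogQL selector plus the |~ regex line filter
print(span_s)  # 240
```

The decoded expression selects the namespace and pod and applies the "gc .*global" regex as a line filter, over the 240 seconds that end at the job timestamp.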


https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22cg0%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240614%5C%22,pod%3D%5C%22nightly-regression-dis-tp-cn-vhhfr%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22profileTypeId%22:%22memory:inuse_space:bytes:space:bytes%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D%7D%5D,%22range%22:%7B%22from%22:%221718403500000%22,%22to%22:%221718403520000%22%7D%7D%7D&schemaVersion=1&orgId=1

Once the optimization below is done, it may resolve this issue.

jensenojs commented 3 months ago

Further optimization is still needed. There is now a new problem: https://github.com/matrixorigin/matrixone/issues/17143#issuecomment-2188493489

jensenojs commented 2 months ago

The fix is on 1.2-dev; it will be merged after 1.2.2 is tagged.

jensenojs commented 2 months ago

Waiting to be merged.

Ariznawlll commented 2 months ago

This problem has not recurred recently, so closing for now.