matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: [0814] big-data-regression: delete from table report 'context deadline exceeded'. #18124

Open Ariznawlll opened 1 month ago

Ariznawlll commented 1 month ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

4d2a745cd39ed0f8a680acded5904536acee65a5

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

job url: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/10368751013/job/28728120148

image

Pod status:

image

The DN restarted around the time of the error:

image

Logs related to the 'context deadline exceeded' error: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22GB3%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240813%5C%22%7D%20%7C%3D%20%60context%20deadline%20exceeded%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221723625980000%22,%22to%22:%221723625988000%22%7D%7D%7D&schemaVersion=1&orgId=1
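The Grafana Explore link above carries its Loki query inside the URL-encoded `panes` parameter. As a reading aid, here is a minimal Go sketch that extracts the query expression from such a link. The helper name `exprsFromExploreURL` and the `pane` struct are hypothetical, and the JSON field names (`queries`, `expr`) are taken from the link itself rather than from Grafana documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// pane mirrors only the fields of an Explore pane that this sketch reads.
// The field names are inferred from the link in this issue.
type pane struct {
	Queries []struct {
		Expr string `json:"expr"`
	} `json:"queries"`
}

// exprsFromExploreURL returns the query expressions embedded in the
// "panes" parameter of a Grafana Explore link (hypothetical helper).
func exprsFromExploreURL(raw string) ([]string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	var panes map[string]pane
	if err := json.Unmarshal([]byte(u.Query().Get("panes")), &panes); err != nil {
		return nil, err
	}
	var out []string
	for _, p := range panes {
		for _, q := range p.Queries {
			out = append(out, q.Expr)
		}
	}
	return out, nil
}

func main() {
	// Shortened form of the link above: same pane key and expr, other fields dropped.
	link := "https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22GB3%22:%7B%22queries%22:%5B%7B%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240813%5C%22%7D%20%7C%3D%20%60context%20deadline%20exceeded%60%22%7D%5D%7D%7D&orgId=1"
	exprs, err := exprsFromExploreURL(link)
	if err != nil {
		panic(err)
	}
	fmt.Println(exprs[0])
}
```

Run against the shortened link, this prints the LogQL line filter that selects the `mo-big-data-20240813` namespace and matches the text `context deadline exceeded`.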

Expected Behavior

No response

Steps to Reproduce

big data regression

Test configuration: https://github.com/matrixorigin/mo-nightly-regression/blob/big_data/big-data-tke.yaml

Data volume: 1 billion rows

Failing SQL: delete from big_data_test.table_with_pk_index_for_write_1B where id <= 10000000

Additional information

No response

sukki37 commented 1 month ago

https://grafana.ci.matrixorigin.cn/goto/fweVP5CSg?orgId=1 @volgariver6

volgariver6 commented 1 month ago

https://grafana.ci.matrixorigin.cn/goto/cAK5e5jSg?orgId=1

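The root cause is that the RPC message exceeds 100M. Because the DN's message-splitting implementation has some margin of error, I suggest changing the rpc max message size to 200M and testing again.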
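To illustrate the reasoning in the comment above, here is a minimal Go sketch. It is not MatrixOne's splitting code: the function `fitsLimit`, the 100M chunk target, and the 5% estimation error are all assumptions chosen only to show why a splitter that works from size estimates can overshoot a hard limit, and why a larger limit leaves slack.

```go
package main

import "fmt"

// fitsLimit reports whether a message whose size was estimated as estBytes
// still fits under limitBytes when the real encoded size may exceed the
// estimate by up to errPct percent. Names and numbers are illustrative only.
func fitsLimit(estBytes, limitBytes, errPct int64) bool {
	worst := estBytes + estBytes*errPct/100
	return worst <= limitBytes
}

func main() {
	const mib = int64(1) << 20
	est := 100 * mib // assumption: the splitter aims for chunks of about 100M

	fmt.Println(fitsLimit(est, 100*mib, 5)) // false: 105M worst case exceeds a 100M limit
	fmt.Println(fitsLimit(est, 200*mib, 5)) // true: a 200M limit absorbs the error
}
```

Under these assumed numbers, any positive estimation error breaks the 100M limit, while the 200M limit tolerates an error of up to 100%.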

Ariznawlll commented 1 month ago

[0817] job url: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/10419909260

After changing the configuration yesterday, a log search still shows this error, but the job itself ran without reporting an error.


https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22Jg2%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240816%5C%22%7D%20%7C%3D%20%60delete%20from%20big_data_test.table_with_pk_index_for_write_1B%20where%20id%20%3C%3D%2010000000%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221723874520000%22,%22to%22:%221723874940000%22%7D%7D%7D&schemaVersion=1&orgId=1

There was also a panic during this time window: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22Jg2%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240816%5C%22%7D%20%7C%3D%20%60panic%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221723874520000%22,%22to%22:%221723874940000%22%7D%7D%7D&schemaVersion=1&orgId=1

Bo (博哥) said this setting needs to be added for logservice: [logservice.rpc] max-message-size "200M"

The configuration has been added; waiting for test results.
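For reference, the value "200M" can be turned into a byte count with a small Go helper. This is a sketch only: `parseSize` is a hypothetical function, and it assumes K, M, and G mean powers of 1024, which may not match how MatrixOne interprets the setting.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts a size string such as "200M" into bytes.
// Hypothetical helper: the K, M, G suffixes are treated as powers of 1024,
// which is an assumption and not taken from MatrixOne.
func parseSize(s string) (int64, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	mult := int64(1)
	switch s[len(s)-1] {
	case 'K', 'k':
		mult = 1 << 10
	case 'M', 'm':
		mult = 1 << 20
	case 'G', 'g':
		mult = 1 << 30
	}
	if mult != 1 {
		s = s[:len(s)-1] // drop the unit suffix before parsing the number
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n * mult, nil
}

func main() {
	limit, err := parseSize("200M")
	if err != nil {
		panic(err)
	}
	fmt.Println(limit) // 209715200
}
```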

volgariver6 commented 3 weeks ago

No problems after the configuration change.

Ariznawlll commented 3 weeks ago

This problem has not recurred in the last few runs, and the Loki logs for main show no 'context deadline exceeded' errors either.

Latest test result:

https://github.com/matrixorigin/mo-nightly-regression/actions/runs/10509468758/job/29135554152

image
Ariznawlll commented 1 week ago

[0904] job url: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/10685465952/job/29635505459

commit: c03cbb087cafdf71c7ea076d082630cfb9cf830c

image

Logs: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%226Yu%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240903%5C%22%7D%20%7C%3D%20%60context%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221725428951000%22,%22to%22:%221725429801000%22%7D%7D%7D&schemaVersion=1&orgId=1

Ariznawlll commented 1 week ago

[0905] big-data-regression: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/10722568409/job/29736831271

image

The DN restarted at the corresponding time:

image

Logs for the corresponding time: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22pQO%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240905%5C%22%7D%20%7C%3D%20%60context%20deadline%20exceeded%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221725553644000%22,%22to%22:%221725553824000%22%7D%7D%7D&schemaVersion=1&orgId=1

Panic logs: https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22AdE%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20240905%5C%22%7D%20%7C%3D%20%60panic%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221725553200000%22,%22to%22:%221725553800000%22%7D%7D%7D&schemaVersion=1&orgId=1

volgariver6 commented 5 days ago

Not handled yet.