matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: insert into values duplicate key update 100threads mo crashed #14038

Open heni02 opened 10 months ago

heni02 commented 10 months ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

4531180d4b20c179baa426cca179aaebc70cec84

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

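The source table holds 10.5 million rows and has a primary key. With 100 concurrent connections, MO crashed right after the run started. With the same 10.5 million rows but no primary key, MO ran normally at 100 concurrent connections. SQL: insert into table_1000w(clo4) values({tbx}) on duplicate key update clo2=clo2/10;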

企业微信截图_1167f74c-a532-441c-9ae2-9e015b05c4ef 企业微信截图_9eda5e40-ee3d-4bd6-aa68-827da5cbfdac

I also tried this on a newly created simple table, and the same SQL executed successfully.

企业微信截图_9b7a328c-13d4-431c-ba3a-d73f0ec08012

mo log:

企业微信截图_1306b80b-bad1-4129-a7e1-6ab99f594a36

mo_log.tar.gz

Expected Behavior

No response

Steps to Reproduce

In the mo-load tool, edit the sql and vuser settings in cases/ddl/run.yml:
sql: "insert into table_1000w(clo4) values({tbx}) on duplicate key update clo2=clo2/10;"
vuser: 100
Then run ./start.sh -c cases/ddl/

ddl:
create external table ex_table_1000w(clo1 tinyint,clo2 smallint,clo3 int,clo4 bigint,clo5 tinyint unsigned,clo6 smallint unsigned,clo7 int unsigned,clo8 bigint unsigned,col9 float,col10 double,col11 varchar(255),col12 Date,col13 DateTime,col14 timestamp,col15 bool,col16 decimal(5,2),col17 text,col18 varchar(255),col19 varchar(255),col20 varchar(255))infile{"filepath"='/Users/heni/test_data/10000000_20_columns_load_data.csv'};
create table table_1000w(clo1 tinyint,clo2 smallint,clo3 int,clo4 bigint,clo5 tinyint unsigned,clo6 smallint unsigned,clo7 int unsigned,clo8 bigint unsigned,col9 float,col10 double,col11 varchar(255),col12 Date,col13 DateTime,col14 timestamp,col15 bool,col16 decimal(5,2),col17 text,col18 varchar(255),col19 varchar(255),col20 varchar(255),primary key(clo4));
insert into table_1000w select * from ex_table_1000w;

Additional information

No response

sukki37 commented 10 months ago

The MO crash is most likely caused by OOM. We need to find what is causing the OOM.

ouyuanning commented 10 months ago

1. There seems to be a problem with the test script. When it connects to MO, this error is reported:

{"level":"ERROR","time":"2024/01/03 14:52:16.668663 +0800","caller":"frontend/mysql_cmd_executor.go:315","msg":"error: SQL parser error: You have an error in your SQL syntax; check the manual that corresponds to your MatrixOne server version for the right syntax to use. syntax error at line 1 column 9 near \" $$\";","span":{"trace_id":"f33dd58e-cd53-8a52-58e4-f29d4991fc2c","span_id":"164f42c1b6b06b08"}}

2. I don't know why that would make the server fail later on, though. It is probably not closely related to point 1.

nnsgmsone commented 10 months ago

No progress.

ouyuanning commented 10 months ago

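I only just noticed that this is about the on duplicate key operator. That operator keeps all of the data in memory until every row has been checked, so it OOMs easily on large data volumes. The OOM will only go away once spill is implemented.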
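To make the idea of spill concrete, here is a minimal Python sketch of a buffer that caps the number of rows held in memory by writing full batches to a temporary file. It is an illustration only, not MatrixOne code (MatrixOne is written in Go), and the names `SpillingBuffer` and `max_rows_in_memory` are hypothetical.

```python
import pickle
import tempfile


class SpillingBuffer:
    """Buffers rows in memory and spills full batches to a temp file.

    Hypothetical illustration only; not MatrixOne's operator or API.
    """

    def __init__(self, max_rows_in_memory):
        self.max_rows_in_memory = max_rows_in_memory
        self.rows = []            # rows currently held in memory
        self.spill_file = None    # created lazily on the first spill
        self.spilled_batches = 0  # number of batches written to disk

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows_in_memory:
            self._spill()

    def _spill(self):
        if self.spill_file is None:
            self.spill_file = tempfile.TemporaryFile()
        # Always append at the end, even if the file was read earlier.
        self.spill_file.seek(0, 2)
        pickle.dump(self.rows, self.spill_file)
        self.spilled_batches += 1
        self.rows = []

    def __iter__(self):
        # Replay spilled batches in order, then the in-memory tail.
        if self.spill_file is not None:
            self.spill_file.seek(0)
            for _ in range(self.spilled_batches):
                yield from pickle.load(self.spill_file)
        yield from self.rows
```

With a buffer like this, memory use is bounded by `max_rows_in_memory` rows rather than by the size of the whole input, at the cost of extra disk I/O when the rows are read back.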

m-schen commented 9 months ago

Spill probably cannot be done in 1.2. Nothing is scheduled for now.

m-schen commented 8 months ago

No specific schedule yet.

The current plan is to add process-level "mpool not enough" detection to certain operators, so that the crash is avoided.
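As a rough sketch of what such a check could look like, the Python example below tracks memory reservations against a fixed budget and raises an error once the budget would be exceeded, so the query fails cleanly rather than taking the process down. It is not MatrixOne's mpool API; `MemoryBudget`, `MemoryBudgetExceeded`, and `buffer_rows` are hypothetical names, and the fixed per-row size is a simplifying assumption.

```python
class MemoryBudgetExceeded(Exception):
    """Raised when a reservation would exceed the budget (hypothetical)."""


class MemoryBudget:
    """Tracks reserved bytes against a fixed limit (illustration only)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, nbytes):
        # Refuse the reservation instead of letting the process run out of memory.
        if self.used_bytes + nbytes > self.limit_bytes:
            raise MemoryBudgetExceeded(
                f"need {nbytes} bytes, only "
                f"{self.limit_bytes - self.used_bytes} left"
            )
        self.used_bytes += nbytes

    def release(self, nbytes):
        self.used_bytes = max(0, self.used_bytes - nbytes)


def buffer_rows(rows, budget, row_size_bytes):
    """Buffers rows like a blocking operator, checking the budget per row.

    Assumes every row costs row_size_bytes, which is a simplification.
    """
    buffered = []
    for row in rows:
        budget.reserve(row_size_bytes)
        buffered.append(row)
    return buffered
```

A caller would catch `MemoryBudgetExceeded` at the statement level and return an error to the client while the server keeps running.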

m-schen commented 2 weeks ago

Will hand this issue over once spill for Join is done.

m-schen commented 2 weeks ago

Spill has to be postponed to 2.1; development cannot be completed in 2.0.1.

m-schen commented 1 week ago

no words.

m-schen commented 6 days ago

Not sure what exactly caused the OOM. Will verify again once spill is completed in 2.1.

m-schen commented 1 day ago

Same as above.