matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: insert into values ... on duplicate key update, 100 threads, mo crashed #14038

Open heni02 opened 9 months ago

heni02 commented 9 months ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

4531180d4b20c179baa426cca179aaebc70cec84

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

The original table has 10.5 million rows and a primary key. With 100 concurrent threads, mo crashed shortly after execution started; the same 10.5-million-row table without a primary key passed the 100-thread test. sql: insert into table_1000w(clo4) values({tbx}) on duplicate key update clo2=clo2/10;


I tried creating a simple table; against it the same SQL executed successfully.


mo log:


mo_log.tar.gz

Expected Behavior

No response

Steps to Reproduce

Using the mo-load tool, modify the sql and vuser fields in cases/ddl/run.yml:
sql: "insert into table_1000w(clo4) values({tbx}) on duplicate key update clo2=clo2/10;"
vuser: 100
Then run ./start.sh -c cases/ddl/

ddl:
create external table ex_table_1000w(clo1 tinyint,clo2 smallint,clo3 int,clo4 bigint,clo5 tinyint unsigned,clo6 smallint unsigned,clo7 int unsigned,clo8 bigint unsigned,col9 float,col10 double,col11 varchar(255),col12 Date,col13 DateTime,col14 timestamp,col15 bool,col16 decimal(5,2),col17 text,col18 varchar(255),col19 varchar(255),col20 varchar(255))infile{"filepath"='/Users/heni/test_data/10000000_20_columns_load_data.csv'};
create table table_1000w(clo1 tinyint,clo2 smallint,clo3 int,clo4 bigint,clo5 tinyint unsigned,clo6 smallint unsigned,clo7 int unsigned,clo8 bigint unsigned,col9 float,col10 double,col11 varchar(255),col12 Date,col13 DateTime,col14 timestamp,col15 bool,col16 decimal(5,2),col17 text,col18 varchar(255),col19 varchar(255),col20 varchar(255),primary key(clo4));
insert into table_1000w select * from ex_table_1000w;

Additional information

No response

sukki37 commented 9 months ago

The MO crash is most likely caused by an OOM. We need to find the cause of the OOM.

ouyuanning commented 9 months ago

1. The test script seems to have a problem; connecting to MO reports this error:

{"level":"ERROR","time":"2024/01/03 14:52:16.668663 +0800","caller":"frontend/mysql_cmd_executor.go:315","msg":"error: SQL parser error: You have an error in your SQL syntax; check the manual that corresponds to your MatrixOne server version for the right syntax to use. syntax error at line 1 column 9 near \" $$\";","span":{"trace_id":"f33dd58e-cd53-8a52-58e4-f29d4991fc2c","span_id":"164f42c1b6b06b08"}}

2. But I don't know why the server failed afterwards. That is probably not closely related to item 1.

nnsgmsone commented 9 months ago

No progress.

nnsgmsone commented 9 months ago

No progress.

nnsgmsone commented 9 months ago

No progress.

nnsgmsone commented 8 months ago

No progress.

nnsgmsone commented 8 months ago

No progress.

ouyuanning commented 8 months ago

I only just noticed that this is about the on duplicate key operator. That operator keeps all the data in memory until every row has been checked, so with large data volumes it is very prone to OOM. The OOM cannot be avoided until spill is implemented.
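To illustrate why this operator is memory-bound, here is a minimal Go sketch of the buffering behavior described above. The names (`dedupBuffer`, `push`) are hypothetical, not MatrixOne's actual operator code: every new key is held in memory until the whole input has been scanned, so memory use grows with the input size unless a spill path exists.

```go
package main

import "fmt"

// row is a simplified (key, value) pair standing in for a table row.
type row struct {
	key int64
	val int64
}

// dedupBuffer mimics the described behavior of the on duplicate key
// operator: all distinct keys stay buffered in memory until the input
// ends, which is what makes large inputs prone to OOM without spill.
type dedupBuffer struct {
	rows map[int64]*row // every buffered row, held until the operator finishes
}

func newDedupBuffer() *dedupBuffer {
	return &dedupBuffer{rows: make(map[int64]*row)}
}

// push inserts a row, or applies the update action on a duplicate key
// (here val /= 10, matching clo2=clo2/10 from the report).
func (b *dedupBuffer) push(r row) {
	if old, ok := b.rows[r.key]; ok {
		old.val /= 10 // duplicate key: update in place instead of inserting
		return
	}
	b.rows[r.key] = &r // new key: buffered until all input is checked
}

func main() {
	b := newDedupBuffer()
	b.push(row{key: 1, val: 100})
	b.push(row{key: 1, val: 0}) // duplicate of key 1: 100/10 = 10
	b.push(row{key: 2, val: 7})
	fmt.Println(len(b.rows), b.rows[1].val) // 2 10
}
```

With 10.5 million distinct keys, `rows` holds every one of them at once, which matches the OOM symptom reported above; a spill implementation would move part of this map to disk when a memory budget is exceeded.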

m-schen commented 8 months ago

None of the spill-related work has started.

m-schen commented 7 months ago

Spill likely cannot make it into 1.2; no schedule yet.

m-schen commented 7 months ago

No concrete schedule yet.

The current plan is to add process-level "mpool not enough" detection to certain operators, so that running out of memory is reported as an error instead of crashing the server.
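A minimal sketch of what such a check could look like. These names (`pool`, `alloc`, `errMpoolNotEnough`) are assumptions for illustration, not MatrixOne's actual mpool API: each allocation is tested against a process-level budget, and exceeding it yields an ordinary error the query layer can surface, rather than letting the process grow until the OS kills it.

```go
package main

import (
	"errors"
	"fmt"
)

// errMpoolNotEnough is the error an operator would receive instead of
// the process crashing with an OOM. (Hypothetical name for illustration.)
var errMpoolNotEnough = errors.New("mpool: not enough memory")

// pool tracks a process-level memory budget.
type pool struct {
	capBytes  int64 // total budget for the process
	usedBytes int64 // bytes currently reserved
}

// alloc reserves sz bytes, failing cleanly when the budget is exceeded.
func (p *pool) alloc(sz int64) error {
	if p.usedBytes+sz > p.capBytes {
		return errMpoolNotEnough // report the shortage instead of crashing
	}
	p.usedBytes += sz
	return nil
}

func main() {
	p := &pool{capBytes: 1 << 20} // 1 MiB budget for the demo
	if err := p.alloc(1 << 19); err != nil {
		fmt.Println(err) // fits: not printed
	}
	if err := p.alloc(1 << 20); err != nil {
		fmt.Println(err) // exceeds the remaining budget: error is printed
	}
}
```

An operator that gets `errMpoolNotEnough` back can abort just its own query with an "out of memory" result, which is the crash-avoidance behavior the comment describes.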

m-schen commented 1 day ago

Same as the previous comment: spill work is in progress.