matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: CN lost connection #9647

Open fengttt opened 1 year ago

fengttt commented 1 year ago

Is there an existing issue for the same bug?

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93):
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Run the same test as in #9646 on a larger machine (64G of memory instead of 32G) and the CN loses its connection:

ERROR 20503 (HY000) at line 40: stream closed

Expected Behavior

No response

Steps to Reproduce

No response

Additional information

No response

volgariver6 commented 1 year ago

In a local test:

create database if not exists db1;

use db1;

drop table if exists t;

create table t (i int, j int);

insert into t values (1, 1), (2, 2), (3, 3), (4, 4), (5, null), (null, 5);

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;

delete from t where i = 1;
delete from t where i = 2;
select count(*) from t;

insert into t select * from t;
select count(*) from t;

The script above results in: 2023/05/24 12:21:17.672100 +0800 ERROR logservicedriver/appender.go:70 append failed: internal error: message body 116720140 is too large, max is 104857600
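The size of that failing append is easier to judge once you see how fast the script grows the table. The following is a minimal Go sketch, assuming each `insert into t select * from t` exactly doubles the table. It replays the values the `select count(*)` statements should report and checks the reported body size against the 104857600-byte (100 MiB) limit from the log. `rowCounts` and `checkBodySize` are hypothetical helpers for illustration, not MatrixOne's appender code.

```go
package main

import "fmt"

// maxBodySize matches the limit in the log line: 104857600 bytes = 100 MiB.
const maxBodySize = 100 * 1024 * 1024

// rowCounts replays the script's arithmetic: start with 6 rows, double the
// table five times per block (five blocks), delete the i = 1 and i = 2 rows,
// then double once more. It returns what each select count(*) would report.
func rowCounts() []int64 {
	rows := int64(6)
	var counts []int64
	for block := 0; block < 5; block++ {
		for k := 0; k < 5; k++ {
			rows *= 2 // insert into t select * from t;
		}
		counts = append(counts, rows)
	}
	// Rows with i = 1 and i = 2 each make up exactly 1/6 of the table.
	// `null = 1` is not true in SQL, so the (null, 5) rows survive both deletes.
	rows -= 2 * (rows / 6)
	counts = append(counts, rows)
	rows *= 2 // final insert into t select * from t;
	counts = append(counts, rows)
	return counts
}

// checkBodySize is a hypothetical size guard that produces an error shaped
// like the log line above; it only illustrates the comparison.
func checkBodySize(n int) error {
	if n > maxBodySize {
		return fmt.Errorf("message body %d is too large, max is %d", n, maxBodySize)
	}
	return nil
}

func main() {
	for _, c := range rowCounts() {
		fmt.Println(c)
	}
	fmt.Println(checkBodySize(116720140))
}
```

Under this assumption the table holds 201326592 rows before the two deletes, so each delete removes 33554432 rows in a single statement.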

volgariver6 commented 1 year ago

Duplicate of #9447.

volgariver6 commented 1 year ago

The analysis of this issue is at https://github.com/matrixorigin/matrixone/issues/9447#issuecomment-1576675658

@triump2020 will follow up with further optimizations.

volgariver6 commented 1 year ago

ERROR 20503 (HY000) at line 40: stream closed

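This error, like other connection-closed errors, occurs because a GC task in the RPC framework checks each connection's activity. When a connection carries no data for a certain period (1 minute by default), the task closes it.

The delete statement's commit takes a long time, so the GC task closed its connection and the statement failed. This needs to be optimized.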

triump2020 commented 1 year ago

depends on #9996

XuPeng-SH commented 1 year ago

@jiangxinmeng1 this is a delete-related issue.

XuPeng-SH commented 1 year ago

#10418 cannot fix this issue. A fix depends on relatively significant refactoring.

jiangxinmeng1 commented 1 year ago

No progress yet.

triump2020 commented 1 year ago

depends on V1.1

XuPeng-SH commented 1 year ago

It depends on #11805 and #11804.

XuPeng-SH commented 4 months ago

depends on #11471

jiangxinmeng1 commented 4 months ago

depends on https://github.com/matrixorigin/matrixone/issues/11805 https://github.com/matrixorigin/matrixone/issues/11804

XuPeng-SH commented 3 weeks ago

fixed