matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: delete from table panic. #19384

Closed Ariznawlll closed 1 month ago

Ariznawlll commented 1 month ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

39ba16c

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

TKE environment:


log:https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22iwM%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-big-data-20241015%5C%22%7D%20%7C%3D%20%60panic%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221729072196000%22,%22to%22:%221729072436000%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

Run in a distributed environment:

create database test;
use test;
create table t(a int);
insert into t values (1),(2);
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;
delete from t where a=1;

Additional information

No response

aressu1985 commented 1 month ago

@jensenojs @ouyuanning please take a look at this issue and check whether the earlier optimization still has a problem.

ouyuanning commented 1 month ago

@jensenojs please look into why the locks were not converted to a table lock.

gouhongshen commented 1 month ago

mysql> delete from t where a=1;
ERROR 20101 (HY000): internal error: panic too large fixed slice 33554432, max is 16777216: 
github.com/matrixorigin/matrixone/pkg/lockservice.(*fixedSlicePool).acquire
        /go/src/github.com/matrixorigin/matrixone/pkg/lockservice/slice.go:219
github.com/matrixorigin/matrixone/pkg/lockservice.(*cowSlice).append
        /go/src/github.com/matrixorigin/matrixone/pkg/lockservice/slice.go:65
github.com/matrixorigin/matrixone/pkg/lockservice.(*activeTxn).lockAdded
        /go/src/github.com/matrixorigin/matrixone/pkg/lockservice/txn.go:121
github.c
mysql> 
mysql> 
mysql> 
mysql> explain verbose delete from t where a=1;
+-------------------------------------------------------------------------------------------------------------+
| TP QUERY PLAN                                                                                               |
+-------------------------------------------------------------------------------------------------------------+
| Delete[4] on test.t (cost=4096.00 outcnt=4096.00 selectivity=0.5000 blockNum=2)                             |
|   ->  Lock[3] (cost=4096.00 outcnt=4096.00 selectivity=0.5000 blockNum=2)                                   |
|         Lock level: Row level lock                                                                          |
|         ->  Project[2] (cost=4096.00 outcnt=4096.00 selectivity=0.5000 blockNum=2)                          |
|               Output: a, __mo_fake_pk_col, __mo_rowid                                                       |
|               ->  Project[1] (cost=4096.00 outcnt=4096.00 selectivity=0.5000 blockNum=2)                    |
|                     Output: t.a, t.__mo_fake_pk_col, t.__mo_rowid                                           |
|                     ->  Table Scan[0] on test.t (cost=8192.00 outcnt=4096.00 selectivity=0.5000 blockNum=2) |
|                           Output: t.a, t.__mo_fake_pk_col, t.__mo_rowid                                     |
|                           Table: 't' (0:'a', 1:'__mo_fake_pk_col', 2:'__mo_rowid')                          |
|                           Filter Cond: (t.a = 1)                                                            |
+-------------------------------------------------------------------------------------------------------------+
11 rows in set (0.00 sec)

mysql> delete from t where a=1;
ERROR 20101 (HY000): internal error: panic too large fixed slice 33554432, max is 16777216: 
github.com/matrixorigin/matrixone/pkg/lockservice.(*fixedSlicePool).acquire
        /go/src/github.com/matrixorigin/matrixone/pkg/lockservice/slice.go:219
github.com/matrixorigin/matrixone/pkg/lockservice.(*cowSlice).append
        /go/src/github.com/matrixorigin/matrixone/pkg/lockservice/slice.go:65
github.com/matrixorigin/matrixone/pkg/lockservice.(*activeTxn).lockAdded
        /go/src/github.com/matrixorigin/matrixone/pkg/lockservice/txn.go:121
github.c

jensenojs commented 1 month ago

I tested upstream/main and commit 39ba16c5a on three environments (my local Mac, the Shenzhen PC, and TKE) and have not managed to reproduce this issue yet. The first run on TKE hit a different problem, but other than that I have not seen a failure.


I asked the QA engineer: she ran it three times and hit a different error each time, one panic, one OOM, and one r-w conflict.

gouhongshen seems to hit this problem very easily; you can see that the row locks were indeed not converted to a table lock in time.

gouhongshen commented 1 month ago

On my local Mac, on 129, and while running a binary search, the steps above reproduce it reliably with the default configuration.

However, raising the lock slice max parameter can skip past this panic:

    MaxFixedSliceSize toml.ByteSize `toml:"max-fixed-slice-size"`
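Assuming the usual TOML layout, the workaround could look like the fragment below. The key name comes from the struct tag above; the section it lives under is a guess, and 32MB is simply the size from the panic message, so this only masks the symptom rather than fixing the missing escalation:

```toml
# Hypothetical placement; only the key name is taken from the struct tag.
max-fixed-slice-size = "32MB"
```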

jensenojs commented 1 month ago

I am handling this together with the issue below.

heni02 commented 1 month ago

A large-volume insert into ... select from a table also triggers this fairly easily. Commit: 72b1061

mysql> insert into test01(b) select b from test01 where a<200000000;
ERROR 20101 (HY000): internal error: panic too large fixed slice 33554432, max is 16777216: 
github.com/matrixorigin/matrixone/pkg/lockservice.(*fixedSlicePool).acquire
        /go/src/github.com/matrixorigin/matrixone/pkg/lockservice/slice.go:219
github.com/matrixorigin/matrixone/pkg/lockservice.(*cowSlice).append
        /go/src/github.com/matrixorigin/matrixone/pkg/lockservice/slice.go:65
github.com/matrixorigin/matrixone/pkg/lockservice.(*activeTxn).lockAdded
        /go/src/github.com/matrixorigin/matrixone/pkg/lockservice/txn.go:121
github.c

aressu1985 commented 1 month ago

For the current 2.0 release, it can only stay in this state for now.

jensenojs commented 1 month ago

This PR can fix this kind of panic at the root.

Ariznawlll commented 1 month ago

testing

Ariznawlll commented 1 month ago

Tested in a distributed environment:

Scenario 1: ran it three times; the third run reported an r-w conflict.

mysql> select git_version();
+---------------+
| git_version() |
+---------------+
| d6f4467       |
+---------------+
1 row in set (0.00 sec)

create database test;
use test;
create table t(a int);
insert into t values (1),(2);
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
select count(*) from t;
delete from t where a=1;

Scenario 2: ran it three times; all three succeeded.

create table if not exists big_data_test.table_with_com_pk_index_for_load_100M(
  id bigint auto_increment,
  col1 tinyint,
  col2 smallint,
  col3 int,
  col4 bigint,
  col5 tinyint unsigned,
  col6 smallint unsigned,
  col7 int unsigned,
  col8 bigint unsigned,
  col9 float,
  col10 double,
  col11 varchar(255),
  col12 Date,
  col13 DateTime,
  col14 timestamp,
  col15 bool,
  col16 decimal(16,6),
  col17 text,
  col18 json,
  col19 blob,
  col20 binary(255),
  col21 varbinary(255),
  col22 vecf32(3),
  col23 vecf32(3),
  col24 vecf64(3),
  col25 vecf64(3),
  primary key (id, col1),
  key(col3,col4),
  unique key(col4)
);

load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761','filepath'='mo-big-data/100000000_20_columns_load_data_pk.csv'} into table big_data_test.table_with_com_pk_index_for_load_100M fields terminated by '|' lines terminated by '\n' ignore 1 lines parallel 'true';

mysql> show create table big_data_test.table_with_com_pk_index_for_load_100M;
+---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table                                 | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| table_with_com_pk_index_for_load_100m | CREATE TABLE `table_with_com_pk_index_for_load_100m` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `col1` tinyint NOT NULL,
  `col2` smallint DEFAULT NULL,
  `col3` int DEFAULT NULL,
  `col4` bigint DEFAULT NULL,
  `col5` tinyint unsigned DEFAULT NULL,
  `col6` smallint unsigned DEFAULT NULL,
  `col7` int unsigned DEFAULT NULL,
  `col8` bigint unsigned DEFAULT NULL,
  `col9` float DEFAULT NULL,
  `col10` double DEFAULT NULL,
  `col11` varchar(255) DEFAULT NULL,
  `col12` date DEFAULT NULL,
  `col13` datetime DEFAULT NULL,
  `col14` timestamp NULL DEFAULT NULL,
  `col15` bool DEFAULT NULL,
  `col16` decimal(16,6) DEFAULT NULL,
  `col17` text DEFAULT NULL,
  `col18` json DEFAULT NULL,
  `col19` blob DEFAULT NULL,
  `col20` binary(255) DEFAULT NULL,
  `col21` varbinary(255) DEFAULT NULL,
  `col22` vecf32(3) DEFAULT NULL,
  `col23` vecf32(3) DEFAULT NULL,
  `col24` vecf64(3) DEFAULT NULL,
  `col25` vecf64(3) DEFAULT NULL,
  PRIMARY KEY (`id`,`col1`),
  UNIQUE KEY `col4` (`col4`),
  KEY `col3` (`col3`,`col4`)
) |
+---------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from big_data_test.table_with_com_pk_index_for_load_100M;
+-----------+
| count(*)  |
+-----------+
| 100000000 |
+-----------+
1 row in set (0.05 sec)

mysql> create table test01 (col1 smallint DEFAULT NULL);
Query OK, 0 rows affected (0.15 sec)

mysql> insert into test01 select(col2) from big_data_test.table_with_com_pk_index_for_load_100M where col1 < 100000000;
Query OK, 100000000 rows affected (28.34 sec)

Conclusion: no panic occurred in any of the tests.