jensenojs opened 1 month ago
In the repro SQL I provided, the number of background SQL executions looks abnormal; this needs a look.
Update: this turns out to be because the auto-increment cache only fetches a window of 10,000 IDs at a time. Still, close to 20,000 background SQL calls remains abnormal, though that may not be related to this issue.
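As a rough sanity check on that call count (a sketch only; it assumes each cache refill issues exactly one background SQL statement, which the thread does not state explicitly): with a window of $w = 10000$ IDs, allocating IDs for $N$ rows implies roughly

$$\text{refill calls} \approx \lceil N / w \rceil$$

so ~20,000 calls would correspond to about $2 \times 10^8$ allocated IDs at this window size.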
Commit range to bisect:
06cd080dd remove catalog operation log from checkpoint and replay TN from the three-table data (#18578)
949c826ba [bug] fix ut TestNodeGossip (#18882)
d22360d7c [#15201]Remove BVT tag (#18876)
514254474 tune ap memory cache policy (#18852)
76ea1a2b7 Add multi update operator (#18845)
8e3fffacb remove InefficientMustStrCol in merge&sort. (#18868)
0010ae8f0 add simple object reader for s3 transfer and gc optimization (#18867)
edca9ef16 [fix] : add DebugTool for executor, add test cases, fix some bugs (#18828)
50f45ea9f fix TestChangesHandle3 (#18861)
c87bcdaa1 mo-service: fix go build tag (#18851)
786f7bb50 add case for datalink (#18846)
ab7363aba Add policyCompact (#18720)
f666bc0ed fix error info (#18836)
6141ccae3 fix ndv calculation to make it more accurate (#18847)
afd7c5337 Fix ckp (#18825)
ff41a9b5c fix sql generation of show cdc task (#18830)
c678c5a69 delete time Index for tombstone objects in partition state. (#18832)
4be8e6312 make internal sql executor support partition table (#18841)
717cb94ed add MergeSortBatches function for future code reuse (#18840)
11e40a926 fix stats for scan with limit (#18824)
4e1a2f88a Adding a jsonbyte fulltext tokenizer. (#18740)
5456f6140 Stage provider support amazon, minio and COS (#18839)
e2a5a4bac fileservice: dynamic cache size (#18652)
9abebe71d accelerate build table def when limit 0 (#18707)
insert into select performance regression: the offending commit lies between 06cd080dd6c3384d1a19f4de6a0083e152f44b96 and 9abebe71d0d5059d21466083a1077fdcfe4d397e.
Test scenario 1:
create table if not exists big_data_test.table_with_pk_for_load_1B( id bigint primary key, col1 tinyint, col2 smallint, col3 int, col4 bigint, col5 tinyint unsigned, col6 smallint unsigned, col7 int unsigned, col8 bigint unsigned, col9 float, col10 double, col11 varchar(255), col12 Date, col13 DateTime, col14 timestamp, col15 bool, col16 decimal(16,6), col17 text, col18 json, col19 blob, col20 binary(255), col21 varbinary(255), col22 vecf32(3), col23 vecf32(3), col24 vecf64(3), col25 vecf64(3) );
load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com/','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761', 'filepath'='mo-big-data/1000000000_20_columns_load_data_pk.csv'} into table big_data_test.table_with_pk_for_load_1B fields terminated by '|' lines terminated by '\n' ignore 1 lines parallel 'true';
insert into big_data_test.table_with_pk_index_for_insert_1B(id,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,col17,col18,col19,col20,col21,col22,col23,col24,col25) select * from big_data_test.table_with_pk_for_load_1B where col4 != -7508478199581380391;
Test scenario 2:
create table if not exists big_data_test.table_with_pk_for_load_1B( id bigint primary key, col1 tinyint, col2 smallint, col3 int, col4 bigint, col5 tinyint unsigned, col6 smallint unsigned, col7 int unsigned, col8 bigint unsigned, col9 float, col10 double, col11 varchar(255), col12 Date, col13 DateTime, col14 timestamp, col15 bool, col16 decimal(16,6), col17 text, col18 json, col19 blob, col20 binary(255), col21 varbinary(255), col22 vecf32(3), col23 vecf32(3), col24 vecf64(3), col25 vecf64(3) );
load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com/','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761', 'filepath'='mo-big-data/1000000000_20_columns_load_data_pk.csv'} into table big_data_test.table_with_pk_for_load_1B fields terminated by '|' lines terminated by '\n' ignore 1 lines parallel 'true';
insert into big_data_test.table_with_com_pk_index_for_insert_1B(id,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,col17,col18,col19,col20,col21,col22,col23,col24,col25) select * from big_data_test.table_with_pk_for_load_1B where col4 != -7508478199581380391;
The AK/SK are redacted for security reasons; contact me directly for them.
Attempts to reproduce on the Shenzhen machine failed; current performance tracking is here:
https://grafana.ci.matrixorigin.cn/d/aa33449a-8c5e-472f-8781-80d4c5df2544/insert-into-select?orgId=1
Let's resolve this first.
Still slow; looking into what's going on.
Preliminary boundary:
06cd080dd: insert on TKE takes 2 min 29.44 sec
9abebe71d: insert on TKE takes 1 min 50.34 sec
However, once the bisect got going, it looks like the difference above may just have been noise:
06cd080dd remove catalog operation log from checkpoint and replay ... 2 min 1
949c826ba [bug] fix ut TestNodeGossip (#18882)
d22360d7c [#15201]Remove BVT tag (#18876)
514254474 tune ap memory cache policy (#18852) 1 min 58
76ea1a2b7 Add multi update operator (#18845)
8e3fffacb remove InefficientMustStrCol in merge&sort. (#18868)
0010ae8f0 add simple object reader for s3 transfer and gc optimization .. 1 min 55
The repro SQL is:
drop database if exists test; create database test; use test;
CREATE TABLE `metric` (
`metric_name` varchar(1024) DEFAULT 'sys' COMMENT 'metric name, like: sql_statement_total, server_connections, process_cpu_percent, sys_memory_used, ...',
`collecttime` datetime(6) NOT NULL COMMENT 'metric data collect time',
`value` double DEFAULT '0.0' COMMENT 'metric value',
`node` varchar(1024) DEFAULT 'monolithic' COMMENT 'mo node uuid',
`role` varchar(1024) DEFAULT 'monolithic' COMMENT 'mo node role, like: CN, DN, LOG',
`account` varchar(1024) DEFAULT 'sys' COMMENT 'account name',
`type` varchar(1024) NOT NULL COMMENT 'sql type, like: insert, select, ...'
) COMMENT='metric data[mo_no_del_hint]' CLUSTER BY (`collecttime`, `metric_name`, `account`);
insert into metric select
"metric_name_" || (result % 22),
date_add('2024-08-01 00:00:00', interval cast(result / 1000 as int) SECOND),
result,
"node_" || (result % 10),
"role_" || (result % 3),
"account_" || (result % 100),
"type_" || (result % 10)
from generate_series(1,500000000) g;
Generating the data takes about half an hour, after which insert into select is used to copy it (see the sketch below). This repro was run in the CI TKE environment.
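The copy statement itself isn't quoted above; here is a minimal sketch of that step, where the target table name `metric_copy` is my placeholder and `create table ... like` support is an assumption:

```sql
-- Hypothetical copy step: the measured insert-into-select that
-- duplicates the generated data into a second table.
create table metric_copy like metric;          -- placeholder target table
insert into metric_copy select * from metric;  -- the statement being timed
```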
I'll also try to reproduce locally with the multi-cn-with-proxy setup. If that fails, we may still need QA colleagues to help coordinate resources for the bisect.
Isn't performance on main already back to normal?
To be fixed in the next release.
Weilu, could you help run the bisect? Jinsai has not been able to reproduce it so far.
Yuanning, this is tested every day, and we have profiles and so on. A single bisect round takes a long time (load data first, then insert into select). Could we add some metrics before the next run?
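One low-cost option (a sketch, not a prescription): the `metric` DDL quoted earlier mirrors MatrixOne's built-in `system_metrics.metric` table, so per-window statement counts could be pulled straight from it; the time bounds below are placeholders for the circled test windows:

```sql
-- Sketch: SQL statement volume by type/role during a test window,
-- assuming the system_metrics.metric schema shown in the repro above.
select metric_name, `type`, `role`, sum(`value`) as total
from system_metrics.metric
where metric_name = 'sql_statement_total'
  and collecttime between '2024-11-06 00:00:00'   -- placeholder window start
                      and '2024-11-07 00:00:00'   -- placeholder window end
group by metric_name, `type`, `role`
order by total desc;
```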
Testing in progress.
(Re-quoting the original report: insert into select regression between 06cd080dd6c3384d1a19f4de6a0083e152f44b96 and 9abebe71d0d5059d21466083a1077fdcfe4d397e; test scenarios 1 and 2 as given above, AK/SK redacted.)
Testing the insert into step in isolation, the performance fluctuation is within the normal range.
Analyzing the recent performance charts, the fluctuation is fairly large. Saisai, please help look into the metric details for the test windows circled on the chart. @jensenojs
Nov 6 test namespace: mo-big-data-20241104; Nov 7 test namespace: mo-big-data-nightly-173985a-20241106
Is there an existing issue for the same bug?
Branch Name
main
Commit ID
latest
Other Environment Information
Actual Behavior
Running the SQL above, the server fails partway through:
[1] + 17948 killed ./mo-service -debug-http :9876 -launch ./etc/launch/launch.toml > 2>&1
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1:6001' (61)
ERROR: Can't connect to the server
mysql>
@Ariznawlll please have QA colleagues add details on the observed behavior.
Expected Behavior
No response
Steps to Reproduce
Additional information
No response