matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: Loading flate and gzip compressed files is 6 or 7 times slower than loading uncompressed files #16437

Open heni02 opened 1 month ago

heni02 commented 1 month ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

8a5f0bd44

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Loading flate- and gzip-compressed files takes 6-7 times longer than loading uncompressed files. Job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9256599329/job/25472957901

[Screenshot: 企业微信截图_ef84fbd7-201a-4d7c-bbb7-16b578a27a33]

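With the same schema and data, the uncompressed load of 40 million rows takes only 4.4 min.
With the same schema and data, the uncompressed load of 10 million rows takes only 2.6 min.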
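For scale, the baseline figures above (4.4 min for 40 million rows, 2.6 min for 10 million rows) can be combined with the reported 6-7x slowdown. The sketch below only does that arithmetic. It assumes the 6-7x factor applies to both datasets, which the report does not state explicitly, and the function name is illustrative.

```python
# Uncompressed load times in minutes, as reported in this issue.
BASELINE_MIN = {"40M rows": 4.4, "10M rows": 2.6}


def implied_range(baseline_min, low=6, high=7):
    """Return the (low, high) load time in minutes implied by a slowdown factor.

    Assumption: the 6-7x slowdown from the report applies to this baseline.
    """
    return (round(baseline_min * low, 1), round(baseline_min * high, 1))


if __name__ == "__main__":
    for name, minutes in BASELINE_MIN.items():
        lo, hi = implied_range(minutes)
        print(f"{name}: uncompressed {minutes} min, implied compressed {lo}-{hi} min")
```

These are derived estimates, not measured values. The measured durations are in the linked regression job.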

Profile while loading the flate file:

[Screenshot: 企业微信截图_6c5d7bae-8f21-418d-9c23-73ce96b37d48]

https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22GWK%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240527%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D,%22profileTypeId%22:%22process_cpu:cpu:nanoseconds:cpu:nanoseconds%22%7D%5D,%22range%22:%7B%22from%22:%221716843420000%22,%22to%22:%221716845460000%22%7D%7D%7D&schemaVersion=1&orgId=1

Profile while loading the gzip file:

[image]

https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22GWK%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240527%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D,%22profileTypeId%22:%22process_cpu:cpu:nanoseconds:cpu:nanoseconds%22%7D%5D,%22range%22:%7B%22from%22:%221716845460000%22,%22to%22:%221716846480000%22%7D%7D%7D&schemaVersion=1&orgId=1
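Each of the two Grafana links carries a `range` with `from` and `to` values in epoch milliseconds. As a rough aid for reading them, the sketch below converts those values into window lengths. The four numbers are copied from the URLs above; the dictionary and function names are illustrative. It assumes the window was chosen to cover the load, which the report does not say.

```python
# Epoch-millisecond bounds copied from the `range` parameter of the two
# Grafana URLs in this issue.
PROFILE_WINDOWS_MS = {
    "flate": (1716843420000, 1716845460000),
    "gzip": (1716845460000, 1716846480000),
}


def window_minutes(start_ms, end_ms):
    """Length of a profile window in minutes."""
    return (end_ms - start_ms) / 1000 / 60


if __name__ == "__main__":
    for name, (start_ms, end_ms) in PROFILE_WINDOWS_MS.items():
        print(f"{name}: {window_minutes(start_ms, end_ms):.0f} min window")
```

The windows come out at 34 minutes for flate and 17 minutes for gzip. They are viewing ranges for the profile, so they bound the actual load durations only loosely.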

Expected Behavior

No response

Steps to Reproduce

ddl:
create table table_100_columns(
clo1 tinyint,
clo2 smallint,
clo3 int,
clo4 bigint,
clo5 tinyint unsigned,
clo6 smallint unsigned,
clo7 int unsigned,
clo8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(5,2),
col17 text,
col18 varchar(255),
col19 varchar(255),
col20 varchar(255),
col21 varchar(255),
col22 varchar(255),
col23 varchar(255),
col24 varchar(255),
col25 varchar(255),
col26 varchar(255),
col27 varchar(255),
col28 varchar(255),
col29 varchar(255),
col30 varchar(255),
col31 varchar(255),
col32 varchar(255),
col33 varchar(255),
col34 varchar(255),
col35 varchar(255),
col36 varchar(255),
col37 varchar(255),
col38 varchar(255),
col39 varchar(255),
col40 varchar(255),
col41 varchar(255),
col42 varchar(255),
col43 varchar(255),
col44 varchar(255),
col45 varchar(255),
col46 varchar(255),
col47 varchar(255),
col48 varchar(255),
col49 varchar(255),
col50 varchar(255),
col51 varchar(255),
col52 varchar(255),
col53 varchar(255),
col54 varchar(255),
col55 varchar(255),
col56 varchar(255),
col57 varchar(255),
col58 varchar(255),
col59 varchar(255),
col60 varchar(255),
col61 varchar(255),
col62 varchar(255),
col63 varchar(255),
col64 varchar(255),
col65 varchar(255),
col66 varchar(255),
col67 varchar(255),
col68 varchar(255),
col69 varchar(255),
col70 varchar(255),
col71 varchar(255),
col72 varchar(255),
col73 varchar(255),
col74 varchar(255),
col75 varchar(255),
col76 varchar(255),
col77 varchar(255),
col78 varchar(255),
col79 varchar(255),
col80 varchar(255),
col81 varchar(255),
col82 varchar(255),
col83 varchar(255),
col84 varchar(255),
col85 varchar(255),
col86 varchar(255),
col87 varchar(255),
col88 varchar(255),
col89 varchar(255),
col90 varchar(255),
col91 varchar(255),
col92 varchar(255),
col93 varchar(255),
col94 varchar(255),
col95 varchar(255),
col96 varchar(255),
col97 varchar(255),
col98 varchar(255),
col99 varchar(255),
col100 varchar(255)
);
load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761', 'filepath'='compressed_file/40000000_100_columns_load_data.flate', 'compression'='flate'} into table test.table_100_columns fields terminated by ',' lines terminated by '\n' parallel 'true';

create table table_200_columns(
clo1 tinyint,
clo2 smallint,
clo3 int,
clo4 bigint,
clo5 tinyint unsigned,
clo6 smallint unsigned,
clo7 int unsigned,
clo8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(5, 2),
col17 text,
col18 varchar(225),
col19 varchar(225),
col20 varchar(225),
col21 varchar(225),
col22 varchar(225),
col23 varchar(225),
col24 varchar(225),
col25 varchar(225),
col26 varchar(225),
col27 varchar(225),
col28 varchar(225),
col29 varchar(225),
col30 varchar(225),
col31 varchar(225),
col32 varchar(225),
col33 varchar(225),
col34 varchar(225),
col35 varchar(225),
col36 varchar(225),
col37 varchar(225),
col38 varchar(225),
col39 varchar(225),
col40 varchar(225),
col41 varchar(225),
col42 varchar(225),
col43 varchar(225),
col44 varchar(225),
col45 varchar(225),
col46 varchar(225),
col47 varchar(225),
col48 varchar(225),
col49 varchar(225),
col50 varchar(225),
col51 varchar(225),
col52 varchar(225),
col53 varchar(225),
col54 varchar(225),
col55 varchar(225),
col56 varchar(225),
col57 varchar(225),
col58 varchar(225),
col59 varchar(225),
col60 varchar(225),
col61 varchar(225),
col62 varchar(225),
col63 varchar(225),
col64 varchar(225),
col65 varchar(225),
col66 varchar(225),
col67 varchar(225),
col68 varchar(225),
col69 varchar(225),
col70 varchar(225),
col71 varchar(225),
col72 varchar(225),
col73 varchar(225),
col74 varchar(225),
col75 varchar(225),
col76 varchar(225),
col77 varchar(225),
col78 varchar(225),
col79 varchar(225),
col80 varchar(225),
col81 varchar(225),
col82 varchar(225),
col83 varchar(225),
col84 varchar(225),
col85 varchar(225),
col86 varchar(225),
col87 varchar(225),
col88 varchar(225),
col89 varchar(225),
col90 varchar(225),
col91 varchar(225),
col92 varchar(225),
col93 varchar(225),
col94 varchar(225),
col95 varchar(225),
col96 varchar(225),
col97 varchar(225),
col98 varchar(225),
col99 varchar(225),
col100 varchar(225),
col101 varchar(225),
col102 varchar(225),
col103 varchar(225),
col104 varchar(225),
col105 varchar(225),
col106 varchar(225),
col107 varchar(225),
col108 varchar(225),
col109 varchar(225),
col110 varchar(225),
col111 varchar(225),
col112 varchar(225),
col113 varchar(225),
col114 varchar(225),
col115 varchar(225),
col116 varchar(225),
col117 varchar(225),
col118 varchar(225),
col119 varchar(225),
col120 varchar(225),
col121 varchar(225),
col122 varchar(225),
col123 varchar(225),
col124 varchar(225),
col125 varchar(225),
col126 varchar(225),
col127 varchar(225),
col128 varchar(225),
col129 varchar(225),
col130 varchar(225),
col131 varchar(225),
col132 varchar(225),
col133 varchar(225),
col134 varchar(225),
col135 varchar(225),
col136 varchar(225),
col137 varchar(225),
col138 varchar(225),
col139 varchar(225),
col140 varchar(225),
col141 varchar(225),
col142 varchar(225),
col143 varchar(225),
col144 varchar(225),
col145 varchar(225),
col146 varchar(225),
col147 varchar(225),
col148 varchar(225),
col149 varchar(225),
col150 varchar(225),
col151 varchar(225),
col152 varchar(225),
col153 varchar(225),
col154 varchar(225),
col155 varchar(225),
col156 varchar(225),
col157 varchar(225),
col158 varchar(225),
col159 varchar(225),
col160 varchar(225),
col161 varchar(225),
col162 varchar(225),
col163 varchar(225),
col164 varchar(225),
col165 varchar(225),
col166 varchar(225),
col167 varchar(225),
col168 varchar(225),
col169 varchar(225),
col170 varchar(225),
col171 varchar(225),
col172 varchar(225),
col173 varchar(225),
col174 varchar(225),
col175 varchar(225),
col176 varchar(225),
col177 varchar(225),
col178 varchar(225),
col179 varchar(225),
col180 varchar(225),
col181 varchar(225),
col182 varchar(225),
col183 varchar(225),
col184 varchar(225),
col185 varchar(225),
col186 varchar(225),
col187 varchar(225),
col188 varchar(225),
col189 varchar(225),
col190 varchar(225),
col191 varchar(225),
col192 varchar(225),
col193 varchar(225),
col194 varchar(225),
col195 varchar(225),
col196 varchar(225),
col197 varchar(225),
col198 varchar(225),
col199 varchar(225),
col200 varchar(225)
);
load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761', 'filepath'='compressed_file/10000000_200_columns_load_data.csv.gz', 'compression'='gzip'} into table test.table_200_columns fields terminated by ',' lines terminated by '\n' parallel 'true';
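One way to narrow this down outside the database is to measure how long streaming decompression alone takes, using the same comma-separated, newline-terminated row format as the statements above. The following is a minimal diagnostic sketch, not MatrixOne's load path. It uses Python's `zlib` on synthetic data, and it assumes `'compression'='flate'` refers to a raw DEFLATE stream. All function names are hypothetical.

```python
import gzip
import io
import time
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tells zlib to expect a gzip header
RAW_WBITS = -zlib.MAX_WBITS       # raw DEFLATE stream, no header (assumed 'flate')


def make_csv(rows, cols):
    """Build synthetic comma-separated rows; a stand-in for the real files."""
    line = ",".join(f"value{c}" for c in range(cols)) + "\n"
    return (line * rows).encode()


def raw_deflate(data):
    """Compress data as a raw DEFLATE stream."""
    comp = zlib.compressobj(-1, zlib.DEFLATED, RAW_WBITS)
    return comp.compress(data) + comp.flush()


def count_rows_streaming(payload, wbits, chunk=1 << 16):
    """Decompress in fixed-size chunks and count newline-terminated rows."""
    decomp = zlib.decompressobj(wbits)
    src = io.BytesIO(payload)
    rows = 0
    while True:
        block = src.read(chunk)
        if not block:
            break
        rows += decomp.decompress(block).count(b"\n")
    rows += decomp.flush().count(b"\n")
    return rows


def timed(label, fn):
    """Run fn once, print the elapsed wall-clock time, and return its result."""
    start = time.perf_counter()
    rows = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {rows} rows in {elapsed:.3f}s")
    return rows


if __name__ == "__main__":
    data = make_csv(rows=5_000, cols=100)
    gz_payload = gzip.compress(data)
    flate_payload = raw_deflate(data)
    timed("uncompressed", lambda: data.count(b"\n"))
    timed("gzip", lambda: count_rows_streaming(gz_payload, GZIP_WBITS))
    timed("flate", lambda: count_rows_streaming(flate_payload, RAW_WBITS))
```

The decompressor consumes its input strictly in order, so a single compressed stream cannot be split among readers the way an uncompressed file can. Whether that is the cause of the slowdown reported here is not established by this issue. The sketch only helps separate decompression cost from the rest of the load.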

Additional information

No response

jensenojs commented 1 month ago

I don't know how long the pprof data will be retained, so here is a screenshot for now.

image
jensenojs commented 1 month ago

Waiting for the approach to be redesigned.

jensenojs commented 1 week ago

not working on it

jensenojs commented 6 days ago

not working on it

jensenojs commented 1 day ago

not working on it