matrixorigin / matrixone

Hyperconverged cloud-edge native database
https://docs.matrixorigin.cn/en
Apache License 2.0

[Bug]: Loading flate- and gzip-compressed files is 6-7 times slower than loading uncompressed files #16437

Open heni02 opened 5 months ago

heni02 commented 5 months ago

Is there an existing issue for the same bug?

Branch Name

main

Commit ID

8a5f0bd44

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Loading flate- and gzip-compressed files takes 6-7 times longer than loading the equivalent uncompressed files. Job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9256599329/job/25472957901


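With the same schema and data, the uncompressed load of 40 million rows (100-column table) took only 4.4 min, and the uncompressed load of 10 million rows (200-column table) took only 2.6 min.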

Profile while loading the flate file:


https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22GWK%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240527%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D,%22profileTypeId%22:%22process_cpu:cpu:nanoseconds:cpu:nanoseconds%22%7D%5D,%22range%22:%7B%22from%22:%221716843420000%22,%22to%22:%221716845460000%22%7D%7D%7D&schemaVersion=1&orgId=1

Profile while loading the gzip file:

https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22GWK%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240527%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D,%22profileTypeId%22:%22process_cpu:cpu:nanoseconds:cpu:nanoseconds%22%7D%5D,%22range%22:%7B%22from%22:%221716845460000%22,%22to%22:%221716846480000%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

ddl:
create table table_100_columns(
clo1 tinyint,
clo2 smallint,
clo3 int,
clo4 bigint,
clo5 tinyint unsigned,
clo6 smallint unsigned,
clo7 int unsigned,
clo8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(5,2),
col17 text,
col18 varchar(255),
col19 varchar(255),
col20 varchar(255),
col21 varchar(255),
col22 varchar(255),
col23 varchar(255),
col24 varchar(255),
col25 varchar(255),
col26 varchar(255),
col27 varchar(255),
col28 varchar(255),
col29 varchar(255),
col30 varchar(255),
col31 varchar(255),
col32 varchar(255),
col33 varchar(255),
col34 varchar(255),
col35 varchar(255),
col36 varchar(255),
col37 varchar(255),
col38 varchar(255),
col39 varchar(255),
col40 varchar(255),
col41 varchar(255),
col42 varchar(255),
col43 varchar(255),
col44 varchar(255),
col45 varchar(255),
col46 varchar(255),
col47 varchar(255),
col48 varchar(255),
col49 varchar(255),
col50 varchar(255),
col51 varchar(255),
col52 varchar(255),
col53 varchar(255),
col54 varchar(255),
col55 varchar(255),
col56 varchar(255),
col57 varchar(255),
col58 varchar(255),
col59 varchar(255),
col60 varchar(255),
col61 varchar(255),
col62 varchar(255),
col63 varchar(255),
col64 varchar(255),
col65 varchar(255),
col66 varchar(255),
col67 varchar(255),
col68 varchar(255),
col69 varchar(255),
col70 varchar(255),
col71 varchar(255),
col72 varchar(255),
col73 varchar(255),
col74 varchar(255),
col75 varchar(255),
col76 varchar(255),
col77 varchar(255),
col78 varchar(255),
col79 varchar(255),
col80 varchar(255),
col81 varchar(255),
col82 varchar(255),
col83 varchar(255),
col84 varchar(255),
col85 varchar(255),
col86 varchar(255),
col87 varchar(255),
col88 varchar(255),
col89 varchar(255),
col90 varchar(255),
col91 varchar(255),
col92 varchar(255),
col93 varchar(255),
col94 varchar(255),
col95 varchar(255),
col96 varchar(255),
col97 varchar(255),
col98 varchar(255),
col99 varchar(255),
col100 varchar(255)
);
load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761', 'filepath'='compressed_file/40000000_100_columns_load_data.flate', 'compression'='flate'} into table test.table_100_columns fields terminated by ',' lines terminated by '\n' parallel 'true';

create table table_200_columns(
clo1 tinyint,
clo2 smallint,
clo3 int,
clo4 bigint,
clo5 tinyint unsigned,
clo6 smallint unsigned,
clo7 int unsigned,
clo8 bigint unsigned,
col9 float,
col10 double,
col11 varchar(255),
col12 Date,
col13 DateTime,
col14 timestamp,
col15 bool,
col16 decimal(5, 2),
col17 text,
col18 varchar(225),
col19 varchar(225),
col20 varchar(225),
col21 varchar(225),
col22 varchar(225),
col23 varchar(225),
col24 varchar(225),
col25 varchar(225),
col26 varchar(225),
col27 varchar(225),
col28 varchar(225),
col29 varchar(225),
col30 varchar(225),
col31 varchar(225),
col32 varchar(225),
col33 varchar(225),
col34 varchar(225),
col35 varchar(225),
col36 varchar(225),
col37 varchar(225),
col38 varchar(225),
col39 varchar(225),
col40 varchar(225),
col41 varchar(225),
col42 varchar(225),
col43 varchar(225),
col44 varchar(225),
col45 varchar(225),
col46 varchar(225),
col47 varchar(225),
col48 varchar(225),
col49 varchar(225),
col50 varchar(225),
col51 varchar(225),
col52 varchar(225),
col53 varchar(225),
col54 varchar(225),
col55 varchar(225),
col56 varchar(225),
col57 varchar(225),
col58 varchar(225),
col59 varchar(225),
col60 varchar(225),
col61 varchar(225),
col62 varchar(225),
col63 varchar(225),
col64 varchar(225),
col65 varchar(225),
col66 varchar(225),
col67 varchar(225),
col68 varchar(225),
col69 varchar(225),
col70 varchar(225),
col71 varchar(225),
col72 varchar(225),
col73 varchar(225),
col74 varchar(225),
col75 varchar(225),
col76 varchar(225),
col77 varchar(225),
col78 varchar(225),
col79 varchar(225),
col80 varchar(225),
col81 varchar(225),
col82 varchar(225),
col83 varchar(225),
col84 varchar(225),
col85 varchar(225),
col86 varchar(225),
col87 varchar(225),
col88 varchar(225),
col89 varchar(225),
col90 varchar(225),
col91 varchar(225),
col92 varchar(225),
col93 varchar(225),
col94 varchar(225),
col95 varchar(225),
col96 varchar(225),
col97 varchar(225),
col98 varchar(225),
col99 varchar(225),
col100 varchar(225),
col101 varchar(225),
col102 varchar(225),
col103 varchar(225),
col104 varchar(225),
col105 varchar(225),
col106 varchar(225),
col107 varchar(225),
col108 varchar(225),
col109 varchar(225),
col110 varchar(225),
col111 varchar(225),
col112 varchar(225),
col113 varchar(225),
col114 varchar(225),
col115 varchar(225),
col116 varchar(225),
col117 varchar(225),
col118 varchar(225),
col119 varchar(225),
col120 varchar(225),
col121 varchar(225),
col122 varchar(225),
col123 varchar(225),
col124 varchar(225),
col125 varchar(225),
col126 varchar(225),
col127 varchar(225),
col128 varchar(225),
col129 varchar(225),
col130 varchar(225),
col131 varchar(225),
col132 varchar(225),
col133 varchar(225),
col134 varchar(225),
col135 varchar(225),
col136 varchar(225),
col137 varchar(225),
col138 varchar(225),
col139 varchar(225),
col140 varchar(225),
col141 varchar(225),
col142 varchar(225),
col143 varchar(225),
col144 varchar(225),
col145 varchar(225),
col146 varchar(225),
col147 varchar(225),
col148 varchar(225),
col149 varchar(225),
col150 varchar(225),
col151 varchar(225),
col152 varchar(225),
col153 varchar(225),
col154 varchar(225),
col155 varchar(225),
col156 varchar(225),
col157 varchar(225),
col158 varchar(225),
col159 varchar(225),
col160 varchar(225),
col161 varchar(225),
col162 varchar(225),
col163 varchar(225),
col164 varchar(225),
col165 varchar(225),
col166 varchar(225),
col167 varchar(225),
col168 varchar(225),
col169 varchar(225),
col170 varchar(225),
col171 varchar(225),
col172 varchar(225),
col173 varchar(225),
col174 varchar(225),
col175 varchar(225),
col176 varchar(225),
col177 varchar(225),
col178 varchar(225),
col179 varchar(225),
col180 varchar(225),
col181 varchar(225),
col182 varchar(225),
col183 varchar(225),
col184 varchar(225),
col185 varchar(225),
col186 varchar(225),
col187 varchar(225),
col188 varchar(225),
col189 varchar(225),
col190 varchar(225),
col191 varchar(225),
col192 varchar(225),
col193 varchar(225),
col194 varchar(225),
col195 varchar(225),
col196 varchar(225),
col197 varchar(225),
col198 varchar(225),
col199 varchar(225),
col200 varchar(225)
);
load data url s3option {'endpoint'='http://cos.ap-guangzhou.myqcloud.com','access_key_id'='***','secret_access_key'='***','bucket'='mo-load-guangzhou-1308875761', 'filepath'='compressed_file/10000000_200_columns_load_data.csv.gz', 'compression'='gzip'} into table test.table_200_columns fields terminated by ',' lines terminated by '\n' parallel 'true';

Additional information

No response

jensenojs commented 5 months ago

Not sure how long the pprof data will be retained, so I'm taking a screenshot for now.

jensenojs commented 3 months ago

not working on it

huby2358 commented 2 months ago

Haven't looked at it yet.

huby2358 commented 2 months ago

Haven't looked at it yet.

heni02 commented 2 months ago

Updated profile links. 40000000_100_columns_load_data.flate:


https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22bmB%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-8f02aeed6-20240828%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D,%22profileTypeId%22:%22memory:alloc_objects:count:space:bytes%22%7D%5D,%22range%22:%7B%22from%22:%221724878972861%22,%22to%22:%221724881288834%22%7D%7D%7D&schemaVersion=1&orgId=1

40000000_100_columns_load_data.csv:


https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22bmB%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-8f02aeed6-20240828%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D,%22profileTypeId%22:%22memory:inuse_objects:count:space:bytes%22%7D%5D,%22range%22:%7B%22from%22:%221724878386000%22,%22to%22:%221724878660000%22%7D%7D%7D&schemaVersion=1&orgId=1

heni02 commented 2 months ago

Latest profile: 10000000_200_columns_load_data.csv.gz


https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22ARP%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-8f02aeed6-20240828%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22profileTypeId%22:%22memory:alloc_objects:count:space:bytes%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D%7D%5D,%22range%22:%7B%22from%22:%221724881231285%22,%22to%22:%221724884183732%22%7D%7D%7D&schemaVersion=1&orgId=1

10000000_200_columns_load_data.csv:


https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22ARP%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-8f02aeed6-20240828%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22profileTypeId%22:%22memory:alloc_objects:count:space:bytes%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D%7D%5D,%22range%22:%7B%22from%22:%221724878651000%22,%22to%22:%221724878786000%22%7D%7D%7D&schemaVersion=1&orgId=1

huby2358 commented 2 months ago

Haven't looked at it.

huby2358 commented 2 months ago

Haven't looked at it.

huby2358 commented 2 months ago

Haven't looked at it.

huby2358 commented 1 month ago

On a local Mac with a little over 10 million rows, the compressed load takes about 3 times as long as the uncompressed load.

huby2358 commented 1 month ago

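On the Shenzhen machine the gap is similar, a little over 3x for 10 million rows. However, both runs loaded files from the server side rather than from S3, so I'm not sure whether loading via S3 is what causes the difference.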

huby2358 commented 1 month ago

No time invested yet; this still needs further investigation.

huby2358 commented 1 month ago

Not worked on yet.

huby2358 commented 1 month ago

Not worked on yet.

huby2358 commented 1 month ago

Not worked on yet.

huby2358 commented 4 weeks ago

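The main cause is probably that compressed files are read serially. I previously tried to see whether compressed files could be split into chunks, but so far it looks like compressed files are hard to split.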

jensenojs commented 3 weeks ago

She is on vacation.

huby2358 commented 2 weeks ago

No progress.

huby2358 commented 1 week ago

No progress.

huby2358 commented 1 week ago

No progress.

huby2358 commented 6 days ago

No progress.

huby2358 commented 2 days ago

No progress.