trinodb / tpch

Port of TPC-H dbgen to Java

Improve performance of data generation #34

Status: Closed (wendigo closed 1 year ago)

wendigo commented 1 year ago

Even for a simple query, performance is much better: 9.38 s after versus 15.32 s before on the 150M-row tpch.sf100.orders table, roughly 1.6x faster.

After:

trino> select lower(orderstatus), count(1) from tpch.sf100.orders group by lower(orderstatus);
 _col0 |  _col1
-------+----------
 p     |  3841445
 o     | 73086053
 f     | 73072502
(3 rows)

Query 20230413_162027_00025_v32u9, FINISHED, 3 nodes
Splits: 54 total, 54 done (100.00%)
9.38 [150M rows, 0B] [16M rows/s, 0B/s]

Before:

trino> select lower(orderstatus), count(1) from tpch.sf100.orders group by lower(orderstatus);
 _col0 |  _col1
-------+----------
 p     |  3841445
 o     | 73086053
 f     | 73072502
(3 rows)

Query 20230413_171901_00025_x7epn, FINISHED, 3 nodes
Splits: 54 total, 54 done (100.00%)
15.32 [150M rows, 0B] [9.79M rows/s, 0B/s]
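
To put a number on the difference, the two CLI summaries above can be compared directly. The snippet below is only an illustrative sketch: it assumes the leading figure on each summary line (9.38 and 15.32) is elapsed wall time in seconds, and the function and constant names are made up for this example. The row count is taken from the three group counts in the query result.

```python
# Group counts copied from the query result above (p, o, f).
GROUP_COUNTS = [3_841_445, 73_086_053, 73_072_502]

# Leading figures from the two CLI summary lines, assumed to be seconds.
AFTER_SECONDS = 9.38
BEFORE_SECONDS = 15.32


def total_rows(counts):
    """Total rows scanned, i.e. the sum of the per-group counts."""
    return sum(counts)


def speedup(before_seconds, after_seconds):
    """How many times faster the 'after' run is than the 'before' run."""
    return before_seconds / after_seconds


def rows_per_second(rows, seconds):
    """Average throughput over the whole query."""
    return rows / seconds


if __name__ == "__main__":
    rows = total_rows(GROUP_COUNTS)
    print(f"rows scanned:      {rows:,}")
    print(f"speedup:           {speedup(BEFORE_SECONDS, AFTER_SECONDS):.2f}x")
    print(f"throughput before: {rows_per_second(rows, BEFORE_SECONDS) / 1e6:.2f}M rows/s")
    print(f"throughput after:  {rows_per_second(rows, AFTER_SECONDS) / 1e6:.2f}M rows/s")
```

The computed throughputs should line up with the rows/s values the CLI reports, which is a quick sanity check that the leading figures are indeed elapsed seconds. A single run on each side is a rough indication only, not a benchmark.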