h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

clickhouse "select * join" seems to be less efficient than "select cols join" #167

Closed: jangorecki closed this issue 3 years ago

jangorecki commented 3 years ago

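It looks like a join that selects "*" is less efficient in ClickHouse than the same join with the columns listed explicitly. The query below hits the 100 GiB per-query memory limit at about 9% progress.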

CREATE TABLE ans ENGINE = Memory AS SELECT * FROM J1_1e7_NA_0_0 INNER JOIN J1_1e7_1e4_0_0 USING (id1);

CREATE TABLE ans
ENGINE = Memory AS
SELECT *
FROM J1_1e7_NA_0_0
INNER JOIN J1_1e7_1e4_0_0 USING (id1)

↗ Progress: 968.46 thousand rows, 62.61 MB (118.88 thousand rows/s., 7.69 MB/s.) ████▍                                          9%
Received exception from server (version 20.9.3):
Code: 241. DB::Exception: Received from localhost:9000. DB::Exception: Memory limit (for query) exceeded: would use 100.00 GiB (attempt to allocate chunk of 16777216 bytes), maximum: 100.00 GiB. 

The same query succeeds if I specify all columns in the SELECT statement. This is not an issue for the benchmark, but the ClickHouse scripts will have to use longer queries that list all columns rather than using *.
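
For illustration, a minimal sketch of what the explicit-column form could look like. The table names are taken from the failing query above. The column names (id1 to id6 and v1 on the left table, id4 and v2 on the right table) and the aliases x, y and y_id4 are assumptions, not something stated in this issue, so they need to be adjusted to the actual schema.

```sql
-- Same join as the failing query, but every output column is listed
-- explicitly instead of using "*".
-- ASSUMPTION: the column names and the aliases x, y, y_id4 are hypothetical.
CREATE TABLE ans ENGINE = Memory AS
SELECT
    x.id1,            -- join key, shared by both tables via USING
    x.id2,
    x.id3,
    x.id4,
    y.id4 AS y_id4,   -- alias avoids a name clash with x.id4
    x.id5,
    x.id6,
    x.v1,
    y.v2
FROM J1_1e7_NA_0_0 AS x
INNER JOIN J1_1e7_1e4_0_0 AS y USING (id1);
```

The only difference from the failing statement is the column list, which makes it easy to compare memory use between the two forms on the same data.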

jangorecki commented 3 years ago

Not a blocker, so this can be closed.