h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

ClickHouse join task #137

Closed jangorecki closed 3 years ago

jangorecki commented 4 years ago

ClickHouse does not yet have a join benchmark script. This issue tracks adding the join task for ClickHouse. The current process of preparing data for ClickHouse is already quite complicated, and it would get even more complicated with the join task, not to mention the extra disk space needed for the new duplicated CSV files. I filed https://github.com/ClickHouse/ClickHouse/issues/9361; it would be great if we could remove that burden, since it also makes maintenance more difficult.

jangorecki commented 3 years ago

https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/#selecting-the-primary-key

You can create a table without a primary key using the ORDER BY tuple() syntax. In this case, ClickHouse stores data in the order of inserting. If you want to save data order when inserting data by INSERT ... SELECT queries, set max_insert_threads = 1.

This seems sufficient to address the problem of duplicating data just to add an integer sequence for the primary key.
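
A minimal sketch of how the statements could look, assuming a hypothetical table `x` with a few join-task-style columns (`id1`, `id4`, `v1`) and a hypothetical staging table `x_staging`; neither name nor schema is taken from the benchmark scripts. The helpers only build the SQL text and do not talk to a ClickHouse server. `ORDER BY tuple()` and `max_insert_threads` are the two pieces quoted from the ClickHouse docs above.

```python
def create_table_sql(table, columns):
    """Build a MergeTree CREATE TABLE statement with no primary key.

    `columns` is a list of (name, clickhouse_type) pairs. ORDER BY tuple()
    means no sorting key, so rows are stored in insertion order.
    """
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return f"CREATE TABLE {table} ({cols}) ENGINE = MergeTree() ORDER BY tuple()"


def copy_preserving_order_sql(source, target):
    """Build the statements for an INSERT ... SELECT that keeps row order.

    Per the quoted docs, max_insert_threads = 1 is needed so that the
    insert does not reorder rows across parallel threads.
    """
    return [
        "SET max_insert_threads = 1",
        f"INSERT INTO {target} SELECT * FROM {source}",
    ]


if __name__ == "__main__":
    # Hypothetical schema, for illustration only.
    schema = [("id1", "Int32"), ("id4", "String"), ("v1", "Float64")]
    print(create_table_sql("x", schema) + ";")
    for stmt in copy_preserving_order_sql("x_staging", "x"):
        print(stmt + ";")
```

With this layout there is no need for an extra integer sequence column, so the source CSV files would not have to be duplicated.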

jangorecki commented 3 years ago

implemented and published