ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0
4.03k stars 587 forks source link

tpch10000 q18. WideCombiner single bucket don't fit into memory #11416

Open lll-phill-lll opened 3 weeks ago

lll-phill-lll commented 3 weeks ago

https://github.com/ydb-platform/ydb/blob/65dc86e989300e998b03fb414beccebed687d785/ydb/library/yql/minikql/comp_nodes/mkql_wide_combine.cpp#L481

Bt: https://paste.yandex-team.ru/d7a045b0-8309-4f2f-bc6d-56c514da5928

vladl2802 commented 2 weeks ago

What we tried for now (in https://github.com/ydb-platform/ydb/pull/11471):

  1. When extracting data from combiner first process in-memory buckets and only after those process spilled buckets. But this change most likely don't have any impact because all buckets will be spilled before extracting.
  2. Add extra hashing function on top of existing Hasher. Motivation: without this there are only 1/4 unique hash values on q18 (Hasher for integers returns just its value as hash and for input data %32 < 8 is true. So for ex if we want to split values in 128 buckets with hash % 128 we will get only 1/4 non-empty buckets). But with extra hash function values are distributed evenly between buckets

As a result execution time of q18 on scale 100 with 1 task (pragma ydb.MaxTasksPerStage="1") jump from average 297s to 357s. But judging by flame graphs (placed lower), additional hash function does not have any huge impact on performance (0.08% for both last combiners). So my guess is that performance drop is caused by more smaller buckets to process then before.

According to @lll-phill-lll, on 10k scale this fixes memory limit exception that was firing and query got timeouted after 1 hour of execution.

Without hashing

mem-without-hashing

With hashing

mem

vladl2802 commented 2 weeks ago

For point (2) I've tried xxh64 (those flame graphs are in previous comment and below also) and fibonacci hashing (its flame graph is below).

So for some reason (that I can't explain for now) xxh64 seems faster than fibonacci even so fibonacci should require less operations to compute. But I am not sure about that point, because execution time is really unstable.

So we will use xxh64 for now. Further progress can be made in block combine or/and after https://github.com/ydb-platform/ydb/issues/11591

Fibonacci hashing (as here)

mem-fibonacci

xxh64

mem-xxh

lll-phill-lll commented 2 weeks ago

2h run results: https://nda.ya.ru/t/qWfckDan79ezxP spilling plot: https://nda.ya.ru/t/wp-yyyx779ezvp

Looks suspicious. Like it worked only for 30 mins

lll-phill-lll commented 2 weeks ago

https://nda.ya.ru/t/dlVeCoho79f4uA ![Uploading a08810f4946fbb1c.svg…]()

Error after 32 mins