ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0
3.83k stars 532 forks source link

Poor perf in olap ya cloud table #8444

Open naspirato opened 3 weeks ago

naspirato commented 3 weeks ago

Issue with In() filter This is our QA database in the cloud with an OLAP table, version stable-24-2

https://monitoring.ydb.yandex-team.ru/tenant?tenantPage=query&diagnosticsTab=nodes&selectedConsumer=&name=%2Fru-central1%2Fb1ggceeul2pkher8vhb6%2Fetnvsjbk7kh1jc6bbfi8&backend=https%3A%2F%2Foidc-proxy.ydb.cloud.yandex-team.ru%2Fydbproxy-public.ydb.cloud.yandex.net%3A8765&clusterName=cloud_prod_kikimr_ydb_public_storage&clckid=04bfe3f0

Query 1

select *
from test_results/test_runs
where full_name = 'ydb/core/kqp/ut/tx/KqpSinkLocks.EmptyRangeAlreadyBrokenOlap'

takes < 1 sec AST

Query 2

select *
from test_results/test_runs
where full_name in ( 'ydb/core/kqp/ut/tx/KqpSinkLocks.EmptyRangeAlreadyBrokenOlap' )

takes ~10 sec AST

This is critical because all default datalens scenarios use IN() filtering full_name - PrimaryKey


UPD, version ydb-stable-24-3-7-hotfix-7

select *
from test_results/test_runs
where full_name in ( 'ydb/core/kqp/ut/tx/KqpSinkLocks.EmptyRangeAlreadyBrokenOlap' )

AST Takes ~13 seconds ( + 30 % vs stable 24-2)

spuchin commented 3 days ago

@nikvas0 Nikita, any idea why we don't use block filter there? Even in 24-3, AST shows (Filter (NarrowMap (WideFromBlocks $11) ...).

nikvas0 commented 2 days ago

@spuchin @naspirato

According to logs, it uses block computations. I guess the problem here is that ydb shows not fully optimized plan (before peephole).

Similar query I tested:

select *
                from `/Root/ColumnShard`
                WHERE Col2 IN ('test')

[0,0] AST: (
(return (lambda '() (block '(
  (let $1 (KqpTable '"/Root/ColumnShard" '"72057594046644480:2" '"" '1))
  (let $2 (KqpBlockReadOlapTableRanges $1 (Void) '('"Col1" '"Col2") '() '() (lambda '($4) $4)))
  (let $3 (lambda '($5 $6 $7) (block '(
    (let $8 (BlockFunc '"Equals" (BlockType (OptionalType (DataType 'Bool))) $6 (AsScalar (String '"test"))))
    (return $5 $6 (BlockCoalesce $8 (AsScalar (Bool 'false))) $7)
  ))))
  (return (FromFlow (NarrowMap (WideFromBlocks (BlockCompress (WideMap $2 $3) '2)) (lambda '($9 $10) (AsStruct '('"Col1" $9) '('"Col2" $10))))))
))))
)

The fast query (the first query with equality) uses local filtration on columnshards. I checked the query with IN using 24-3 and main. 24-3 can't use local computations while main can. Therefore, I suppose that some improvements for analytics haven't been merged into 24-3.