Open naspirato opened 3 weeks ago
@nikvas0 Nikita, any idea why we don't use block filter there? Even in 24-3, AST shows (Filter (NarrowMap (WideFromBlocks $11) ...)
.
@spuchin @naspirato
According to logs, it uses block computations. I guess the problem here is that ydb shows not fully optimized plan (before peephole).
Similar query I tested:
select *
from `/Root/ColumnShard`
WHERE Col2 IN ('test')
[0,0] AST: (
(return (lambda '() (block '(
(let $1 (KqpTable '"/Root/ColumnShard" '"72057594046644480:2" '"" '1))
(let $2 (KqpBlockReadOlapTableRanges $1 (Void) '('"Col1" '"Col2") '() '() (lambda '($4) $4)))
(let $3 (lambda '($5 $6 $7) (block '(
(let $8 (BlockFunc '"Equals" (BlockType (OptionalType (DataType 'Bool))) $6 (AsScalar (String '"test"))))
(return $5 $6 (BlockCoalesce $8 (AsScalar (Bool 'false))) $7)
))))
(return (FromFlow (NarrowMap (WideFromBlocks (BlockCompress (WideMap $2 $3) '2)) (lambda '($9 $10) (AsStruct '('"Col1" $9) '('"Col2" $10))))))
))))
)
The fast query (the first query with equality) uses local filtration on columnshards. I checked the query with IN using 24-3 and main. 24-3 can't use local computations while main can. Therefore, I suppose that some improvements for analytics haven't been merged into 24-3.
Issue with In() filter This is our QA database in the cloud with an OLAP table, version stable-24-2
https://monitoring.ydb.yandex-team.ru/tenant?tenantPage=query&diagnosticsTab=nodes&selectedConsumer=&name=%2Fru-central1%2Fb1ggceeul2pkher8vhb6%2Fetnvsjbk7kh1jc6bbfi8&backend=https%3A%2F%2Foidc-proxy.ydb.cloud.yandex-team.ru%2Fydbproxy-public.ydb.cloud.yandex.net%3A8765&clusterName=cloud_prod_kikimr_ydb_public_storage&clckid=04bfe3f0
Query 1
takes < 1 sec AST
Query 2
takes ~10 sec AST
This is critical because all default datalens scenarios use IN() filtering full_name - PrimaryKey
UPD, version ydb-stable-24-3-7-hotfix-7
AST Takes ~13 seconds ( + 30 % vs stable 24-2)