Open Heltman opened 2 months ago
@raunaqmorarka cc
I will try to construct a test case to reproduce this problem.
Cc @martint
I have constructed a test case, shown below:
-- create a partitioned table with a bigint partition key
create table iceberg_test(
id int,
name varchar,
part_key bigint
) with (
format = 'PARQUET',
partitioning = ARRAY ['part_key'],
format_version = 1
);
-- insert some test partitions; note: the partition keys must not be contiguous, so part_key=2 is deliberately missing
insert into iceberg_test
values
(1, 'Alice', 1),
(2, 'Bob', 3),
(3, 'Coco', 4),
(4, 'Coco', 5);
-- before running the query, enable optimize_metadata_queries; this lets the optimizer read the partition keys as a set of values
-- and add a predicate on the subquery, which is translated to an inner join. If it is set to false, the infinite loop does not happen.
set session optimize_metadata_queries=true;
-- query like below
SELECT
t0.*
FROM (
SELECT
cast(part_key as varchar) part
FROM iceberg_test
WHERE
part_key IN (
SELECT
part_key
FROM iceberg_test
)
) t0
where t0.part IN ('3', '4');
We get an error like the one below:
Query 20240828_080543_00039_test_fbfr9 failed: The optimizer exhausted the time limit of 180000 ms: Top rules: {
io.trino.sql.planner.iterative.rule.RemoveRedundantPredicateAboveTableScan: 107018 ms, 1080558 invocations, 1080558 applications,
io.trino.sql.planner.iterative.rule.ExpressionRewriteRuleSet.FilterExpressionRewrite: 52462 ms, 10805580 invocations, 0 applications,
io.trino.sql.planner.iterative.rule.PushFilterIntoValues: 2581 ms, 1080558 invocations, 0 applications,
io.trino.sql.planner.iterative.rule.RemoveTrivialFilters: 1183 ms, 1080558 invocations, 0 applications,
io.trino.sql.planner.iterative.rule.ExpressionRewriteRuleSet.ProjectExpressionRewrite: 0 ms, 30 invocations, 0 applications }
...
@raunaqmorarka @martint, the problem can be reproduced quickly with the steps above; let me know if you need more information.
Thanks for adding the repro steps
@raunaqmorarka is this still present?
I have a SQL query like the one below:
The class SimplifyContinuousInValues will optimize this IR to:
But in the class RemoveRedundantPredicateAboveTableScan, we convert the expression to a domain, so we get a predicateColumnDomain like below:
We compute unforcedColumnDomain from predicateColumnDomain, and then check for equality and existence.
But unforcedColumnDomain is not the same as predicateColumnDomain; it is:
We know they describe the same range (for x bigint, [202403 <= x <= 202404] equals [x = 202403 or x = 202404]), but the TupleDomain equals method doesn't know that.
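To make the mismatch concrete, here is a minimal, language-neutral sketch in Python. It does not use Trino's classes: `IntRange`, `IntValues`, `to_value_set` and `semantically_equal` are hypothetical names invented for illustration. It only shows that a closed range and an explicit value list over an integer type can admit exactly the same values while a structural equality check still reports them as different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """Hypothetical domain: closed range low <= x <= high over an integer type."""
    low: int
    high: int


@dataclass(frozen=True)
class IntValues:
    """Hypothetical domain: explicit list of allowed values (x = a OR x = b ...)."""
    values: frozenset


def to_value_set(domain):
    """Expand a domain over a discrete type into the set of values it admits."""
    if isinstance(domain, IntRange):
        return frozenset(range(domain.low, domain.high + 1))
    return domain.values


def semantically_equal(left, right):
    """Compare what the two domains admit, not how they are written."""
    return to_value_set(left) == to_value_set(right)


predicate_domain = IntRange(202403, 202404)                  # 202403 <= x <= 202404
unenforced_domain = IntValues(frozenset({202403, 202404}))   # x = 202403 OR x = 202404

print(predicate_domain == unenforced_domain)                    # structural comparison
print(semantically_equal(predicate_domain, unenforced_domain))  # value-based comparison
```

Expanding a range into individual values is only reasonable for the tiny ranges in this toy; a real fix would more likely normalize both sides to one canonical representation before comparing.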
The optimizer then adds a new Filter node and the rewrite keeps repeating, which causes the infinite loop:
SimplifyContinuousInValues was pulled in by https://github.com/trinodb/trino/pull/22411
We should fix the domain comparison method to solve the problem.
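The ping-pong between the two rules can be modelled with a toy fixpoint loop. This is a rough sketch of the interaction as described in this comment, not Trino's implementation: the two functions only borrow the rule names, their bodies are invented, and the tuple encoding of predicates, the `partitions` set and the `max_passes` budget are all assumptions made for illustration.

```python
def admitted_values(pred):
    """Set of integers accepted by a toy predicate ("in", values) or ("between", (low, high))."""
    kind, args = pred
    if kind == "between":
        low, high = args
        return frozenset(range(low, high + 1))
    return frozenset(args)


def simplify_continuous_in(pred):
    """Toy stand-in: rewrite IN over consecutive integers into BETWEEN."""
    kind, args = pred
    if kind == "in" and len(args) > 1:
        values = sorted(args)
        if values == list(range(values[0], values[-1] + 1)):
            return ("between", (values[0], values[-1]))
    return pred


def remove_redundant_predicate(pred, partition_values, semantic_compare):
    """Toy stand-in: rebuild the filter from a value-set domain unless it is 'equal'."""
    unenforced = ("in", tuple(sorted(admitted_values(pred) & partition_values)))
    if semantic_compare:
        unchanged = admitted_values(pred) == admitted_values(unenforced)
    else:
        unchanged = pred == unenforced  # structural comparison only
    return pred if unchanged else unenforced


def optimize(pred, partition_values, semantic_compare, max_passes=100):
    """Apply both rules until nothing changes; return (plan, passes) or (plan, None) on timeout."""
    rules = (
        simplify_continuous_in,
        lambda p: remove_redundant_predicate(p, partition_values, semantic_compare),
    )
    for passes in range(max_passes):
        changed = False
        for rule in rules:
            rewritten = rule(pred)
            if rewritten != pred:
                pred, changed = rewritten, True
        if not changed:
            return pred, passes
    return pred, None  # budget exhausted, analogous to the optimizer time limit


partitions = frozenset({202401, 202402, 202403, 202404})
query = ("in", (202403, 202404))

print(optimize(query, partitions, semantic_compare=False))  # never converges in this model
print(optimize(query, partitions, semantic_compare=True))   # settles on the BETWEEN form
```

With the structural comparison, each pass rewrites IN to BETWEEN and back again, so the loop only stops when the budget runs out. With the value-based comparison, the second rule sees nothing left to change and the loop reaches a fixpoint.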