Open martint opened 7 months ago
It turns out this is not a bug. Here's the explanation of what's happening. When a term in an AND clause is null, the evaluator needs to look at the other expression to decide what the result should be. This is because in SQL, AND behaves in the following way:
NULL AND TRUE => NULL
NULL AND FALSE => FALSE
For the (null, 'y')
row, the filter reduces to
x = NULL AND CAST(y AS INTEGER) = 1
=>
NULL AND CAST(y AS INTEGER) = 1
=>
CAST(y AS INTEGER) = 1
I take that back. It's not a bug in the evaluation of the AND expression, but there's a bug in how the filters are being combined.
The WHERE x = 'a'
clause in the inner query evaluates to false if x = 'a'
is null, a fact that is not captured by the filter in the optimized query.
The proper transformation is a filter with the following shape:
x = 'a' AND (x = 'a') IS NOT NULL and CAST(y AS INTEGER) = 1
How to fix it:
a
and b
are combined, use the form coalesce(a, false) and b
, unless a
is guaranteed to be non-nullcoalesce(a, false)
can only be true if a
is not nullAdditionally, it may be necessary to improve optimizations for coalesce to recognize shapes such as:
coalesce(a and b, false)
is equivalent to coalesce(a, false) and coalesce(b, false)
Here's another form involving joins that fails, too:
WITH
t(x,y) AS (
VALUES
('a', '1'),
('b', 'x'),
(null, 'y')
),
u(x,y) AS (
VALUES
('a', '1')
)
SELECT *
FROM t JOIN u ON t.x = u.x
WHERE CAST(t.y AS int) = 1;
Hey @martint , I can take this and try to fix it. For your suggested fixes, do you have any place in mind that it should happen? Maybe part of the optimization or query rewriting?
I’m looking into this myself. The fix is actually pretty involved due to other unintended effects related to how it’s doing predicate pushdown for joins.
Here's a fix for the inner join case: https://github.com/trinodb/trino/pull/21429
Outer joins and nested filters are still in progress.
The following query fails:
The plan looks correct. In particular, the filter lists the terms in the natural engine evaluation order:
((field = varchar(1) 'a') AND (CAST(field_0 AS integer) = integer '1'))