trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.37k stars 2.98k forks source link

Incorrect merging of nested filters #21308

Open martint opened 7 months ago

martint commented 7 months ago

The following query fails:

WITH t(x,y) AS (
    VALUES
        ('a', '1'),
        ('b', 'x'),
        (null, 'y')
)
SELECT y
FROM (
  SELECT *
  FROM t
  WHERE x = 'a'
)
WHERE CAST(y AS int) = 1
Query 20240328_144814_00030_5f8wi failed: Cannot cast 'y' to INT
io.trino.spi.TrinoException: Cannot cast 'y' to INT
    at io.trino.type.VarcharOperators.castToInteger(VarcharOperators.java:122)
    at io.trino.$gen.PageFilter_20240328_144814_143.filter(Unknown Source)
    at io.trino.$gen.PageFilter_20240328_144814_143.filter(Unknown Source)
    at io.trino.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:119)
    at io.trino.operator.FilterAndProjectOperator.lambda$new$0(FilterAndProjectOperator.java:61)
    at io.trino.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:285)
    at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:359)
    at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:412)
    at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
    at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:412)
    at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
    at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:412)
    at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
    at io.trino.operator.WorkProcessorUtils$BlockingProcess.process(WorkProcessorUtils.java:207)
    at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:412)
    at io.trino.operator.WorkProcessorOperatorAdapter.getOutput(WorkProcessorOperatorAdapter.java:138)
    at io.trino.operator.Driver.processInternal(Driver.java:398)
    at io.trino.operator.Driver.lambda$process$8(Driver.java:301)
    at io.trino.operator.Driver.tryWithLock(Driver.java:704)
    at io.trino.operator.Driver.process(Driver.java:293)
    at io.trino.operator.Driver.processForDuration(Driver.java:264)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
    at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:76)
    at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
    at io.trino.$gen.Trino_testversion____20240328_144624_75.run(Unknown Source)
    at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
    at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:174)
    at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:161)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

The plan looks correct. In particular, the filter lists the terms in the natural engine evaluation order: ((field = varchar(1) 'a') AND (CAST(field_0 AS integer) = integer '1'))

Trino version: testversion
 Fragment 0 [SINGLE]
     Output layout: [field_0]
     Output partitioning: SINGLE []
     Output[columnNames = [y]]
     │   Layout: [field_0:varchar(1)]
     │   Estimates: {rows: 0 (24B), cpu: 0, memory: 0B, network: 0B}
     │   y := field_0
     └─ FilterProject[filterPredicate = ((field = varchar(1) 'a') AND (CAST(field_0 AS integer) = integer '1'))]
        │   Layout: [field_0:varchar(1)]
        │   Estimates: {rows: 0 (24B), cpu: 280, memory: 0B, network: 0B}/{rows: 0 (24B), cpu: 24, memory: 0B, network: 0B}
        └─ LocalExchange[partitioning = ROUND_ROBIN]
           │   Layout: [field:varchar(1), field_0:varchar(1)]
           │   Estimates: {rows: 3 (280B), cpu: 280, memory: 0B, network: 0B}
           └─ Values[]
                  Layout: [field:varchar(1), field_0:varchar(1)]
                  Estimates: {rows: 3 (280B), cpu: 0, memory: 0B, network: 0B}
                  (varchar(1) 'a', varchar(1) '1')
                  (varchar(1) 'b', varchar(1) 'x')
                  (null::varchar(1), varchar(1) 'y')
martint commented 7 months ago

It turns out this is not a bug. Here's the explanation of what's happening. When a term in an AND clause is null, the evaluator needs to look at the other expression to decide what the result should be. This is because in SQL, AND behaves in the following way:

NULL AND TRUE => NULL
NULL AND FALSE => FALSE

For the (null, 'y') row, the filter reduces to

x = NULL AND CAST(y AS INTEGER) = 1

=>

NULL AND CAST(y AS INTEGER) = 1

=>

CAST(y AS INTEGER) = 1
martint commented 7 months ago

I take that back. It's not a bug in the evaluation of the AND expression, but there's a bug in how the filters are being combined.

The WHERE x = 'a' clause in the inner query evaluates to false if x = 'a' is null, a fact that is not captured by the filter in the optimized query.

martint commented 7 months ago

The proper transformation is a filter with the following shape:

x = 'a' AND (x = 'a') IS NOT NULL and CAST(y AS INTEGER) = 1
martint commented 7 months ago

How to fix it:

martint commented 7 months ago

Additionally, it may be necessary to improve optimizations for coalesce to recognize shapes such as:

martint commented 7 months ago

Here's another form involving joins that fails, too:

WITH
   t(x,y) AS (
       VALUES
              ('a', '1'),
              ('b', 'x'),
              (null, 'y')
   ),
   u(x,y) AS (
       VALUES
              ('a', '1')
   )
SELECT *
FROM t JOIN u ON t.x = u.x
WHERE CAST(t.y AS int) = 1;
Arstanley commented 6 months ago

Hey @martint , I can take this and try to fix it. For your suggested fixes, do you have any place in mind that it should happen? Maybe part of the optimization or query rewriting?

martint commented 6 months ago

I’m looking into this myself. The fix is actually pretty involved due to other unintended effects related to how it’s doing predicate pushdown for joins.

martint commented 6 months ago

Here's a fix for the inner join case: https://github.com/trinodb/trino/pull/21429

Outer joins and nested filters are still in progress.