trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.18k stars 2.93k forks source link

CTE Query Optimization Issue - Trino 389 #19115

Closed Aravindravi1209 closed 11 months ago

Aravindravi1209 commented 11 months ago

We had upgraded Trino from Version 355 to 389 on our system and we are facing an issue with a particular query:

The query below returned 0 rows.

with V_RatePlanCharge as (
    select id, servicestartdate__c from ratePlanCharge where id = '8adc94028a6e0e76018a6eed60a438be'
)

select RPC.* from ratePlanCharge RPC
inner join V_RatePlanCharge on RPC.id = V_RatePlanCharge.id
left join (
    select RPC.id, RPC.servicestartdate__c from ratePlanCharge RPC
    inner join V_RatePlanCharge on RPC.id = V_RatePlanCharge.id where RPC.id = '8adc94028a6e0e76018a6eed60a438be' 
    and RPC.servicestartdate__c = V_RatePlanCharge.servicestartdate__c 
    and date_add('day', -1, date_parse('2023-09-08','%Y-%m-%d')) > RPC.servicestartdate__c
      ) sub on  RPC.id = sub.id
where
  RPC.servicestartdate__c = V_RatePlanCharge.servicestartdate__c

However, the query below returned a row even though what we did was just to duplicate CTE and change the name, and use them in the query.

with V_RatePlanCharge2 as ( 
     select id, servicestartdate__c from ratePlanCharge where id = '8adc94028a6e0e76018a6eed60a438be'
),
V_RatePlanCharge3 as (
    select id, servicestartdate__c from ratePlanCharge where id = '8adc94028a6e0e76018a6eed60a438be'
)

select RPC.* from ratePlanCharge RPC
inner join V_RatePlanCharge2 on RPC.id = V_RatePlanCharge2.id
left join (
    select RPC.id, RPC.servicestartdate__c from ratePlanCharge RPC
    inner join V_RatePlanCharge3 on RPC.id = V_RatePlanCharge3.id  where RPC.id = '8adc94028a6e0e76018a6eed60a438be'
    and RPC.servicestartdate__c = V_RatePlanCharge3.servicestartdate__c
    and date_add('day', -1, date_parse('2023-09-08','%Y-%m-%d')) > RPC.servicestartdate__c
      ) sub on  RPC.id = sub.id
where
  RPC.servicestartdate__c = V_RatePlanCharge2.servicestartdate__c

When running the explain statement, we could see there is only a change in this particular fragment, where the number of input and output rows are 1 in the success case but 0 in the failure case!!

image

We are expecting the first query as well to return a record, but there seems to be some query optimization changes between 355 to 389 which is leading to wrong results!

hashhar commented 11 months ago

Please share the complete explain plans for both queries on both versions.

It'll also be helpful if you can provide a reproduction which uses one of the tpch or tpcds tables since we can run the reproduction ourselves and also use it as a regression test.

cc: @martint @kasiafi

Aravindravi1209 commented 11 months ago

Here are the steps I followed to reproduce this issue:

To create a table named Account with 2 fields which we are interested in:

-- Create the table
CREATE TABLE account (
    invoicetemplateid varchar,
    lastinvoicedate date
);

-- Insert data into the table
INSERT INTO account (invoicetemplateid, lastinvoicedate)
VALUES
    ('2c9081a03c63c94c013c66688a2c00bf', DATE '2013-01-23');

Original query which worked in Trino 355 but not in 389:

with V_Account as (
    select invoicetemplateid, lastinvoicedate from account where invoicetemplateid = '2c9081a03c638994013c63978baf002b'
)
select A.invoicetemplateid, A.lastinvoicedate from account A
inner join V_Account on A.invoicetemplateid = V_Account.invoicetemplateid
left join (
    select A.invoicetemplateid, A.lastinvoicedate from account A
    inner join V_Account on A.invoicetemplateid = V_Account.invoicetemplateid
    where A.invoicetemplateid = '2c9081a03c638994013c63978baf002b'
    and A.lastinvoicedate = V_Account.lastinvoicedate
    and date_add('day', -1, date_parse('2013-01-23', '%Y-%m-%d')) > A.lastinvoicedate
) sub on A.invoicetemplateid = sub.invoicetemplateid
where A.lastinvoicedate = V_Account.lastinvoicedate;

Output from Trino 355:

image

Explain Statement from Trino 355:

Fragment 1 [HASH]
CPU: 201.43ms, Scheduled: 1.80s, Input: 1 row (51B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (42B)
Output layout: [invoicetemplateid, lastinvoicedate]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
LeftJoin[("invoicetemplateid" = "invoicetemplateid_58")][$hashvalue, $hashvalue_177]
│   Layout: [invoicetemplateid:varchar, lastinvoicedate:date]
│   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
│   CPU: 94.00ms (5.35%), Scheduled: 866.00ms (3.19%), Output: 1 row (42B)
│   Left (probe) Input avg.: 0.06 rows, Input std.dev.: 387.30%
│   Right (build) Input avg.: 0.00 rows, Input std.dev.: ?%
│   Collisions avg.: ? (?% est.), Collisions std.dev.: ?%
│   Distribution: PARTITIONED
├─ RemoteSource[2]
│      Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue:bigint]
│      CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 1 row (51B)
│      Input avg.: 0.06 rows, Input std.dev.: 387.30%
└─ LocalExchange[HASH][$hashvalue_177] ("invoicetemplateid_58")
    │   Layout: [invoicetemplateid_58:varchar, $hashvalue_177:bigint]
    │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
    │   CPU: 2.00ms (0.11%), Scheduled: 2.00ms (0.01%), Output: 0 rows (0B)
    │   Input avg.: 0.00 rows, Input std.dev.: ?%
    └─ RemoteSource[5]
           Layout: [invoicetemplateid_58:varchar, $hashvalue_178:bigint]
           CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 0 rows (0B)
           Input avg.: 0.00 rows, Input std.dev.: ?%

Fragment 2 [HASH]
CPU: 334.72ms, Scheduled: 3.15s, Input: 2 rows (65B); per task: avg.: 2.00 std.dev.: 0.00, Output: 1 row (51B)
Output layout: [invoicetemplateid, lastinvoicedate, $hashvalue_176]
Output partitioning: HASH [invoicetemplateid][$hashvalue_176]
Stage Execution Strategy: UNGROUPED_EXECUTION
Project[]
│   Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue_176:bigint]
│   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
│   CPU: 51.00ms (2.90%), Scheduled: 514.00ms (1.89%), Output: 1 row (51B)
│   Input avg.: 0.06 rows, Input std.dev.: 387.30%
│   $hashvalue_176 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("invoicetemplateid"), 0))
└─ InnerJoin[("lastinvoicedate" = "lastinvoicedate_47")][$hashvalue_171, $hashvalue_173]
    │   Layout: [invoicetemplateid:varchar, lastinvoicedate:date]
    │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
    │   CPU: 196.00ms (11.15%), Scheduled: 1.94s (7.14%), Output: 1 row (42B)
    │   Left (probe) Input avg.: 0.06 rows, Input std.dev.: 387.30%
    │   Right (build) Input avg.: 0.06 rows, Input std.dev.: 387.30%
    │   Collisions avg.: 0.00 (0.00% est.), Collisions std.dev.: ?%
    │   Distribution: PARTITIONED
    │   dynamicFilterAssignments = {lastinvoicedate_47 -> #df_1160}
    ├─ RemoteSource[3]
    │      Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue_171:bigint]
    │      CPU: 1.00ms (0.06%), Scheduled: 2.00ms (0.01%), Output: 1 row (51B)
    │      Input avg.: 0.06 rows, Input std.dev.: 387.30%
    └─ LocalExchange[HASH][$hashvalue_173] ("lastinvoicedate_47")
       │   Layout: [lastinvoicedate_47:date, $hashvalue_173:bigint]
       │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
       │   CPU: 15.00ms (0.85%), Scheduled: 103.00ms (0.38%), Output: 1 row (14B)
       │   Input avg.: 0.06 rows, Input std.dev.: 387.30%
       └─ RemoteSource[4]
              Layout: [lastinvoicedate_47:date, $hashvalue_174:bigint]
              CPU: 1.00ms (0.06%), Scheduled: 7.00ms (0.03%), Output: 1 row (14B)
              Input avg.: 0.06 rows, Input std.dev.: 387.30%

Fragment 3 [SOURCE]
CPU: 344.72ms, Scheduled: 5.47s, Input: 1 row (42B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (51B)
Output layout: [invoicetemplateid, lastinvoicedate, $hashvalue_172]
Output partitioning: HASH [lastinvoicedate][$hashvalue_172]
Stage Execution Strategy: UNGROUPED_EXECUTION
ScanFilterProject[table = owl:data.account, grouped = false, filterPredicate = ("invoicetemplateid" = CAST('2c9081a03c638994013c63978baf002b' AS varchar)), dynamicFilter = {"lastinvoicedate" = #df_1160}]
    Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue_172:bigint]
    Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
    CPU: 343.00ms (19.51%), Scheduled: 6.00s (22.07%), Output: 1 row (51B)
    Input avg.: 1.00 rows, Input std.dev.: 0.00%
    $hashvalue_172 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate"), 0))
    invoicetemplateid := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
    lastinvoicedate := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
    Input: 1 row (42B), Filtered: 0.00%

Fragment 4 [SOURCE]
CPU: 312.66ms, Scheduled: 4.66s, Input: 1 row (42B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (14B)
Output layout: [lastinvoicedate_47, $hashvalue_175]
Output partitioning: HASH [lastinvoicedate_47][$hashvalue_175]
Stage Execution Strategy: UNGROUPED_EXECUTION
ScanFilterProject[table = owl:data.account, grouped = false, filterPredicate = ("invoicetemplateid_1" = CAST('2c9081a03c638994013c63978baf002b' AS varchar))]
    Layout: [lastinvoicedate_47:date, $hashvalue_175:bigint]
    Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
    CPU: 311.00ms (17.69%), Scheduled: 5.18s (19.06%), Output: 1 row (14B)
    Input avg.: 1.00 rows, Input std.dev.: 0.00%
    $hashvalue_175 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate_47"), 0))
    lastinvoicedate_47 := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
    invoicetemplateid_1 := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
    Input: 1 row (42B), Filtered: 0.00%

Fragment 5 [HASH]
CPU: 384.05ms, Scheduled: 3.79s, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
Output layout: [invoicetemplateid_58, $hashvalue_184]
Output partitioning: HASH [invoicetemplateid_58][$hashvalue_184]
Stage Execution Strategy: UNGROUPED_EXECUTION
Project[]
│   Layout: [invoicetemplateid_58:varchar, $hashvalue_184:bigint]
│   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
│   CPU: 34.00ms (1.93%), Scheduled: 304.00ms (1.12%), Output: 0 rows (0B)
│   Input avg.: 0.00 rows, Input std.dev.: ?%
│   $hashvalue_184 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("invoicetemplateid_58"), 0))
└─ InnerJoin[("lastinvoicedate_104" = "lastinvoicedate_161")][$hashvalue_179, $hashvalue_181]
    │   Layout: [invoicetemplateid_58:varchar]
    │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
    │   CPU: 202.00ms (11.49%), Scheduled: 2.18s (8.01%), Output: 0 rows (0B)
    │   Left (probe) Input avg.: 0.00 rows, Input std.dev.: ?%
    │   Right (build) Input avg.: 0.00 rows, Input std.dev.: ?%
    │   Collisions avg.: ? (?% est.), Collisions std.dev.: ?%
    │   Distribution: PARTITIONED
    │   dynamicFilterAssignments = {lastinvoicedate_161 -> #df_1166}
    ├─ RemoteSource[6]
    │      Layout: [invoicetemplateid_58:varchar, lastinvoicedate_104:date, $hashvalue_179:bigint]
    │      CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 0 rows (0B)
    │      Input avg.: 0.00 rows, Input std.dev.: ?%
    └─ LocalExchange[HASH][$hashvalue_181] ("lastinvoicedate_161")
       │   Layout: [lastinvoicedate_161:date, $hashvalue_181:bigint]
       │   Estimates: {rows: ? (?), cpu: ?, memory: 0B
       │   CPU: 26.00ms (1.48%), Scheduled: 123.00ms (0.45%), Output: 0 rows (0B)
       │   Input avg.: 0.00 rows, Input std.dev.: ?%
       └─ RemoteSource[7]
           Layout: [lastinvoicedate_161:date, $hashvalue_182:bigint]
           CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 0 rows (0B)
           Input avg.: 0.00 rows, Input std.dev.: ?%

Fragment 6 [SOURCE]
CPU: 263.76ms, Scheduled: 5.13s, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
Output layout: [invoicetemplateid_58, lastinvoicedate_104, $hashvalue_180]
Output partitioning: HASH [lastinvoicedate_104][$hashvalue_180]
Stage Execution Strategy: UNGROUPED_EXECUTION
ScanFilterProject[table = owl:data.account, grouped = false, filterPredicate = (("invoicetemplateid_58" = CAST('2c9081a03c638994013c63978baf002b' AS varchar)) AND ("lastinvoicedate_104" < DATE '2013-01-22')), dynamicFilter = {"lastinvoic>
    Layout: [invoicetemplateid_58:varchar, lastinvoicedate_104:date, $hashvalue_180:bigint]
    Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
    CPU: 245.00ms (13.94%), Scheduled: 4.96s (18.23%), Output: 0 rows (0B)
    Input avg.: 0.00 rows, Input std.dev.: ?%
    $hashvalue_180 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate_104"), 0))
    lastinvoicedate_104 := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
    invoicetemplateid_58 := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
    Input: 0 rows (0B), Filtered: ?%

Fragment 7 [SOURCE]
CPU: 247.41ms, Scheduled: 5.16s, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
Output layout: [lastinvoicedate_161, $hashvalue_183]
Output partitioning: HASH [lastinvoicedate_161][$hashvalue_183]
Stage Execution Strategy: UNGROUPED_EXECUTION
ScanFilterProject[table = owl:data.account, grouped = false, filterPredicate = (("invoicetemplateid_115" = CAST('2c9081a03c638994013c63978baf002b' AS varchar)) AND ("lastinvoicedate_161" < DATE '2013-01-22'))]
    Layout: [lastinvoicedate_161:date, $hashvalue_183:bigint]
    Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
    CPU: 237.00ms (13.48%), Scheduled: 5.01s (18.42%), Output: 0 rows (0B)
    Input avg.: 0.00 rows, Input std.dev.: ?%
    $hashvalue_183 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate_161"), 0))
    invoicetemplateid_115 := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
    lastinvoicedate_161 := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
    Input: 0 rows (0B), Filtered: ?%

Output from Trino 389:

image

Explain Statement from Trino 389:

 Fragment 1 [HASH]
     CPU: 69.35ms, Scheduled: 1.16s, Blocked 27.50s (Input: 17.75s, Output: 0.00ns), Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
     Output layout: [invoicetemplateid, lastinvoicedate]
     Output partitioning: SINGLE []
     LeftJoin[criteria = ("invoicetemplateid" = "invoicetemplateid_58"), hash = [$hashvalue, $hashvalue_177], distribution = PARTITIONED]
     │   Layout: [invoicetemplateid:varchar, lastinvoicedate:date]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   CPU: 47.00ms (3.69%), Scheduled: 805.00ms (4.56%), Blocked: 515.00ms (0.66%), Output: 0 rows (0B)
     │   Left (probe) Input avg.: 0.00 rows, Input std.dev.: ?%
     │   Right (build) Input avg.: 0.00 rows, Input std.dev.: ?%
     │   Collisions avg.: ? (?% est.), Collisions std.dev.: ?%
     │   Distribution: PARTITIONED
     ├─ RemoteSource[sourceFragmentIds = [2]]
     │      Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue:bigint]
     │      Estimates:
     │      CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Blocked: 8.70s (11.18%), Output: 0 rows (0B)
     │      Input avg.: 0.00 rows, Input std.dev.: ?%
     └─ LocalExchange[partitioning = SINGLE]
        │   Layout: [invoicetemplateid_58:varchar, $hashvalue_177:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   CPU: 1.00ms (0.08%), Scheduled: 25.00ms (0.14%), Blocked: 9.21s (11.83%), Output: 0 rows (0B)
        │   Input avg.: 0.00 rows, Input std.dev.: ?%
        └─ RemoteSource[sourceFragmentIds = [5]]
               Layout: [invoicetemplateid_58:varchar, $hashvalue_178:bigint]
               Estimates:
               CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Blocked: 9.05s (11.63%), Output: 0 rows (0B)
               Input avg.: 0.00 rows, Input std.dev.: ?%

 Fragment 2 [HASH]
     CPU: 71.01ms, Scheduled: 1.01s, Blocked 11.11s (Input: 6.13s, Output: 0.00ns), Input: 1 row (51B); per task: avg.: 1.00 std.dev.: 0.00, Output: 0 rows (0B)
     Output layout: [invoicetemplateid, lastinvoicedate, $hashvalue_176]
     Output partitioning: HASH [invoicetemplateid][$hashvalue_176]
     Project[]
     │   Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue_176:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   CPU: 1.00ms (0.08%), Scheduled: 16.00ms (0.09%), Blocked: 0.00ns (0.00%), Output: 0 rows (0B)
     │   Input avg.: 0.00 rows, Input std.dev.: ?%
     │   $hashvalue_176 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("invoicetemplateid"), 0))
     └─ InnerJoin[criteria = ("lastinvoicedate" = "lastinvoicedate_47"), hash = [$hashvalue_171, $hashvalue_173], distribution = PARTITIONED]
        │   Layout: [invoicetemplateid:varchar, lastinvoicedate:date]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   CPU: 32.00ms (2.51%), Scheduled: 420.00ms (2.38%), Blocked: 3.67s (4.72%), Output: 0 rows (0B)
        │   Left (probe) Input avg.: 1.00 rows, Input std.dev.: 0.00%
        │   Right (build) Input avg.: 0.00 rows, Input std.dev.: ?%
        │   Collisions avg.: ? (?% est.), Collisions std.dev.: ?%
        │   Distribution: PARTITIONED
        │   dynamicFilterAssignments = {lastinvoicedate_47 -> #df_1138}
        ├─ RemoteSource[sourceFragmentIds = [3]]
        │      Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue_171:bigint]
        │      Estimates:
        │      CPU: 2.00ms (0.16%), Scheduled: 3.00ms (0.02%), Blocked: 4.83s (6.21%), Output: 1 row (51B)
        │      Input avg.: 1.00 rows, Input std.dev.: 0.00%
        └─ LocalExchange[partitioning = SINGLE]
           │   Layout: [lastinvoicedate_47:date, $hashvalue_173:bigint]
           │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
           │   CPU: 2.00ms (0.16%), Scheduled: 41.00ms (0.23%), Blocked: 1.31s (1.68%), Output: 0 rows (0B)
           │   Input avg.: 0.00 rows, Input std.dev.: ?%
           └─ RemoteSource[sourceFragmentIds = [4]]
                  Layout: [lastinvoicedate_47:date, $hashvalue_174:bigint]
                  Estimates:
                  CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Blocked: 1.30s (1.67%), Output: 0 rows (0B)
                  Input avg.: 0.00 rows, Input std.dev.: ?%

 Fragment 3 [SOURCE]
     CPU: 369.96ms, Scheduled: 4.43s, Blocked 0.00ns (Input: 0.00ns, Output: 0.00ns), Input: 1 row (42B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (51B)
     Output layout: [invoicetemplateid, lastinvoicedate, $hashvalue_172]
     Output partitioning: HASH [lastinvoicedate][$hashvalue_172]
     ScanFilterProject[table = owl:data.account, filterPredicate = ("invoicetemplateid" = VARCHAR '2c9081a03c638994013c63978baf002b'), dynamicFilters = {"lastinvoicedate" = #df_1138}]
         Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue_172:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
         CPU: 362.00ms (28.39%), Scheduled: 4.33s (24.49%), Blocked: 0.00ns (0.00%), Output: 1 row (51B)
         Input avg.: 1.00 rows, Input std.dev.: 0.00%
         $hashvalue_172 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate"), 0))
         invoicetemplateid := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
         lastinvoicedate := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
         Input: 1 row (42B), Filtered: 0.00%
         Dynamic filters:
             - df_1138, NONE, collection time=21.53s

 Fragment 4 [SOURCE]
     CPU: 311.89ms, Scheduled: 4.46s, Blocked 0.00ns (Input: 0.00ns, Output: 0.00ns), Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
     Output layout: [lastinvoicedate_47, $hashvalue_175]
     Output partitioning: HASH [lastinvoicedate_47][$hashvalue_175]
     ScanFilterProject[table = owl:data.account, filterPredicate = ("invoicetemplateid_1" = VARCHAR '2c9081a03c638994013c63978baf002b')]
         Layout: [lastinvoicedate_47:date, $hashvalue_175:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
         CPU: 295.00ms (23.14%), Scheduled: 4.34s (24.57%), Blocked: 0.00ns (0.00%), Output: 0 rows (0B)
         Input avg.: 0.00 rows, Input std.dev.: ?%
         $hashvalue_175 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate_47"), 0))
         lastinvoicedate_47 := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
         invoicetemplateid_1 := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
         Input: 0 rows (0B), Filtered: ?%

 Fragment 5 [HASH]
     CPU: 123.52ms, Scheduled: 1.31s, Blocked 39.23s (Input: 22.13s, Output: 0.00ns), Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
     Output layout: [invoicetemplateid_58, $hashvalue_184]
     Output partitioning: HASH [invoicetemplateid_58][$hashvalue_184]
     Project[]
     │   Layout: [invoicetemplateid_58:varchar, $hashvalue_184:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   CPU: 6.00ms (0.47%), Scheduled: 111.00ms (0.63%), Blocked: 0.00ns (0.00%), Output: 0 rows (0B)
     │   Input avg.: 0.00 rows, Input std.dev.: ?%
     │   $hashvalue_184 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("invoicetemplateid_58"), 0))
     └─ InnerJoin[criteria = ("lastinvoicedate_104" = "lastinvoicedate_161"), hash = [$hashvalue_179, $hashvalue_181], distribution = PARTITIONED]
        │   Layout: [invoicetemplateid_58:varchar]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   CPU: 93.00ms (7.29%), Scheduled: 1.01s (5.70%), Blocked: 11.60s (14.91%), Output: 0 rows (0B)
        │   Left (probe) Input avg.: 0.00 rows, Input std.dev.: ?%
        │   Right (build) Input avg.: 0.00 rows, Input std.dev.: ?%
        │   Collisions avg.: ? (?% est.), Collisions std.dev.: ?%
        │   Distribution: PARTITIONED
        │   dynamicFilterAssignments = {lastinvoicedate_161 -> #df_1144}
        ├─ RemoteSource[sourceFragmentIds = [6]]
        │      Layout: [invoicetemplateid_58:varchar, lastinvoicedate_104:date, $hashvalue_179:bigint]
        │      Estimates:
        │      CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Blocked: 16.77s (21.55%), Output: 0 rows (0B)
        │      Input avg.: 0.00 rows, Input std.dev.: ?%
        └─ LocalExchange[partitioning = SINGLE]
           │   Layout: [lastinvoicedate_161:date, $hashvalue_181:bigint]
           │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
           │   CPU: 4.00ms (0.31%), Scheduled: 47.00ms (0.27%), Blocked: 5.51s (7.08%), Output: 0 rows (0B)
           │   Input avg.: 0.00 rows, Input std.dev.: ?%
           └─ RemoteSource[sourceFragmentIds = [7]]
                  Layout: [lastinvoicedate_161:date, $hashvalue_182:bigint]
                  Estimates:
                  CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Blocked: 5.36s (6.89%), Output: 0 rows (0B)
                  Input avg.: 0.00 rows, Input std.dev.: ?%

 Fragment 6 [SOURCE]
     CPU: 238.28ms, Scheduled: 2.55s, Blocked 0.00ns (Input: 0.00ns, Output: 0.00ns), Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
     Output layout: [invoicetemplateid_58, lastinvoicedate_104, $hashvalue_180]
     Output partitioning: HASH [lastinvoicedate_104][$hashvalue_180]
     ScanFilterProject[table = owl:data.account, filterPredicate = (("invoicetemplateid_58" = VARCHAR '2c9081a03c638994013c63978baf002b') AND ("lastinvoicedate_104" < DATE '2013-01-22')), dynamicFilters = {"lastinvoicedate_104" = #df_1144}]
         Layout: [invoicetemplateid_58:varchar, lastinvoicedate_104:date, $hashvalue_180:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
         CPU: 168.00ms (13.18%), Scheduled: 2.23s (12.63%), Blocked: 0.00ns (0.00%), Output: 0 rows (0B)
         Input avg.: 0.00 rows, Input std.dev.: ?%
         $hashvalue_180 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate_104"), 0))
         lastinvoicedate_104 := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
         invoicetemplateid_58 := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
         Input: 0 rows (0B), Filtered: ?%
         Dynamic filters:
             - df_1144, NONE, collection time=12.69s

 Fragment 7 [SOURCE]
     CPU: 282.00ms, Scheduled: 4.47s, Blocked 0.00ns (Input: 0.00ns, Output: 0.00ns), Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
     Output layout: [lastinvoicedate_161, $hashvalue_183]
     Output partitioning: HASH [lastinvoicedate_161][$hashvalue_183]
     ScanFilterProject[table = owl:data.account, filterPredicate = (("invoicetemplateid_115" = VARCHAR '2c9081a03c638994013c63978baf002b') AND ("lastinvoicedate_161" < DATE '2013-01-22'))]
         Layout: [lastinvoicedate_161:date, $hashvalue_183:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
         CPU: 262.00ms (20.55%), Scheduled: 4.29s (24.29%), Blocked: 0.00ns (0.00%), Output: 0 rows (0B)
         Input avg.: 0.00 rows, Input std.dev.: ?%
         $hashvalue_183 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate_161"), 0))
         invoicetemplateid_115 := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
         lastinvoicedate_161 := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
         Input: 0 rows (0B), Filtered: ?%

In 389, when we use duplicated CTE's instead of the same CTE in other parts of the main query/subquery we are getting the results as expected:

with V_Account2 as (
    select invoicetemplateid, lastinvoicedate from account where invoicetemplateid = '2c9081a03c638994013c63978baf002b'
),
V_Account3 as (
    select invoicetemplateid, lastinvoicedate from account where invoicetemplateid = '2c9081a03c638994013c63978baf002b'
)
select A.invoicetemplateid, A.lastinvoicedate from account A
inner join V_Account2 on A.invoicetemplateid = V_Account2.invoicetemplateid
left join (
    select A.invoicetemplateid, A.lastinvoicedate from account A
    inner join V_Account3 on A.invoicetemplateid = V_Account3.invoicetemplateid
    where A.invoicetemplateid = '2c9081a03c638994013c63978baf002b'
    and A.lastinvoicedate = V_Account3.lastinvoicedate
    and date_add('day', -1, date_parse('2013-01-23', '%Y-%m-%d')) > A.lastinvoicedate
) sub on A.invoicetemplateid = sub.invoicetemplateid
where A.lastinvoicedate = V_Account2.lastinvoicedate;

Output From Trino 389:

image

Explain Statement from Trino 389 for the modified query with duplicated CTE's:

 Fragment 1 [HASH]
     CPU: 91.01ms, Scheduled: 561.04ms, Blocked 25.95s (Input: 17.16s, Output: 0.00ns), Input: 1 row (51B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (42B)
     Output layout: [invoicetemplateid, lastinvoicedate]
     Output partitioning: SINGLE []
     LeftJoin[criteria = ("invoicetemplateid" = "invoicetemplateid_58"), hash = [$hashvalue, $hashvalue_177], distribution = PARTITIONED]
     │   Layout: [invoicetemplateid:varchar, lastinvoicedate:date]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   CPU: 49.00ms (3.89%), Scheduled: 278.00ms (3.22%), Blocked: 203.00ms (0.32%), Output: 1 row (42B)
     │   Left (probe) Input avg.: 1.00 rows, Input std.dev.: 0.00%
     │   Right (build) Input avg.: 0.00 rows, Input std.dev.: ?%
     │   Collisions avg.: ? (?% est.), Collisions std.dev.: ?%
     │   Distribution: PARTITIONED
     ├─ RemoteSource[sourceFragmentIds = [2]]
     │      Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue:bigint]
     │      Estimates:
     │      CPU: 1.00ms (0.08%), Scheduled: 1.00ms (0.01%), Blocked: 8.61s (13.38%), Output: 1 row (51B)
     │      Input avg.: 1.00 rows, Input std.dev.: 0.00%
     └─ LocalExchange[partitioning = SINGLE]
        │   Layout: [invoicetemplateid_58:varchar, $hashvalue_177:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Blocked: 8.61s (13.38%), Output: 0 rows (0B)
        │   Input avg.: 0.00 rows, Input std.dev.: ?%
        └─ RemoteSource[sourceFragmentIds = [5]]
               Layout: [invoicetemplateid_58:varchar, $hashvalue_178:bigint]
               Estimates:
               CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Blocked: 8.54s (13.27%), Output: 0 rows (0B)
               Input avg.: 0.00 rows, Input std.dev.: ?%

 Fragment 2 [HASH]
     CPU: 129.72ms, Scheduled: 569.41ms, Blocked 20.02s (Input: 12.85s, Output: 0.00ns), Input: 2 rows (65B); per task: avg.: 2.00 std.dev.: 0.00, Output: 1 row (51B)
     Output layout: [invoicetemplateid, lastinvoicedate, $hashvalue_176]
     Output partitioning: HASH [invoicetemplateid][$hashvalue_176]
     Project[]
     │   Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue_176:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   CPU: 25.00ms (1.98%), Scheduled: 104.00ms (1.21%), Blocked: 0.00ns (0.00%), Output: 1 row (51B)
     │   Input avg.: 1.00 rows, Input std.dev.: 0.00%
     │   $hashvalue_176 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("invoicetemplateid"), 0))
     └─ InnerJoin[criteria = ("lastinvoicedate" = "lastinvoicedate_47"), hash = [$hashvalue_171, $hashvalue_173], distribution = PARTITIONED]
        │   Layout: [invoicetemplateid:varchar, lastinvoicedate:date]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   CPU: 72.00ms (5.71%), Scheduled: 301.00ms (3.49%), Blocked: 243.00ms (0.38%), Output: 1 row (42B)
        │   Left (probe) Input avg.: 1.00 rows, Input std.dev.: 0.00%
        │   Right (build) Input avg.: 1.00 rows, Input std.dev.: 0.00%
        │   Collisions avg.: 0.00 (0.00% est.), Collisions std.dev.: ?%
        │   Distribution: PARTITIONED
        │   dynamicFilterAssignments = {lastinvoicedate_47 -> #df_1138}
        ├─ RemoteSource[sourceFragmentIds = [3]]
        │      Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue_171:bigint]
        │      Estimates:
        │      CPU: 1.00ms (0.08%), Scheduled: 5.00ms (0.06%), Blocked: 6.02s (9.35%), Output: 1 row (51B)
        │      Input avg.: 1.00 rows, Input std.dev.: 0.00%
        └─ LocalExchange[partitioning = SINGLE]
           │   Layout: [lastinvoicedate_47:date, $hashvalue_173:bigint]
           │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
           │   CPU: 5.00ms (0.40%), Scheduled: 24.00ms (0.28%), Blocked: 7.06s (10.97%), Output: 1 row (14B)
           │   Input avg.: 1.00 rows, Input std.dev.: 0.00%
           └─ RemoteSource[sourceFragmentIds = [4]]
                  Layout: [lastinvoicedate_47:date, $hashvalue_174:bigint]
                  Estimates:
                  CPU: 2.00ms (0.16%), Scheduled: 13.00ms (0.15%), Blocked: 6.83s (10.61%), Output: 1 row (14B)
                  Input avg.: 1.00 rows, Input std.dev.: 0.00%

 Fragment 3 [SOURCE]
     CPU: 86.96ms, Scheduled: 838.12ms, Blocked 0.00ns (Input: 0.00ns, Output: 0.00ns), Input: 1 row (42B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (51B)
     Output layout: [invoicetemplateid, lastinvoicedate, $hashvalue_172]
     Output partitioning: HASH [lastinvoicedate][$hashvalue_172]
     ScanFilterProject[table = owl:data.account, filterPredicate = ("invoicetemplateid" = VARCHAR '2c9081a03c638994013c63978baf002b'), dynamicFilters = {"lastinvoicedate" = #df_1138}]
         Layout: [invoicetemplateid:varchar, lastinvoicedate:date, $hashvalue_172:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
         CPU: 84.00ms (6.67%), Scheduled: 834.00ms (9.67%), Blocked: 0.00ns (0.00%), Output: 1 row (51B)
         Input avg.: 1.00 rows, Input std.dev.: 0.00%
         $hashvalue_172 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate"), 0))
         invoicetemplateid := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
         lastinvoicedate := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
         Input: 1 row (42B), Filtered: 0.00%
         Dynamic filters:
             - df_1138, [ SortedRangeSet[type=date, ranges=1, {[2013-01-23]}] ], collection time=13.77s

 Fragment 4 [SOURCE]
     CPU: 206.12ms, Scheduled: 1.21s, Blocked 0.00ns (Input: 0.00ns, Output: 0.00ns), Input: 1 row (42B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (14B)
     Output layout: [lastinvoicedate_47, $hashvalue_175]
     Output partitioning: HASH [lastinvoicedate_47][$hashvalue_175]
     ScanFilterProject[table = owl:data.account, filterPredicate = ("invoicetemplateid_1" = VARCHAR '2c9081a03c638994013c63978baf002b')]
         Layout: [lastinvoicedate_47:date, $hashvalue_175:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
         CPU: 204.00ms (16.19%), Scheduled: 1.21s (14.04%), Blocked: 0.00ns (0.00%), Output: 1 row (14B)
         Input avg.: 1.00 rows, Input std.dev.: 0.00%
         $hashvalue_175 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate_47"), 0))
         lastinvoicedate_47 := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
         invoicetemplateid_1 := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
         Input: 1 row (42B), Filtered: 0.00%

 Fragment 5 [HASH]
     CPU: 81.18ms, Scheduled: 489.04ms, Blocked 18.25s (Input: 11.56s, Output: 0.00ns), Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
     Output layout: [invoicetemplateid_58, $hashvalue_184]
     Output partitioning: HASH [invoicetemplateid_58][$hashvalue_184]
     Project[]
     │   Layout: [invoicetemplateid_58:varchar, $hashvalue_184:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   CPU: 9.00ms (0.71%), Scheduled: 43.00ms (0.50%), Blocked: 0.00ns (0.00%), Output: 0 rows (0B)
     │   Input avg.: 0.00 rows, Input std.dev.: ?%
     │   $hashvalue_184 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("invoicetemplateid_58"), 0))
     └─ InnerJoin[criteria = ("lastinvoicedate_104" = "lastinvoicedate_161"), hash = [$hashvalue_179, $hashvalue_181], distribution = PARTITIONED]
        │   Layout: [invoicetemplateid_58:varchar]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   CPU: 45.00ms (3.57%), Scheduled: 245.00ms (2.84%), Blocked: 1.12s (1.74%), Output: 0 rows (0B)
        │   Left (probe) Input avg.: 0.00 rows, Input std.dev.: ?%
        │   Right (build) Input avg.: 0.00 rows, Input std.dev.: ?%
        │   Collisions avg.: ? (?% est.), Collisions std.dev.: ?%
        │   Distribution: PARTITIONED
        │   dynamicFilterAssignments = {lastinvoicedate_161 -> #df_1144}
        ├─ RemoteSource[sourceFragmentIds = [6]]
        │      Layout: [invoicetemplateid_58:varchar, lastinvoicedate_104:date, $hashvalue_179:bigint]
        │      Estimates:
        │      CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Blocked: 6.04s (9.38%), Output: 0 rows (0B)
        │      Input avg.: 0.00 rows, Input std.dev.: ?%
        └─ LocalExchange[partitioning = SINGLE]
           │   Layout: [lastinvoicedate_161:date, $hashvalue_181:bigint]
           │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
           │   CPU: 3.00ms (0.24%), Scheduled: 24.00ms (0.28%), Blocked: 5.57s (8.65%), Output: 0 rows (0B)
           │   Input avg.: 0.00 rows, Input std.dev.: ?%
           └─ RemoteSource[sourceFragmentIds = [7]]
                  Layout: [lastinvoicedate_161:date, $hashvalue_182:bigint]
                  Estimates:
                  CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Blocked: 5.52s (8.58%), Output: 0 rows (0B)
                  Input avg.: 0.00 rows, Input std.dev.: ?%

 Fragment 6 [SOURCE]
     CPU: 89.19ms, Scheduled: 913.09ms, Blocked 0.00ns (Input: 0.00ns, Output: 0.00ns), Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
     Output layout: [invoicetemplateid_58, lastinvoicedate_104, $hashvalue_180]
     Output partitioning: HASH [lastinvoicedate_104][$hashvalue_180]
     ScanFilterProject[table = owl:data.account, filterPredicate = (("invoicetemplateid_58" = VARCHAR '2c9081a03c638994013c63978baf002b') AND ("lastinvoicedate_104" < DATE '2013-01-22')), dynamicFilters = {"lastinvoicedate_104" = #df_1144}]
         Layout: [invoicetemplateid_58:varchar, lastinvoicedate_104:date, $hashvalue_180:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
         CPU: 89.00ms (7.06%), Scheduled: 911.00ms (10.56%), Blocked: 0.00ns (0.00%), Output: 0 rows (0B)
         Input avg.: 0.00 rows, Input std.dev.: ?%
         $hashvalue_180 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate_104"), 0))
         lastinvoicedate_104 := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
         invoicetemplateid_58 := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
         Input: 0 rows (0B), Filtered: ?%
         Dynamic filters:
             - df_1144, NONE, collection time=12.04s

 Fragment 7 [SOURCE]
     CPU: 718.85ms, Scheduled: 4.69s, Blocked 0.00ns (Input: 0.00ns, Output: 0.00ns), Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00, Output: 0 rows (0B)
     Output layout: [lastinvoicedate_161, $hashvalue_183]
     Output partitioning: HASH [lastinvoicedate_161][$hashvalue_183]
     ScanFilterProject[table = owl:data.account, filterPredicate = (("invoicetemplateid_115" = VARCHAR '2c9081a03c638994013c63978baf002b') AND ("lastinvoicedate_161" < DATE '2013-01-22'))]
         Layout: [lastinvoicedate_161:date, $hashvalue_183:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu: ?, memory: 0B, network: 0B}
         CPU: 671.00ms (53.25%), Scheduled: 4.63s (53.69%), Blocked: 0.00ns (0.00%), Output: 0 rows (0B)
         Input avg.: 0.00 rows, Input std.dev.: ?%
         $hashvalue_183 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("lastinvoicedate_161"), 0))
         invoicetemplateid_115 := OwlColumnHandle{owlSupportedType=STRING, name=invoicetemplateid, mapping=InvoiceTemplateId, internal=false}
         lastinvoicedate_161 := OwlColumnHandle{owlSupportedType=DATE, name=lastinvoicedate, mapping=LastInvoiceDate, internal=false}
         Input: 0 rows (0B), Filtered: ?%
hashhar commented 11 months ago

What connector are you using?

I cannot reproduce this using the memory connector on either 355 or 389 or current master. All of them lead to empty result.

Aravindravi1209 commented 11 months ago

We are using our own custom connector!! Would it possible to figure out where the issue is by checking the EXPLAIN plan?

hashhar commented 11 months ago

Fragment 4 returns 0 rows in 389 while in older version it returned 1 row.

Since it's just a table scan with a filter there's likely some bug in your applyFilter implementation. It's not an optimizer bug since the filter is same in both cases, in one case there's a CAST present but the filter is equivalent.

Closing since this is not a bug in Trino.

Aravindravi1209 commented 11 months ago
@Override
    public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(
            ConnectorSession session,
            ConnectorTableHandle tableHandle,
            Constraint constraint) {

        OwlTableHandle owlTableHandle = (OwlTableHandle) tableHandle;

        // Create a new ConstraintApplicationResult with the original OwlTableHandle and Constraint summary
        ConstraintApplicationResult<ConnectorTableHandle> result =
                new ConstraintApplicationResult<>(owlTableHandle, constraint.getSummary(), false);

        owlTableHandle.setConstraintSummary(constraint.getSummary());

        // Return the result in an Optional
        return Optional.of(result);
    }

Hi @hashhar , thanks for your reply!! This is our implementation of the applyFilter method!!

In Trino Docs https://trino.io/docs/current/develop/connectors.html#predicate-pushdown it was mentioned that we had to implement the applyFilter method in our connectors metadata class.

Notes from Trino: However it is not possible to store a predicate in the table handle and use it later, as the predicate cannot be held on to after the applyFilter call returns. It is used for filtering of entire partitions, and is not pushed down. The summary can be pushed down instead by storing it in the table handle.

So I had implemented the applyFilter method and added getters/setters in OwlTableHandle so that we can persist and reuse the constraint summary.

We dont see any issues with basic queries having predicates and its working fine, we came across this issue with queries involving CTEs where the same CTE is being used along with predicates at multiple places!!

Is the way of storing and reusing the constraint summary in TableHandle which is built in PushPredicateIntoTableScan correct?