trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.14k stars 2.92k forks source link

Incorrect predicate creation for: date (day_partition) >= date_add('day', - 30, CURRENT_DATE) #19097

Open vburenin opened 11 months ago

vburenin commented 11 months ago

Updated problem statement.

It appears that there is a big difference in how hive, hudi and iceberg optimizers work. Running this query on the same table represented as hudi table and iceberg table gives very different optimizer output. Tables are partitioned by day_partition field that has a format 2023-09-15.

Query for Hudi:
explain SELECT object_id FROM hudi_data.vburenin_dev.test_hudi WHERE date(day_partition) >= date_add('day', - 30, CURRENT_DATE) group by 1;

Query for Iceberg:
explain SELECT object_id FROM hudi_data.vburenin_dev.test_iceberg WHERE date(day_partition) >= date_add('day', - 30, CURRENT_DATE) group by 1;

On 419 with hive connector and tiny mod to read hudi tables:


     │      day_partition := day_partition:string:PARTITION_KEY                                                                                   >
     │          :: [[2023-08-21], [2023-08-22], [2023-08-23], [2023-08-24], [2023-08-25], [2023-08-26], [2023-08-27], [2023-08-28], [2023-08-29], >
     │      created_at := created_at:bigint:REGULAR    

On 426 with Hudi connector:

    │      day_partition := day_partition:string:PARTITION_KEY        

On Iceberg for 419 and 426:

 day_partition := 325:day_partition:varchar                                                                                            >
                :: [(<min>, 1), (2022, <max>)]                                                                                                    >  

This particular behavior greatly affects the cost at the large scale due to more of LIST operations for Hudi and more of GetObject calls.

----------------------------------------------------

Original problem statement: I have a bunch of hudi and iceberg tables partitioned by varchar day_partition field in the following format: 2023-09-19 When I supply this WHERE condition: DATE (day_partition) >= Date_add('day', - 30, CURRENT_DATE) It appears that the filtering predicate is calculated incorrectly as it looks like this:

day_partition := 21:day_partition:varchar :: [(, 1), (2022, )]

The issue happened after Trino 419, I have run my tests on Trino 426.

Initially I spotted it when I noticed that Hudi connector is looking up too many partitions even though they are way out of scope.

electrum commented 11 months ago

Predicate pushdown generally requires that the partition column be used directly in the comparison, not inside a function. Try changing your query to something like this:


WHERE day_partition >= CAST(date_add('day', -30, CURRENT_DATE) AS varchar)
martint commented 11 months ago

Predicate pushdown generally requires that the partition column be used directly in the comparison, not inside a function.

For this case, there's an optimization that unwraps the column reference from inside the date(...) expression, which is really just syntactic sugar for a cast from varchar to date. See https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html.

vburenin commented 11 months ago

This issue appears like a regression as it worked in Trino 419. While changing the SQL query is not a big of an issue, changing a few thousands of them scattered around userbase might pose a serious challenge.

martint commented 11 months ago

@vburenin, are you able to pinpoint exactly which version introduced the problem?

vburenin commented 11 months ago

@martint Unfortunately no. I jumped from 419 to 426 and spotted the problem only yesterdays when I added my own custom metadata handling into hudi connector.

findepi commented 11 months ago

For this case, there's an optimization that unwraps the column reference from inside the date(...) expression, which is really just syntactic sugar for a cast from varchar to date. See https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html.

The blog post is awesome. Note that it predates the optimization that you're observing at work. It was implemented in https://github.com/trinodb/trino/pull/13567, so Trino 394. I am not aware of any changes in this area between 419 and 426.

when I added my own custom metadata handling into hudi connector.

@vburenin i think we can consider the issue actionable, if there is a problem with vanilla Trino, without any custom modifications. Are you seeing a difference on vanilla Trino Iceberg connector between version 419 and 426? If so, can you please share output of EXPLAIN SELECT .. FROM your_table DATE (day_partition) >= Date_add('day', - 30, CURRENT_DATE) from both these versions?

vburenin commented 11 months ago

I am planning to poke around to pinpoint a version during the day today CT. I wish I was familiar with Trino code base to debug it deeper.

On Wed, Sep 20, 2023 at 3:35 AM Piotr Findeisen @.***> wrote:

For this case, there's an optimization that unwraps the column reference from inside the date(...) expression, which is really just syntactic sugar for a cast from varchar to date. See https://trino.io/blog/2019/05/21/optimizing-the-casts-away.html.

The blog post is awesome. Note that it predates the optimization that you're observing at work. It was implemented in #13567 https://github.com/trinodb/trino/pull/13567, so Trino 394. I am not aware of any changes in this area between 419 and 426.

when I added my own custom metadata handling into hudi connector.

@vburenin https://github.com/vburenin i think we can consider the issue actionable, if there is a problem with vanilla Trino, without any custom modifications. Are you seeing a difference on vanilla Trino Iceberg connector between version 419 and 426? If so, can you please share output of EXPLAIN SELECT .. FROM your_table DATE (day_partition) >= Date_add('day', - 30, CURRENT_DATE) from both these versions?

— Reply to this email directly, view it on GitHub https://github.com/trinodb/trino/issues/19097#issuecomment-1727238888, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBGHKZ7V3GDWMSUBWBK263X3KTGJANCNFSM6AAAAAA462VSBM . You are receiving this because you were mentioned.Message ID: @.***>

vburenin commented 11 months ago

I am certainly mistaken on the version it is introduced, I have dug out a symptom in a combination of hive(to read hudi, i had to mode hive connector)+google sheets connector that lead me to the reproduction of the issue in iceberg, but that is apparently not the root cause. I will update the bug description once I narrow my search down more.

vburenin commented 11 months ago

Here is more info from the SQL explain that lead me to believe it is a problem of the predicated:

explain SELECT object_id FROM hudi_data.vburenin_dev.test_hudi WHERE date(day_partition) >= date_add('day', - 30, CURRENT_DATE) group by 1;

On 419 with hive connector and tiny mod to read hudi table:


     │      day_partition := day_partition:string:PARTITION_KEY                                                                                   >
     │          :: [[2023-08-21], [2023-08-22], [2023-08-23], [2023-08-24], [2023-08-25], [2023-08-26], [2023-08-27], [2023-08-28], [2023-08-29], >
     │      created_at := created_at:bigint:REGULAR    

On 426:

    │      day_partition := day_partition:string:PARTITION_KEY        
findepi commented 11 months ago

@vburenin yes this looks like lack of partition pruning for day_partition in 426.

unfortunately, i don't know much about Hudi. Are you able to repro the problem with Iceberg, Hive or Delta? Then perhaps I might be able to help

if not, let me cc someone more knowledgable about Hudi -- @codope

vburenin commented 11 months ago

@findepi on the exactly the same partitioned table, basically a hudi table copy into Iceberg EXPLAIN doesn't look promising:

            day_partition := 325:day_partition:varchar                                                                                            >
                :: [(<min>, 1), (2022, <max>)]                                                                                                    >

Tested on 419 and 426. It appears like Hive connector is better equipped with partition pruning.

UPDATED: I can't use Hive connector anymore to query hudi tables on Trino 426