trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.33k stars 2.97k forks source link

Unwrap cast of timestamp with normalized zone to date in comparison #12729

Open erichwang opened 2 years ago

erichwang commented 2 years ago

Given a SQL predicate that looks like:

CAST(timestamp_with_tz AT TIME ZONE '<some_tz>' AS DATE) > DATE '2022-01-01'

we should unwrap it to the form:

timestamp_with_tz > TIMESTAMP '<some_time>' AT TIME ZONE '<some_tz>'

-- and similarly for an upperbound. This will allow timestamps with timezones to be passed easily through the SPI in the TupleDomain format for easier connector consumption.

The reason we need to add AT TIME ZONE <some_tz> is to ensure that we are comparing against the same DATE literal instant. Otherwise the comparison can't be unwrapped if the column has a bunch of values with different time zones (since the DATEs for them would all refer to different timestamp instances).

Why? Timestamp with TZ is a very common storage format for computer event time, particularly those formats using fixed hour offsets (e.g. '-07:00'), and it is very common to want to push this predicate down into connectors for data pruning.

findepi commented 2 years ago

CAST(timestamp_with_tz AS DATE) > DATE '2022-01-01'

per https://github.com/trinodb/trino/issues/12362#issuecomment-1138319041, unwrapping this is incorrect

CAST(timestamp_with_tz AT TIMEZONE 'UTC' AS DATE) > DATE '2022-01-01'

this we can unwrap.

findepi commented 2 years ago

CAST(timestamp_with_tz AS DATE) > DATE '2022-01-01'

per #12362 (comment), unwrapping this is incorrect

It's correct only when we know zone of all the values. For example, in Iceberg we know all timestamp with time zone values are in UTC, so we can unwrap within connector: https://github.com/trinodb/trino/pull/12918

CAST(timestamp_with_tz AT TIMEZONE 'UTC' AS DATE) > DATE '2022-01-01'

this we can unwrap.

thanks for editing the issue description to keep this only. I updated the issue title as well.