Open sbrackenbury-teranet opened 6 months ago
In Trino, timestamp with time zone is rendered with its stored time zone. This is intentional, as the insertion time zone is considered to have meaning. Readers can choose to render it in a different time zone via at time zone. The session time zone is used for conversions between timestamp and timestamp with time zone inside the query engine, as described by the SQL specification.
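To make the distinction concrete, here is a minimal Python sketch that models these semantics outside Trino. It is not Trino code. An aware datetime stands in for a timestamp with time zone value, astimezone plays the role of at time zone, and attaching a zone to a naive value stands in for the session-zone conversion. The fixed offsets and the name session_zone are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch self-contained (assumption: a real
# deployment would use named zones such as America/Los_Angeles).
PST = timezone(timedelta(hours=-8), "PST")
UTC = timezone.utc

# A value created with a PST zone keeps that zone when rendered.
stored = datetime(2017, 11, 16, 17, 10, 34, tzinfo=PST)
rendered_as_stored = stored.isoformat()

# Analogue of "at time zone": same instant, different rendering.
rendered_in_utc = stored.astimezone(UTC).isoformat()

# Analogue of the session-zone conversion: a zone-less timestamp is
# interpreted in the (hypothetical) session zone to obtain an instant.
session_zone = PST
naive = datetime(2017, 11, 16, 17, 10, 34)
with_zone = naive.replace(tzinfo=session_zone)

print(rendered_as_stored)   # rendering keeps the original zone
print(rendered_in_utc)      # explicit re-rendering in UTC
print(with_zone == stored)  # both denote the same instant
```

The two renderings differ only in presentation; comparing the values shows they denote the same point in time.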
There are roughly three phases of processing when reading timestamp values that need to be considered:
Changing the representation after the value is read from storage is problematic. It makes it impossible to observe the value as it was stored, with its original time zone. Changing the representation when the value is transported to the client requires changes to the protocol to allow clients to express such preference. Adjusting the rendering in the client requires each client to be able to do so independently.
As an aside, Iceberg's timestamp with time zone type is stored in UTC, and the timezone information is lost. From the Iceberg spec (https://iceberg.apache.org/spec/#primitive-types):
Timestamp values with time zone represent a point in time: values are stored as UTC and do not retain a source time zone (2017-11-16 17:10:34 PST is stored/retrieved as 2017-11-17 01:10:34 UTC and these values are considered identical).
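The spec's example can be checked with a small Python sketch. It assumes a storage model of microseconds since the Unix epoch in UTC. The helper names to_stored_micros and from_stored_micros are hypothetical and are not part of any Iceberg API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_stored_micros(value: datetime) -> int:
    """Hypothetical helper: reduce an aware datetime to microseconds since the epoch (UTC)."""
    return (value - EPOCH) // timedelta(microseconds=1)

def from_stored_micros(micros: int) -> datetime:
    """Hypothetical helper: rebuild the value; only UTC can be recovered."""
    return EPOCH + timedelta(microseconds=micros)

# Fixed -08:00 offset standing in for PST (assumption for illustration).
pst = timezone(timedelta(hours=-8), "PST")
written_pst = datetime(2017, 11, 16, 17, 10, 34, tzinfo=pst)
written_utc = datetime(2017, 11, 17, 1, 10, 34, tzinfo=timezone.utc)

# Both writes collapse to the same stored number.
same_stored_value = to_stored_micros(written_pst) == to_stored_micros(written_utc)

# Reading back yields a UTC value; the PST zone is not retained.
read_back = from_stored_micros(to_stored_micros(written_pst))
print(same_stored_value, read_back.isoformat())
```

Under this model the source zone cannot be recovered on read, which is why any non-UTC rendering has to come from somewhere else, such as a session setting.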
First of all, thank you @martint for taking the time to review and consider this issue and our PR.
After reviewing your comments and doing further testing of our PR, we recognize that we need to revisit our code and the PR submission.
We still strongly prefer to implement the rendering/display of timestamp with time zone on the server side rather than the client side, precisely because, to your point, adjusting the rendering/display in each client would be challenging. We prefer a configurable server-side behavior that is transparent to the client and preserves backward compatibility.
A Trino query against an Iceberg table timestamp column renders the result differently from how Spark renders the result for the same Iceberg table:
Spark-SQL:
Trino SQL (against the same Spark-generated Iceberg table above):
Postgres rendering behavior for a table with a timestamptz column is like Spark:
The only option available in Trino for rendering the timestamp column in the desired time zone is the Trino at_timezone function. This is undesirable because it requires changing existing queries. Adding the ability for Trino to transparently render the timestamp according to a configured time zone, similar to how Spark and Postgres behave, is preferred because it is less intrusive.
This PR introduces configurable Trino query behavior for rendering Iceberg table timestamp columns in the same manner that Spark and Postgres render timestamp with time zone data type columns. It simplifies and encourages the adoption of Iceberg format tables that have columns of type timestamp with time zone.
The enhancement ensures that an Iceberg table timestamp column value:
1) Is normalized to UTC time
2) Is rendered at query time according to a default or specified session time zone
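The two steps can be sketched in Python as follows. This is an illustration of the intended behavior only, not the PR's implementation. The function name render_for_session, the UTC default, and the fixed offsets are assumptions.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

def render_for_session(value: datetime, session_zone: timezone = UTC) -> str:
    """Hypothetical sketch: expects an aware datetime."""
    # Step 1: normalize to UTC, discarding the zone the value arrived with.
    normalized = value.astimezone(UTC)
    # Step 2: render the same instant in the session zone.
    return normalized.astimezone(session_zone).isoformat(sep=" ")

# Fixed offsets standing in for named zones (assumption for illustration).
pst = timezone(timedelta(hours=-8), "PST")
est = timezone(timedelta(hours=-5), "EST")
value = datetime(2017, 11, 16, 17, 10, 34, tzinfo=pst)

default_rendering = render_for_session(value)    # falls back to UTC
est_rendering = render_for_session(value, est)   # explicit session zone
print(default_rendering)
print(est_rendering)
```

Existing queries stay unchanged in this model; only the session setting decides how the instant is displayed.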