trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Kudu connector: time zone doesn't take effect when querying values of UNIXTIME_MICROS type #6744

Open lupengfei-johnson opened 3 years ago

lupengfei-johnson commented 3 years ago

In presto-server-0.228, with time zone=Asia/Shanghai, the result is:

2021-01-28 11:32:41.000
2021-01-28 11:32:41.000
2021-01-28 11:32:41.000
2021-01-28 11:32:38.000
2021-01-28 11:32:38.000
2021-01-28 11:32:38.000
2021-01-28 11:31:41.000
2021-01-28 11:31:41.000
2021-01-28 11:31:41.000
2021-01-28 11:31:38.000
(10 rows)

In trino-server-351, with time zone=Asia/Shanghai, the result is:

2021-01-28 03:32:41.000
2021-01-28 03:32:41.000
2021-01-28 03:32:41.000
2021-01-28 03:32:38.000
2021-01-28 03:32:38.000
2021-01-28 03:32:38.000
2021-01-28 03:31:41.000
2021-01-28 03:31:41.000
2021-01-28 03:31:41.000
2021-01-28 03:31:38.000
(10 rows)

The data in Kudu is UTC time. In trino-server-351 the result is UTC time even though I used time zone=Asia/Shanghai.
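A possible query-time workaround (a sketch, not from the original report: it reinterprets the connector's TIMESTAMP value as UTC and re-renders it in the desired zone; the table and column names are illustrative):

```sql
-- Sketch: declare the stored wall-clock value to be UTC, then render the
-- resulting instant in Asia/Shanghai. Table and column names are illustrative.
SELECT with_timezone(event_time, 'UTC') AT TIME ZONE 'Asia/Shanghai'
FROM kudu.default.events
LIMIT 10;
```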

Praveen2112 commented 3 years ago

Hi!! In the Kudu connector, UNIXTIME_MICROS is mapped to the TIMESTAMP type in Trino.

TIMESTAMP does not represent a specific point in time, but rather a reading of a wall clock and calendar. Selecting values from a TIMESTAMP column should return the same result set no matter what the client's session time zone is.

So the result remains the same irrespective of the time zone set in the client.

Ref: https://github.com/trinodb/trino/issues/37
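A short illustration of that distinction in Trino SQL (the literal values are arbitrary):

```sql
-- TIMESTAMP is a wall-clock reading; the session time zone does not affect it
SELECT TIMESTAMP '2021-01-28 03:32:41';
-- => 2021-01-28 03:32:41 in any session time zone

-- TIMESTAMP WITH TIME ZONE names an instant; AT TIME ZONE re-renders it
SELECT TIMESTAMP '2021-01-28 03:32:41 UTC' AT TIME ZONE 'Asia/Shanghai';
-- => 2021-01-28 11:32:41 Asia/Shanghai (the same instant)
```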

findepi commented 3 years ago

@lupengfei-johnson if Kudu's UNIXTIME_MICROS is supposed to represent a point in time (as the name suggests, but it's better to double-check than assume), we should map it to timestamp with time zone.

martint commented 3 years ago

Unfortunately, this is all the Kudu documentation has to say about it:

unixtime_micros (64-bit microseconds since the Unix epoch)

It doesn't explain the semantics of the type, just its encoding. In Impala, they've mapped it to TIMESTAMP, but I've seen some discussions where people expressed similar concerns, so it may make more sense to map it to TIMESTAMP WITH TIME ZONE.

lupengfei-johnson commented 3 years ago

@findepi I expect the values of Kudu's UNIXTIME_MICROS to be mapped to timestamps in the Asia/Shanghai time zone, because our SQL scripts use Asia/Shanghai time in predicates like where event_time = timestamp '2021-01-20 00:00:00'. Our SQL scripts work well in presto-server-0.228 but don't work in trino-server-351. I plan to resolve this by adding a kudu.timezone configuration in etc/catalog/kudu.properties and converting Kudu's UNIXTIME_MICROS to the time at kudu.timezone when KuduRecordCursor reads data from Kudu.
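A sketch of what that catalog file could look like (kudu.timezone is the property proposed above, not one that exists in the shipped connector; the master address is illustrative):

```properties
# etc/catalog/kudu.properties
connector.name=kudu
# illustrative Kudu master address
kudu.client.master-addresses=kudu-master:7051
# proposed property, not part of the current connector
kudu.timezone=Asia/Shanghai
```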

findepi commented 3 years ago

If a value represents a point in time, the connector should return it as such. From what I see above, it seems Kudu's UNIXTIME_MICROS is a point in time, but I am certainly not a Kudu expert.

cc @lzeiming

lzeiming commented 3 years ago

We also found that the timestamp values from our Kudu cluster in query results were not compatible when we upgraded from 332 to 347. We have to keep the compatibility of our system, so we modified the Kudu connector: we invoke convertUTCToLocal with our system's time zone when reading from Kudu, as the Hive connector does when reading ORC files, and we invoke convertLocalToUTC when writing into Kudu. I think we could add a config property like orcLegacyTimeZone in HiveConfig.
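In Trino SQL terms, the read-path conversion described above is roughly the following (a sketch; ts and example_table are hypothetical):

```sql
-- Analog of convertUTCToLocal on read: treat the stored wall-clock value as
-- a UTC instant, shift it to the system's zone, then drop the zone again.
SELECT CAST(with_timezone(ts, 'UTC') AT TIME ZONE 'Asia/Shanghai' AS TIMESTAMP)
FROM example_table;
```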

martint commented 3 years ago

Based on the discussion here and elsewhere, it seems to me that we should map this type to timestamp with time zone instead of timestamp.