lupengfei-johnson opened 3 years ago
Hi! In the Kudu connector, UNIXTIME_MICROS is mapped to the TIMESTAMP type in Trino.
TIMESTAMP does not represent a specific point in time, but rather a reading of a wall clock plus calendar. Selecting values from a TIMESTAMP column should return the same result set no matter what the client's session time zone is.
I guess the result should remain the same irrespective of the time zone set in the client.
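The distinction above is the crux of the issue: an epoch-microsecond value identifies one instant, while a TIMESTAMP is a wall-clock reading that depends on which zone you read it in. A minimal sketch (in Python for brevity; the sample value is chosen to match the first row of the query results quoted later in this thread):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A Kudu UNIXTIME_MICROS value is microseconds since the Unix epoch,
# i.e. it identifies one instant in time.
micros = 1611804761000000  # sample value: 2021-01-28 03:32:41 UTC

instant = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)

# The same instant reads differently on different wall clocks:
print(instant)                                        # 2021-01-28 03:32:41+00:00
print(instant.astimezone(ZoneInfo("Asia/Shanghai")))  # 2021-01-28 11:32:41+08:00
```

If the type is mapped to plain TIMESTAMP, only one of those two wall-clock readings can be returned, which is exactly where the disagreement between versions comes from.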
@lupengfei-johnson if Kudu's UNIXTIME_MICROS is supposed to represent a point in time (as the name suggests, but better to double-check than assume), we should map it to timestamp with time zone.
Unfortunately, this is all the Kudu documentation has to say about it:
unixtime_micros (64-bit microseconds since the Unix epoch)
It doesn't explain the semantics of the type, just its encoding. In Impala, they've mapped it to TIMESTAMP, but I've seen some discussions where people expressed similar concerns, so it may make more sense to map it to TIMESTAMP WITH TIME ZONE.
@findepi I expect the values of Kudu's UNIXTIME_MICROS to be mapped to timestamps in the Asia/Shanghai time zone, because our SQL scripts use Asia/Shanghai time in predicates like where event_time = timestamp '2021-01-20 00:00:00'. Our SQL scripts work well on presto-server-0.228, but not on trino-server-351. I plan to resolve this by adding a kudu.timezone configuration property in etc/catalog/kudu.properties and converting Kudu's UNIXTIME_MICROS to the time at kudu.timezone when KuduRecordCursor reads data from Kudu.
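The reason those scripts break can be illustrated with a small sketch (Python for brevity; the stored event value is a made-up sample, not taken from the reporter's data): the stored instant is the same, but trino-server-351 renders its UTC wall clock as the TIMESTAMP value, so a literal written in Asia/Shanghai local time no longer matches.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# An event stored at 2021-01-20 00:00:00 Asia/Shanghai,
# i.e. 2021-01-19 16:00:00 UTC.
event = datetime(2021, 1, 20, 0, 0, tzinfo=ZoneInfo("Asia/Shanghai"))
micros = int(event.timestamp() * 1_000_000)

# trino-server-351 renders the UTC wall clock as the TIMESTAMP value:
rendered = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc).replace(tzinfo=None)
print(rendered)                                  # 2021-01-19 16:00:00
# The script's predicate `event_time = timestamp '2021-01-20 00:00:00'`
# compares against the local wall clock, so it no longer matches:
print(rendered == datetime(2021, 1, 20, 0, 0))   # False
```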
If the value represents a point in time, the connector should return it as such. From what I see above, it seems Kudu's UNIXTIME_MICROS is a point in time, but I am certainly not a Kudu expert.
cc @lzeiming
We also found that the timestamp values from our Kudu cluster in query results were not compatible when we upgraded from 332 to 347. We have to keep our system backward compatible, so we modified the Kudu connector: we invoke convertUTCToLocal with our system's time zone when reading from Kudu, the way the Hive connector does when reading ORC files, and convertLocalToUTC when writing into Kudu. I think we could add a config property like orcLegacyTimeZone in HiveConfig.
Based on the discussion here and elsewhere, it seems to me that we should map this type to timestamp with time zone instead of timestamp.
In presto-server-0.228, with time zone=Asia/Shanghai, the result is:
2021-01-28 11:32:41.000
2021-01-28 11:32:41.000
2021-01-28 11:32:41.000
2021-01-28 11:32:38.000
2021-01-28 11:32:38.000
2021-01-28 11:32:38.000
2021-01-28 11:31:41.000
2021-01-28 11:31:41.000
2021-01-28 11:31:41.000
2021-01-28 11:31:38.000
(10 rows)
In trino-server-351, with time zone=Asia/Shanghai, the result is:
2021-01-28 03:32:41.000
2021-01-28 03:32:41.000
2021-01-28 03:32:41.000
2021-01-28 03:32:38.000
2021-01-28 03:32:38.000
2021-01-28 03:32:38.000
2021-01-28 03:31:41.000
2021-01-28 03:31:41.000
2021-01-28 03:31:41.000
2021-01-28 03:31:38.000
(10 rows)
The data in Kudu is UTC time. In trino-server-351 the result is UTC time even though I set time zone=Asia/Shanghai.
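The constant 8-hour gap between the two result sets matches Asia/Shanghai's UTC offset (+08:00), which is consistent with 351 returning the raw UTC wall clock. A quick check, using the first row from each result set above:

```python
from datetime import datetime

old = datetime.fromisoformat("2021-01-28 11:32:41")  # presto-server-0.228 result
new = datetime.fromisoformat("2021-01-28 03:32:41")  # trino-server-351 result

offset_hours = (old - new).total_seconds() / 3600
print(offset_hours)  # 8.0 — exactly the Asia/Shanghai UTC offset
```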