trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
9.88k stars 2.86k forks source link

Fix Oracle TimestampWithTimeZone mappings to handle DST correctly #7739

Open brandboat opened 3 years ago

brandboat commented 3 years ago

According to @hashhar's comment https://github.com/trinodb/trino/pull/7552#discussion_r619821632

There are two timestamp test cases in DST in testTimestampWithTimeZoneFromOracle() that couldn't pass when using SqlDataTypeTest.

For example, 2018-10-28 01:33:17.456 America/Bahia_Banderas is stored in oracle, when trino get it back, the value is 1540711997456 in millisecond, while in trino sql values ('2018-10-28 01:33:17.456 America/Bahia_Banderas') is 1540708397456 in millisecond. In SqlDataTypeTest, it use IS NOT DISTINCT FROM in test, since the millisecond between trino and oracle are totally different, test will fail.

hashhar commented 3 years ago

@brandboat Thanks for creating the issue. By the way I don't think it's a test issue. The type mapping for TimestampWithTimeZone seems to not handle DST correctly and hence values are not round-tripping properly between Trino and Oracle.

brandboat commented 3 years ago

By the way I don't think it's a test issue. The type mapping for TimestampWithTimeZone seems to not handle DST correctly and hence values are not round-tripping properly between Trino and Oracle.

Indeed, thanks for clarify that.

hashhar commented 3 years ago

Very weird behaviour.

The values inserted via Oracle and via Trino look the same in both Trino and Oracle. I added a breakpoint at verifySelect in SqlDataTypeTest#execute. Then ran insert into timestamp_tz_1qb9d values (TIMESTAMP '2018-10-28 01:33:17.456 America/Bahia_Banderas', TIMESTAMP '2018-10-28 03:33:33.333 Europe/Vilnius'); via Trino.

After that output follows:

# Oracle
SQL> select * from timestamp_tz_1qb9d;

COL_0
---------------------------------------------------------------------------
COL_1
---------------------------------------------------------------------------
28-OCT-18 01.33.17.456 AM AMERICA/BAHIA_BANDERAS
28-OCT-18 03.33.33.333 AM EUROPE/VILNIUS

28-OCT-18 01.33.17.456 AM AMERICA/BAHIA_BANDERAS
28-OCT-18 03.33.33.333 AM EUROPE/VILNIUS

# Trino
trino:trino_test> select * from timestamp_tz_1qb9d;
                     col_0                      |                 col_1                  
------------------------------------------------+----------------------------------------
 2018-10-28 01:33:17.456 America/Bahia_Banderas | 2018-10-28 03:33:33.333 Europe/Vilnius 
 2018-10-28 01:33:17.456 America/Bahia_Banderas | 2018-10-28 03:33:33.333 Europe/Vilnius 
(2 rows)

But if you start adding filters then Oracle can only filter the value it wrote while Trino can only filter the value it wrote - predicates don't match for the values which Trino inserted into Oracle or which Oracle inserted from Trino.

# Oracle
SQL> select * from timestamp_tz_1qb9d where col_0 = '28-OCT-18 01.33.17.456 AM AMERICA/BAHIA_BANDERAS';

COL_0
---------------------------------------------------------------------------
COL_1
---------------------------------------------------------------------------
28-OCT-18 01.33.17.456 AM AMERICA/BAHIA_BANDERAS
28-OCT-18 03.33.33.333 AM EUROPE/VILNIUS

# Trino
trino:trino_test> select * from timestamp_tz_1qb9d WHERE col_0 = TIMESTAMP '2018-10-28 01:33:17.456 America/Bahia_Banderas';
                     col_0                      |                 col_1                  
------------------------------------------------+----------------------------------------
 2018-10-28 01:33:17.456 America/Bahia_Banderas | 2018-10-28 03:33:33.333 Europe/Vilnius 
(1 row)
findepi commented 3 years ago

But if you start adding filters then Oracle can only filter the value it wrote while Trino can only filter the value it wrote - predicates don't match for the values which Trino inserted into Oracle or which Oracle inserted from Trino.

# Oracle
SQL> select * from timestamp_tz_1qb9d where col_0 = '28-OCT-18 01.33.17.456 AM AMERICA/BAHIA_BANDERAS';

This compares col_0 (timestamp with time zone?) with a varchar. What is the inferred type of the comparison? Is varchar coerced to timestamp tz, or vice versa? Did you try using from_tz(TIMESTAMP '28-OCT-18 01.33.17.456', 'America/Bahia_Banderas') go get a timestamp with time zone "literal" in Oracle?

Also, what are your NLS session settings? i.e. how does your Oracle command line format returned values? e.g. Does it include their full precision?