trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.42k stars 3k forks source link

Redshift connector cannot read timestamp columns in >v341 #6607

Closed grantatspothero closed 1 year ago

grantatspothero commented 3 years ago

Querying columns of type timestamp fails in trino versions >v345.

--created is a timestamp column in redshift
select created 
from redshift.schema.table

with the stacktrace:

io.prestosql.spi.PrestoException: Could not serialize column 'created' of type 'timestamp(3)' at position 1:1
    ...
Caused by: java.lang.IllegalArgumentException: Expected 0s for digits beyond precision 3: epochMicros = 1545228016441221
    at io.prestosql.spi.type.SqlTimestamp.newInstance(SqlTimestamp.java:51)
    at io.prestosql.spi.type.ShortTimestampType.getObjectValue(ShortTimestampType.java:125)
    at io.prestosql.server.protocol.QueryResultRows.getRowValues(QueryResultRows.java:176)
    ... 44 more

I think this is because the redshift connector uses the basejdbc connector, which maps the timestamp redshift type to the TIMESTAMP_MILLIS trino timestamp type. However, the redshift timestamp type has 6 digits of precision (not the 3 trino is expecting).

https://github.com/trinodb/trino/blob/2bcf5d9e2d2642c5411e02c67e16f453fdff423a/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/StandardColumnMappings.java#L575-L577

https://docs.aws.amazon.com/redshift/latest/dg/r_Datetime_types.html#r_Datetime_types-timestamp

This started causing queries to fail in v346 when this PR was introduced: https://github.com/trinodb/trino/pull/5742 Previously SqlTimestamp. newInstance would round the timestamp when the precision did not match the expected precision, now it explicitly fails.

I think the fix should just be overriding toColumnMapping in the redshift client to handle timestamps with 6 digits of precision.

See this thread for context: https://trinodb.slack.com/archives/CGB0QHWSW/p1610645172011400

findepi commented 3 years ago

There is lack of rounding in the default column mapping being returned here

https://github.com/trinodb/trino/blob/03d8e5abc686c5c4a9f96ca14db07e2aed880174/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/StandardColumnMappings.java#L577 which is

https://github.com/trinodb/trino/blob/03d8e5abc686c5c4a9f96ca14db07e2aed880174/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/StandardColumnMappings.java#L492-L498

The code went in 342. In 346 we probably added validation (didn't check exact), but the behavior in 342-345 was silently incorrect leading to incorrect query results.

grantatspothero commented 3 years ago

@findepi when you say "silently leading to incorrect query results" do you mean "silently rounding the precision of the timestamp" or do you mean "completely wrong timestamp values". Just trying to grasp the severity of the bug.

findepi commented 3 years ago

@grantatspothero for just SELECT queries it was returning correct results but if the value was compared with something, or aggregated, it could return incorrect ones. Of course, i am not saying this is the common case for timestamps, i am just pointing out https://github.com/trinodb/trino/pull/5742 is not to the root cause.

tooptoop4 commented 2 years ago

clickhouse and druid also using https://github.com/trinodb/trino/blob/388/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/StandardColumnMappings.java#L535

ebyhr commented 1 year ago

Closing as #15365