trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.18k stars 2.94k forks source link

Hive connector seems to change precision of timestamp. #5564

Open luhhujbb opened 3 years ago

luhhujbb commented 3 years ago

I'm not sure it's an issue, and maybe somewhere in the documentation it's written but I don't find it.

Presto version

All version with new timestamp design with variable precision

Issue

I have a lots of tables with seconds precision timestamp, ie timestamp(0)

As mentionned here #37 : "Typically databases store it internally as seconds since epoch in some fixed timezone (usually UTC)", timestamp are usually in seconds, precision is optionnal.

As mentionned here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types, hive timestamp type supports traditional UNIX timestamp with optional nanosecond precision. I don't know if is up to date with recent hive version.

So maybe I've missed something in documentation, but I feel hive connector doesn't support Timestamp ie Timestamp(0) anymore. In recent version only timestamp_millis (Timestamp(3)) and timestamp_nanos (Timestamp(6)).

Is basic timestamp, ie timestamp(0) dropped from hive connector by design in recent version, or is it a bug ?

Many thanks !

findepi commented 3 years ago

hive connector doesn't support Timestamp ie Timestamp(0) anymore

i think it was never supported. do you have particular SQL commands in mind, that used to work and they no longer do?

luhhujbb commented 3 years ago

In version 335

SELECT to_unixtime(ts) as unix_s,ts FROM rtb.bid_requests where c='fr' and d='10' and y='2020' and h='10' limit 10; unix_s | ts ------------------+------------------------- 1.60232215602E9 | 2020-10-10 11:29:16.020 1.602322162016E9 | 2020-10-10 11:29:22.016 1.602322150078E9 | 2020-10-10 11:29:10.078 1.602322153594E9 | 2020-10-10 11:29:13.594 1.602322161401E9 | 2020-10-10 11:29:21.401 1.602322164524E9 | 2020-10-10 11:29:24.524 1.602322150503E9 | 2020-10-10 11:29:10.503 1.602322175869E9 | 2020-10-10 11:29:35.869 1.602322140124E9 | 2020-10-10 11:29:00.124 1.602322169268E9 | 2020-10-10 11:29:29.268

Timestamps are in milliseconds. I'm tired, so it's not a matter of seconds sorry.

ts has datatype timestamp(3) in this version.

In version 344

I've finally set up another cluster and here are the result on latest version

SELECT to_unixtime(ts) as unix_s,ts FROM rtb.bid_requests where c='fr' and d='10' and y='2020' and h='10' limit 10; unix_s | ts ----------------+------------------------- 1602321.242367 | 1970-01-19 13:05:21.242 1602321.722273 | 1970-01-19 13:05:21.722 1602322.458844 | 1970-01-19 13:05:22.459 1602322.630645 | 1970-01-19 13:05:22.631 1602321.730594 | 1970-01-19 13:05:21.731 1602321.735642 | 1970-01-19 13:05:21.736 1602321.739366 | 1970-01-19 13:05:21.739 1602321.745131 | 1970-01-19 13:05:21.745 1602322.632744 | 1970-01-19 13:05:22.633 1602322.645291 | 1970-01-19 13:05:22.645

ts has also datatype timestamp(3) in this version

luhhujbb commented 3 years ago

So to recap,

martint commented 3 years ago

What file format are you using to store your data? This is probably related to https://github.com/prestosql/presto/pull/4974. It's possible we missed a code path for one of the formats.

luhhujbb commented 3 years ago

We use parquet files.

martint commented 3 years ago

Would you mind posting the output of parquet-tools meta for one of those files? You can get that tool from:

luhhujbb commented 3 years ago

Hi, here is the result of parquet-tools meta :


file schema:             spark_schema
--------------------------------------------------------------------------------
ts:                      OPTIONAL INT64 R:0 D:1
version:                 OPTIONAL BINARY O:UTF8 R:0 D:1
request_id:              OPTIONAL BINARY O:UTF8 R:0 D:1
cat:                     OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL BINARY O:UTF8 R:1 D:3
app_bundle:              OPTIONAL BINARY O:UTF8 R:0 D:1
app_domain:              OPTIONAL BINARY O:UTF8 R:0 D:1
app_id:                  OPTIONAL BINARY O:UTF8 R:0 D:1
app_name:                OPTIONAL BINARY O:UTF8 R:0 D:1
site_domain:             OPTIONAL BINARY O:UTF8 R:0 D:1
site_id:                 OPTIONAL BINARY O:UTF8 R:0 D:1
site_name:               OPTIONAL BINARY O:UTF8 R:0 D:1
site_page:               OPTIONAL BINARY O:UTF8 R:0 D:1
site_cleaned_page:       OPTIONAL BINARY O:UTF8 R:0 D:1
supply:                  OPTIONAL BINARY O:UTF8 R:0 D:1
supply_tags_ids:         OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL INT32 O:INT_16 R:1 D:3
supply_bundle:           OPTIONAL BINARY O:UTF8 R:0 D:1
supply_domain:           OPTIONAL BINARY O:UTF8 R:0 D:1
supply_id:               OPTIONAL BINARY O:UTF8 R:0 D:1
supply_name:             OPTIONAL BINARY O:UTF8 R:0 D:1
supply_page:             OPTIONAL BINARY O:UTF8 R:0 D:1
supply_store_url:        OPTIONAL BINARY O:UTF8 R:0 D:1
devicetype:              OPTIONAL INT32 O:INT_8 R:0 D:1
connectiontype:          OPTIONAL INT32 O:INT_8 R:0 D:1
carrier:                 OPTIONAL BINARY O:UTF8 R:0 D:1
language:                OPTIONAL BINARY O:UTF8 R:0 D:1
make:                    OPTIONAL BINARY O:UTF8 R:0 D:1
model:                   OPTIONAL BINARY O:UTF8 R:0 D:1
osv:                     OPTIONAL BINARY O:UTF8 R:0 D:1
hwv:                     OPTIONAL BINARY O:UTF8 R:0 D:1
country:                 OPTIONAL BINARY O:UTF8 R:0 D:1
city:                    OPTIONAL BINARY O:UTF8 R:0 D:1
zip:                     OPTIONAL BINARY O:UTF8 R:0 D:1
geotype:                 OPTIONAL INT32 O:INT_8 R:0 D:1
lat:                     OPTIONAL DOUBLE R:0 D:1
lon:                     OPTIONAL DOUBLE R:0 D:1
sdk_lat:                 OPTIONAL DOUBLE R:0 D:1
sdk_lon:                 OPTIONAL DOUBLE R:0 D:1
inferred_lat:            OPTIONAL DOUBLE R:0 D:1
inferred_lon:            OPTIONAL DOUBLE R:0 D:1
ua:                      OPTIONAL BINARY O:UTF8 R:0 D:1
browser:                 OPTIONAL BINARY O:UTF8 R:0 D:1
idfa:                    OPTIONAL BINARY O:UTF8 R:0 D:1
ssp:                     OPTIONAL BINARY O:UTF8 R:0 D:1
ssp_iso:                 OPTIONAL BINARY O:UTF8 R:0 D:1
user_id:                 OPTIONAL BINARY O:UTF8 R:0 D:1
gender:                  OPTIONAL BINARY O:UTF8 R:0 D:1
yob:                     OPTIONAL INT32 O:INT_16 R:0 D:1
buyeruid:                OPTIONAL BINARY O:UTF8 R:0 D:1
ip:                      OPTIONAL BINARY O:UTF8 R:0 D:1
ipv6:                    OPTIONAL BINARY O:UTF8 R:0 D:1
displaytype:             OPTIONAL INT32 O:INT_8 R:0 D:1
height:                  OPTIONAL INT32 O:INT_16 R:0 D:1
width:                   OPTIONAL INT32 O:INT_16 R:0 D:1
creative_formats:        OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL INT32 O:INT_8 R:1 D:3
screen_position:         OPTIONAL INT32 R:0 D:1
formats:                 OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL F:2
...w:                    OPTIONAL INT32 O:INT_16 R:1 D:4
...h:                    OPTIONAL INT32 O:INT_16 R:1 D:4
clickbrowser:            OPTIONAL INT32 O:INT_8 R:0 D:1
imp_metrics:             OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL F:3
...vendor:               OPTIONAL BINARY O:UTF8 R:1 D:4
...type:                 OPTIONAL BINARY O:UTF8 R:1 D:4
...value:                OPTIONAL FLOAT R:1 D:4
bidfloor:                OPTIONAL FLOAT R:0 D:1
bidfloorcur:             OPTIONAL BINARY O:UTF8 R:0 D:1
deals_at:                OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL INT32 O:INT_8 R:1 D:3
deals_bidfloor:          OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL FLOAT R:1 D:3
deals_bidfloorcur:       OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL BINARY O:UTF8 R:1 D:3
deals_id:                OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL BINARY O:UTF8 R:1 D:3
coppa:                   OPTIONAL BINARY O:UTF8 R:0 D:1
google_segments:         OPTIONAL F:1
.key_value:              REPEATED F:2
..key:                   REQUIRED BINARY O:UTF8 R:1 D:2
..value:                 OPTIONAL FLOAT R:1 D:3
ias_tags:                OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL BINARY O:UTF8 R:1 D:3
publisher_id:            OPTIONAL BINARY O:UTF8 R:0 D:1
api:                     OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL INT32 O:INT_8 R:1 D:3
sdk_version:             OPTIONAL BINARY O:UTF8 R:0 D:1
protocols:               OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL INT32 O:INT_8 R:1 D:3
at:                      OPTIONAL INT32 O:INT_8 R:0 D:1
gdpr_flag:               OPTIONAL INT32 O:INT_8 R:0 D:1
is_adot_vendor_allowed:  OPTIONAL BOOLEAN R:0 D:1
is_adot_purpose_allowed: OPTIONAL BOOLEAN R:0 D:1
allowed_purposes:        OPTIONAL F:1
.list:                   REPEATED F:1
..element:               OPTIONAL INT32 R:1 D:3
user_consent:            OPTIONAL BOOLEAN R:0 D:1
bidding_user_consent:    OPTIONAL BOOLEAN R:0 D:1
is_european_country:     OPTIONAL BOOLEAN R:0 D:1
encoded_consent_string:  OPTIONAL BINARY O:UTF8 R:0 D:1
is_bid:                  OPTIONAL BOOLEAN R:0 D:1
is_multi_imp:            OPTIONAL BOOLEAN R:0 D:1
imp_index:               OPTIONAL INT32 R:0 D:1
imp_id:                  OPTIONAL BINARY O:UTF8 R:0 D:1

row group 1:             RC:693063 TS:391123651 OFFSET:4
--------------------------------------------------------------------------------
ts:                       INT64 GZIP DO:0 FPO:4 SZ:2113638/5523432/2.61 VC:693063 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
version:                  BINARY GZIP DO:0 FPO:2113642 SZ:282/191/0.68 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
request_id:               BINARY GZIP DO:0 FPO:2113924 SZ:11659249/24905038/2.14 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
cat:
.list:
..element:                BINARY GZIP DO:0 FPO:13773173 SZ:692493/2379819/3.44 VC:2043703 ENC:PLAIN_DICTIONARY,RLE
app_bundle:               BINARY GZIP DO:0 FPO:14465666 SZ:593660/808725/1.36 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
app_domain:               BINARY GZIP DO:0 FPO:15059326 SZ:329154/501714/1.52 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
app_id:                   BINARY GZIP DO:0 FPO:15388480 SZ:676109/882817/1.31 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
app_name:                 BINARY GZIP DO:0 FPO:16064589 SZ:676064/904610/1.34 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
site_domain:              BINARY GZIP DO:0 FPO:16740653 SZ:53/33/0.62 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
site_id:                  BINARY GZIP DO:0 FPO:16740706 SZ:53/33/0.62 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
site_name:                BINARY GZIP DO:0 FPO:16740759 SZ:53/33/0.62 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
site_page:                BINARY GZIP DO:0 FPO:16740812 SZ:53/33/0.62 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
site_cleaned_page:        BINARY GZIP DO:0 FPO:16740865 SZ:53/33/0.62 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
supply:                   BINARY GZIP DO:0 FPO:16740918 SZ:462020/730043/1.58 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_tags_ids:
.list:
..element:                INT32 GZIP DO:0 FPO:17202938 SZ:55/41/0.75 VC:693063 ENC:PLAIN,RLE
supply_bundle:            BINARY GZIP DO:0 FPO:17202993 SZ:593660/808725/1.36 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_domain:            BINARY GZIP DO:0 FPO:17796653 SZ:329154/501714/1.52 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_id:                BINARY GZIP DO:0 FPO:18125807 SZ:676109/882817/1.31 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_name:              BINARY GZIP DO:0 FPO:18801916 SZ:676064/904610/1.34 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_page:              BINARY GZIP DO:0 FPO:19477980 SZ:216/192/0.89 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_store_url:         BINARY GZIP DO:0 FPO:19478196 SZ:574182/791918/1.38 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
devicetype:               INT32 GZIP DO:0 FPO:20052378 SZ:123281/253764/2.06 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
connectiontype:           INT32 GZIP DO:0 FPO:20175659 SZ:239864/328504/1.37 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
carrier:                  BINARY GZIP DO:0 FPO:20415523 SZ:237070/393918/1.66 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
language:                 BINARY GZIP DO:0 FPO:20652593 SZ:161195/300588/1.86 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
make:                     BINARY GZIP DO:0 FPO:20813788 SZ:298946/747597/2.50 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
model:                    BINARY GZIP DO:0 FPO:21112734 SZ:848608/1040939/1.23 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
osv:                      BINARY GZIP DO:0 FPO:21961342 SZ:287478/528855/1.84 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
hwv:                      BINARY GZIP DO:0 FPO:22248820 SZ:53/33/0.62 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
country:                  BINARY GZIP DO:0 FPO:22248873 SZ:282/191/0.68 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
city:                     BINARY GZIP DO:0 FPO:22249155 SZ:1247417/1590988/1.28 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
zip:                      BINARY GZIP DO:0 FPO:23496572 SZ:1081814/1211078/1.12 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
geotype:                  INT32 GZIP DO:0 FPO:24578386 SZ:147597/187812/1.27 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
lat:                      DOUBLE GZIP DO:0 FPO:24725983 SZ:1429611/1695686/1.19 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
lon:                      DOUBLE GZIP DO:0 FPO:26155594 SZ:1552746/1734902/1.12 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
sdk_lat:                  DOUBLE GZIP DO:0 FPO:27708340 SZ:23013/31927/1.39 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
sdk_lon:                  DOUBLE GZIP DO:0 FPO:27731353 SZ:23670/31943/1.35 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
inferred_lat:             DOUBLE GZIP DO:0 FPO:27755023 SZ:1746219/2096481/1.20 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
inferred_lon:             DOUBLE GZIP DO:0 FPO:29501242 SZ:1874308/2138945/1.14 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
ua:                       BINARY GZIP DO:0 FPO:31375550 SZ:5424446/74765224/13.78 VC:693063 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
browser:                  BINARY GZIP DO:0 FPO:36799996 SZ:120015/137780/1.15 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
idfa:                     BINARY GZIP DO:0 FPO:36920011 SZ:14134491/27248834/1.93 VC:693063 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
ssp:                      BINARY GZIP DO:0 FPO:51054502 SZ:226738/348140/1.54 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
ssp_iso:                  BINARY GZIP DO:0 FPO:51281240 SZ:226253/347955/1.54 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
user_id:                  BINARY GZIP DO:0 FPO:51507493 SZ:9084638/16758875/1.84 VC:693063 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
gender:                   BINARY GZIP DO:0 FPO:60592131 SZ:7005/15616/2.23 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
yob:                      INT32 GZIP DO:0 FPO:60599136 SZ:8418/15739/1.87 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
buyeruid:                 BINARY GZIP DO:0 FPO:60607554 SZ:9121160/16812413/1.84 VC:693063 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
ip:                       BINARY GZIP DO:0 FPO:69728714 SZ:3189516/9557716/3.00 VC:693063 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
ipv6:                     BINARY GZIP DO:0 FPO:72918230 SZ:8844/23573/2.67 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
displaytype:              INT32 GZIP DO:0 FPO:72927074 SZ:96402/166613/1.73 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
height:                   INT32 GZIP DO:0 FPO:73023476 SZ:358959/823528/2.29 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
width:                    INT32 GZIP DO:0 FPO:73382435 SZ:253620/757815/2.99 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
creative_formats:
.list:
..element:                INT32 GZIP DO:0 FPO:73636055 SZ:283129/769367/2.72 VC:1269723 ENC:PLAIN_DICTIONARY,RLE
screen_position:          INT32 GZIP DO:0 FPO:73919184 SZ:157328/199749/1.27 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
formats:
.list:
..element:
...w:                     INT32 GZIP DO:0 FPO:74076512 SZ:486583/1223216/2.51 VC:1049607 ENC:PLAIN_DICTIONARY,RLE
...h:                     INT32 GZIP DO:0 FPO:74563095 SZ:530821/1379116/2.60 VC:1049607 ENC:PLAIN_DICTIONARY,RLE
clickbrowser:             INT32 GZIP DO:0 FPO:75093916 SZ:228/230/1.01 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
imp_metrics:
.list:
..element:
...vendor:                BINARY GZIP DO:0 FPO:75094144 SZ:194466/417036/2.14 VC:858171 ENC:PLAIN_DICTIONARY,RLE
...type:                  BINARY GZIP DO:0 FPO:75288610 SZ:234199/518370/2.21 VC:858171 ENC:PLAIN_DICTIONARY,RLE
...value:                 FLOAT GZIP DO:0 FPO:75522809 SZ:504343/875112/1.74 VC:858171 ENC:PLAIN_DICTIONARY,RLE
bidfloor:                 FLOAT GZIP DO:0 FPO:76027152 SZ:607943/1020674/1.68 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
bidfloorcur:              BINARY GZIP DO:0 FPO:76635095 SZ:150562/185413/1.23 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
deals_at:
.list:
..element:                INT32 GZIP DO:0 FPO:76785657 SZ:533870/877560/1.64 VC:1602083 ENC:PLAIN_DICTIONARY,RLE
deals_bidfloor:
.list:
..element:                FLOAT GZIP DO:0 FPO:77319527 SZ:989271/2245841/2.27 VC:1602083 ENC:PLAIN_DICTIONARY,RLE
deals_bidfloorcur:
.list:
..element:                BINARY GZIP DO:0 FPO:78308798 SZ:498588/784115/1.57 VC:1602083 ENC:PLAIN_DICTIONARY,RLE
deals_id:
.list:
..element:                BINARY GZIP DO:0 FPO:78807386 SZ:780430/1864602/2.39 VC:1602083 ENC:PLAIN_DICTIONARY,RLE
coppa:                    BINARY GZIP DO:0 FPO:79587816 SZ:64564/106135/1.64 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
google_segments:
.key_value:
..key:                    BINARY GZIP DO:0 FPO:79652380 SZ:55/41/0.75 VC:693063 ENC:PLAIN,RLE
..value:                  FLOAT GZIP DO:0 FPO:79652435 SZ:55/41/0.75 VC:693063 ENC:PLAIN,RLE
ias_tags:
.list:
..element:                BINARY GZIP DO:0 FPO:79652490 SZ:55/41/0.75 VC:693063 ENC:PLAIN,RLE
publisher_id:             BINARY GZIP DO:0 FPO:79652545 SZ:633653/789458/1.25 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
api:
.list:
..element:                INT32 GZIP DO:0 FPO:80286198 SZ:483655/795837/1.65 VC:1227402 ENC:PLAIN_DICTIONARY,RLE
sdk_version:              BINARY GZIP DO:0 FPO:80769853 SZ:358/434/1.21 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
protocols:
.list:
..element:                INT32 GZIP DO:0 FPO:80770211 SZ:177966/434824/2.44 VC:912678 ENC:PLAIN_DICTIONARY,RLE
at:                       INT32 GZIP DO:0 FPO:80948177 SZ:72334/107639/1.49 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
gdpr_flag:                INT32 GZIP DO:0 FPO:81020511 SZ:1334/1906/1.43 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
is_adot_vendor_allowed:   BOOLEAN GZIP DO:0 FPO:81021845 SZ:47500/147164/3.10 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
is_adot_purpose_allowed:  BOOLEAN GZIP DO:0 FPO:81069345 SZ:36256/147164/4.06 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
allowed_purposes:
.list:
..element:                INT32 GZIP DO:0 FPO:81105601 SZ:421753/3229734/7.66 VC:4927822 ENC:PLAIN_DICTIONARY,RLE
user_consent:             BOOLEAN GZIP DO:0 FPO:81527354 SZ:162/86673/535.02 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
bidding_user_consent:     BOOLEAN GZIP DO:0 FPO:81527516 SZ:28401/86674/3.05 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
is_european_country:      BOOLEAN GZIP DO:0 FPO:81555917 SZ:162/86673/535.02 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
encoded_consent_string:   BINARY GZIP DO:0 FPO:81556079 SZ:17813058/153901125/8.64 VC:693063 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
is_bid:                   BOOLEAN GZIP DO:0 FPO:99369137 SZ:28230/86674/3.07 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
is_multi_imp:             BOOLEAN GZIP DO:0 FPO:99397367 SZ:303/86673/286.05 VC:693063 ENC:PLAIN,BIT_PACKED,RLE
imp_index:                INT32 GZIP DO:0 FPO:99397670 SZ:409/417/1.02 VC:693063 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
imp_id:                   BINARY GZIP DO:0 FPO:99398079 SZ:8315080/16047380/1.93 VC:693063 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE

row group 2:             RC:63217 TS:32092043 OFFSET:107713159
--------------------------------------------------------------------------------
ts:                       INT64 GZIP DO:0 FPO:107713159 SZ:177625/505791/2.85 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
version:                  BINARY GZIP DO:0 FPO:107890784 SZ:100/62/0.62 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
request_id:               BINARY GZIP DO:0 FPO:107890884 SZ:1061504/2271446/2.14 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
cat:
.list:
..element:                BINARY GZIP DO:0 FPO:108952388 SZ:64620/211548/3.27 VC:179027 ENC:PLAIN_DICTIONARY,RLE
app_bundle:               BINARY GZIP DO:0 FPO:109017008 SZ:55924/80491/1.44 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
app_domain:               BINARY GZIP DO:0 FPO:109072932 SZ:31973/48433/1.51 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
app_id:                   BINARY GZIP DO:0 FPO:109104905 SZ:65440/87962/1.34 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
app_name:                 BINARY GZIP DO:0 FPO:109170345 SZ:64818/86490/1.33 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
site_domain:              BINARY GZIP DO:0 FPO:109235163 SZ:53/33/0.62 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
site_id:                  BINARY GZIP DO:0 FPO:109235216 SZ:53/33/0.62 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
site_name:                BINARY GZIP DO:0 FPO:109235269 SZ:53/33/0.62 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
site_page:                BINARY GZIP DO:0 FPO:109235322 SZ:53/33/0.62 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
site_cleaned_page:        BINARY GZIP DO:0 FPO:109235375 SZ:53/33/0.62 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
supply:                   BINARY GZIP DO:0 FPO:109235428 SZ:42714/67547/1.58 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_tags_ids:
.list:
..element:                INT32 GZIP DO:0 FPO:109278142 SZ:56/41/0.73 VC:63217 ENC:PLAIN,RLE
supply_bundle:            BINARY GZIP DO:0 FPO:109278198 SZ:55924/80491/1.44 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_domain:            BINARY GZIP DO:0 FPO:109334122 SZ:31973/48433/1.51 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_id:                BINARY GZIP DO:0 FPO:109366095 SZ:65440/87962/1.34 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_name:              BINARY GZIP DO:0 FPO:109431535 SZ:64818/86490/1.33 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
supply_page:              BINARY GZIP DO:0 FPO:109496353 SZ:129/110/0.85 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
supply_store_url:         BINARY GZIP DO:0 FPO:109496482 SZ:52563/89974/1.71 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
devicetype:               INT32 GZIP DO:0 FPO:109549045 SZ:11609/23173/2.00 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
connectiontype:           INT32 GZIP DO:0 FPO:109560654 SZ:23433/29965/1.28 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
carrier:                  BINARY GZIP DO:0 FPO:109584087 SZ:22613/33804/1.49 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
language:                 BINARY GZIP DO:0 FPO:109606700 SZ:15708/24092/1.53 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
make:                     BINARY GZIP DO:0 FPO:109622408 SZ:25838/66537/2.58 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
model:                    BINARY GZIP DO:0 FPO:109648246 SZ:83000/104381/1.26 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
osv:                      BINARY GZIP DO:0 FPO:109731246 SZ:26879/48581/1.81 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
hwv:                      BINARY GZIP DO:0 FPO:109758125 SZ:53/33/0.62 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
country:                  BINARY GZIP DO:0 FPO:109758178 SZ:100/62/0.62 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
city:                     BINARY GZIP DO:0 FPO:109758278 SZ:153660/249324/1.62 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
zip:                      BINARY GZIP DO:0 FPO:109911938 SZ:109732/150229/1.37 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
geotype:                  INT32 GZIP DO:0 FPO:110021670 SZ:14463/17300/1.20 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
lat:                      DOUBLE GZIP DO:0 FPO:110036133 SZ:174973/214385/1.23 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
lon:                      DOUBLE GZIP DO:0 FPO:110211106 SZ:197512/222564/1.13 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
sdk_lat:                  DOUBLE GZIP DO:0 FPO:110408618 SZ:3044/3745/1.23 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
sdk_lon:                  DOUBLE GZIP DO:0 FPO:110411662 SZ:3143/3745/1.19 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
inferred_lat:             DOUBLE GZIP DO:0 FPO:110414805 SZ:216817/271794/1.25 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
inferred_lon:             DOUBLE GZIP DO:0 FPO:110631622 SZ:235680/275050/1.17 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
ua:                       BINARY GZIP DO:0 FPO:110867302 SZ:312845/3225153/10.31 VC:63217 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
browser:                  BINARY GZIP DO:0 FPO:111180147 SZ:11388/12555/1.10 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
idfa:                     BINARY GZIP DO:0 FPO:111191535 SZ:1253591/2333627/1.86 VC:63217 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
ssp:                      BINARY GZIP DO:0 FPO:112445126 SZ:20871/31926/1.53 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
ssp_iso:                  BINARY GZIP DO:0 FPO:112465997 SZ:20815/31856/1.53 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
user_id:                  BINARY GZIP DO:0 FPO:112486812 SZ:833942/1451118/1.74 VC:63217 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
gender:                   BINARY GZIP DO:0 FPO:113320754 SZ:855/1525/1.78 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
yob:                      INT32 GZIP DO:0 FPO:113321609 SZ:1011/1635/1.62 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
buyeruid:                 BINARY GZIP DO:0 FPO:113322620 SZ:838256/1451769/1.73 VC:63217 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
ip:                       BINARY GZIP DO:0 FPO:114160876 SZ:340594/826799/2.43 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
ipv6:                     BINARY GZIP DO:0 FPO:114501470 SZ:1268/3031/2.39 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
displaytype:              INT32 GZIP DO:0 FPO:114502738 SZ:9086/15200/1.67 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
height:                   INT32 GZIP DO:0 FPO:114511824 SZ:28473/68160/2.39 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
width:                    INT32 GZIP DO:0 FPO:114540297 SZ:26237/60272/2.30 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
creative_formats:
.list:
..element:                INT32 GZIP DO:0 FPO:114566534 SZ:27275/70328/2.58 VC:115731 ENC:PLAIN_DICTIONARY,RLE
screen_position:          INT32 GZIP DO:0 FPO:114593809 SZ:15291/18177/1.19 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
formats:
.list:
..element:
...w:                     INT32 GZIP DO:0 FPO:114609100 SZ:43816/103708/2.37 VC:95597 ENC:PLAIN_DICTIONARY,RLE
...h:                     INT32 GZIP DO:0 FPO:114652916 SZ:55963/115230/2.06 VC:95597 ENC:PLAIN_DICTIONARY,RLE
clickbrowser:             INT32 GZIP DO:0 FPO:114708879 SZ:74/55/0.74 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
imp_metrics:
.list:
..element:
...vendor:                BINARY GZIP DO:0 FPO:114708953 SZ:18870/38196/2.02 VC:78178 ENC:PLAIN_DICTIONARY,RLE
...type:                  BINARY GZIP DO:0 FPO:114727823 SZ:22607/47452/2.10 VC:78178 ENC:PLAIN_DICTIONARY,RLE
...value:                 FLOAT GZIP DO:0 FPO:114750430 SZ:53740/79976/1.49 VC:78178 ENC:PLAIN_DICTIONARY,RLE
bidfloor:                 FLOAT GZIP DO:0 FPO:114804170 SZ:56134/81732/1.46 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
bidfloorcur:              BINARY GZIP DO:0 FPO:114860304 SZ:14526/16889/1.16 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
deals_at:
.list:
..element:                INT32 GZIP DO:0 FPO:114874830 SZ:49914/81395/1.63 VC:148244 ENC:PLAIN_DICTIONARY,RLE
deals_bidfloor:
.list:
..element:                FLOAT GZIP DO:0 FPO:114924744 SZ:93934/187111/1.99 VC:148244 ENC:PLAIN_DICTIONARY,RLE
deals_bidfloorcur:
.list:
..element:                BINARY GZIP DO:0 FPO:115018678 SZ:45872/72720/1.59 VC:148244 ENC:PLAIN_DICTIONARY,RLE
deals_id:
.list:
..element:                BINARY GZIP DO:0 FPO:115064550 SZ:73610/175431/2.38 VC:148244 ENC:PLAIN_DICTIONARY,RLE
coppa:                    BINARY GZIP DO:0 FPO:115138160 SZ:6140/9727/1.58 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
google_segments:
.key_value:
..key:                    BINARY GZIP DO:0 FPO:115144300 SZ:56/41/0.73 VC:63217 ENC:PLAIN,RLE
..value:                  FLOAT GZIP DO:0 FPO:115144356 SZ:56/41/0.73 VC:63217 ENC:PLAIN,RLE
ias_tags:
.list:
..element:                BINARY GZIP DO:0 FPO:115144412 SZ:56/41/0.73 VC:63217 ENC:PLAIN,RLE
publisher_id:             BINARY GZIP DO:0 FPO:115144468 SZ:60180/76792/1.28 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
api:
.list:
..element:                INT32 GZIP DO:0 FPO:115204648 SZ:45852/72224/1.58 VC:111142 ENC:PLAIN_DICTIONARY,RLE
sdk_version:              BINARY GZIP DO:0 FPO:115250500 SZ:132/98/0.74 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
protocols:
.list:
..element:                INT32 GZIP DO:0 FPO:115250632 SZ:17910/39854/2.23 VC:83460 ENC:PLAIN_DICTIONARY,RLE
at:                       INT32 GZIP DO:0 FPO:115268542 SZ:6851/9871/1.44 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
gdpr_flag:                INT32 GZIP DO:0 FPO:115275393 SZ:227/233/1.03 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
is_adot_vendor_allowed:   BOOLEAN GZIP DO:0 FPO:115275620 SZ:4832/13661/2.83 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
is_adot_purpose_allowed:  BOOLEAN GZIP DO:0 FPO:115280452 SZ:3866/13661/3.53 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
allowed_purposes:
.list:
..element:                INT32 GZIP DO:0 FPO:115284318 SZ:38581/295103/7.65 VC:449917 ENC:PLAIN_DICTIONARY,RLE
user_consent:             BOOLEAN GZIP DO:0 FPO:115322899 SZ:84/7941/94.54 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
bidding_user_consent:     BOOLEAN GZIP DO:0 FPO:115322983 SZ:2676/7942/2.97 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
is_european_country:      BOOLEAN GZIP DO:0 FPO:115325659 SZ:84/7941/94.54 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
encoded_consent_string:   BINARY GZIP DO:0 FPO:115325743 SZ:1565349/13957893/8.92 VC:63217 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE
is_bid:                   BOOLEAN GZIP DO:0 FPO:116891092 SZ:2659/7942/2.99 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
is_multi_imp:             BOOLEAN GZIP DO:0 FPO:116893751 SZ:89/7941/89.22 VC:63217 ENC:PLAIN,BIT_PACKED,RLE
imp_index:                INT32 GZIP DO:0 FPO:116893840 SZ:116/79/0.68 VC:63217 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
imp_id:                   BINARY GZIP DO:0 FPO:116893956 SZ:808660/1467791/1.82 VC:63217 ENC:PLAIN,PLAIN_DICTIONARY,BIT_PACKED,RLE```
martint commented 3 years ago

I think what's happening is that the ts column is an INT64 column with values encoded in milliseconds. Since the column has no logical type annotations, the code is falling through and reading it directly into native Presto representation, which expects values in microseconds:

https://github.com/prestosql/presto/blob/master/presto-parquet/src/main/java/io/prestosql/parquet/reader/PrimitiveColumnReader.java#L88-L98

A few of possible ways to fix this:

luhhujbb commented 3 years ago

Timestamps are written using spark as Long/BIGINT, type timestamp is configured in hive/presto during Table creation.

martint commented 3 years ago

I'm curious about how Hive interprets those values. Do you have a Hive installation that you could try with?

In a nutshell, I'm trying to determine whether this has worked in the past by chance or whether interpreting those values as milliseconds is the expected behavior from Hive's perspective.

luhhujbb commented 3 years ago

Our setup is really small since we mostly use only metastore, but the result is the following : Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

Someone has the same issue here : https://stackoverflow.com/questions/60492836/timestamp-not-behaving-as-intended-with-parquet-in-hive

And it seems a bug is opened on hive : https://issues.apache.org/jira/browse/HIVE-15079

martint commented 3 years ago

From the linked page (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps), it would appear that even the previous behavior of interpreting values as milliseconds was incorrect:

Supported conversions: Integer numeric types: Interpreted as UNIX timestamp in seconds

luhhujbb commented 3 years ago

I think spark follow the default java behavior of sql timestamp : https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/Timestamp.html, Long construct a millisecond timestamp.

luhhujbb commented 3 years ago

From the linked page (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps), it would appear that even the previous behavior of interpreting values as milliseconds was incorrect:

Supported conversions: Integer numeric types: Interpreted as UNIX timestamp in seconds

Exactly, that's what make me think, at first, that we write our timestamp in seconds, and that timestamp(0) should be supported to follow hive spec.

findepi commented 3 years ago

Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

see also https://github.com/prestosql/presto/issues/5483#issuecomment-707627539

In a nutshell, I'm trying to determine whether this has worked in the past by chance or whether interpreting those values as milliseconds is the expected behavior from Hive's perspective.

I think in Parquet writer some mappings work by a chance only.

I once tried to clean it up, but did not finish (WIP https://github.com/prestosql/presto/pull/2840)

luhhujbb commented 3 years ago

So yes it seems it works by chance only.

martint commented 3 years ago

@luhhujbb, thanks for the research. It appears to me that the right course of action is to disallow that mapping. Unfortunately, it means you’d have to adjust your Spark processes to write the correct type with annotations.

luhhujbb commented 3 years ago

@martint I agree it will be cleaner that way. Many Thanks.

kelindar commented 3 years ago

We have also hit this issue when using Presto Thrift Connector. Our system (https://github.com/kelindar/talaria) is exporting timestamp as timestamp(3) and sending data back for timestamps in milliseconds as specified here,

/**
 * Elements of {@code nulls} array determine if a value for a corresponding row is null.
 * Elements of {@code timestamps} array are values for each row represented as the number
 * of milliseconds passed since 1970-01-01T00:00:00 UTC.
 * If row is null then value is ignored.
 */
struct PrestoThriftTimestamp {
  1: optional list<bool> nulls;
  2: optional list<i64> timestamps;
}

We narrowed it down to Presto and rolled back our Presto to version 335 and the issue seems to be gone.

MichaelZhao9824 commented 3 years ago

We got the same issue - we use PySpark 3 + hive 2.6 + parquet. We had two environments - version 333 and version 343. The timestamp column in version 333 is displayed as timestamp but in 343 it is displayed as timestamp(3).

Our Spark job use the default timestamp format (yyyy-MM-dd HH:mm:ss ) in Spark and the hive table uses the default timestamp (yyyy-MM-dd HH:mm:ss ) as well. I found any timezone conversion function not working in version 343 with the timestamp(3) column

findepi commented 3 years ago

We have also hit this issue when using Presto Thrift Connector. Our system (https://github.com/kelindar/talaria) is exporting timestamp as timestamp(3) and ...

@kelindar can you please file a separate issue for this?

findepi commented 3 years ago

Our Spark job use the default timestamp format (yyyy-MM-dd HH:mm:ss ) in Spark and the hive table uses the default timestamp (yyyy-MM-dd HH:mm:ss ) as well. I found any timezone conversion function not working in version 343 with the timestamp(3) column

@MichaelZhao9824 at first this seems loosely related to original problem reported here. can you please file a separate issue for this and provide more details?

luhhujbb commented 3 years ago

This issue comes from the following code path :

  1. The parquet column int64 is read as a longColumn in parquet file.
  2. The spi type derived from hive metastore is timestamp(3), so the long is then considered as shortTimestampType
  3. Release 340 and prior consider that precision <=3 are milliseconds encoded timestamp, and else microseconds encoded timestamp. (cf https://github.com/prestosql/presto/blob/340/presto-spi/src/main/java/io/prestosql/spi/type/ShortTimestampType.java#L132)

Now all connectors convert internally all timestamps in microseconds.

findepi commented 3 years ago

@luhhujbb thanks for the update.

Let's keep the issue open unless we decide there is no more work tbd.

There is this idea open for example:

disallow un-annotated INT64. This would break backward compatibility, but it's the "safe" choice that avoids correctness issues if data was written incorrectly.

tooptoop4 commented 3 years ago

@luhhujbb do u have a fix?

luhhujbb commented 3 years ago

@tooptoop4, we are converting our unanottated int64 parquet fields to annotated int64 milliseconds timestamp parquet fields in existing data and we rewrite our sparks job to correctly write timestamp in parquet. It's a bit painful but's we think it's the cleanest solution.

korkile commented 3 years ago

Presto to version 350 seems to work with it

chinayangze commented 2 years ago

Is there any solution?