prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
15.81k stars 5.31k forks source link

Documentation requests: address setting timezone back to UTC; document additional valid timezone values; address directionality of fixed timezone offsets (POSIX vs ISO) #18622

Open MPagel opened 1 year ago

MPagel commented 1 year ago

There does not appear to be a UTC time zone available for timestamp conversion for use with at_timezone.

UTC, GMT, +0, etc/GMT -00:00 do not seem to exist in the lists at https://github.com/prestodb/presto/blob/master/presto-common/src/main/resources/com/facebook/presto/common/type/zone-index.properties or https://docs.aws.amazon.com/athena/latest/ug/athena-supported-time-zones.html

The first link does suggest many fixed offsets would work (e.g. -05:00), but does not document what is meant by positive or negative directionality. Is it following the POSIX definition (US and the western hemisphere have positive values), or the now-standard ISO-8601 (US and western hemisphere have negative values)?

related: https://github.com/prestodb/presto/issues/14929

prestored commented 1 year ago

Could you please share few queries that you have tried?

MPagel commented 1 year ago

It looks like I had not tested as thoroughly as I thought I had

values of

+00:00, -00:00, +0, -0, 
UTC, UTC+0,
Etc/UTC, Etc/UTC+0
GMT, GMT+0, GMT0
Etc/GMT, Etc/GMT+0, Etc/GMT0

are all valid

but UTC0 and Etc/UTC0 are invalid, unlike their GMT counterparts

None of the timezone values above are present in the Presto source-based documentation linked in the OP.

In the code below, I have commented out lines that throw errors. I am also testing the edge-case of the doubled-hour when daylight savings time ends in the US.

with timepoint as (
    Select 
        with_timezone(timestamp '2022-11-06 00:11:50.187','America/New_York') t0, --EDT
        with_timezone(timestamp '2022-11-06 01:11:50.187','America/New_York') t1, --ambiguous, could be 4 hours behind UTC (EDT) or 5 hours behind (EST)
        with_timezone(timestamp '2022-11-06 02:11:50.187','America/New_York') t2, --should be EST
        with_timezone(timestamp '2022-11-06 03:11:50.187','America/New_York') t --EST
) 
select t.t as EST -- valid, eastern standard time
    ,at_timezone(t.t, 'America/Los_Angeles') as PDT --valid, pacific daylight time
 -- ,at_timezone(t.t, '+14:28') as "bogus test1" --invalid
    ,at_timezone(t.t, '+13:58') as "weird offset of time" --valid, documented
    ,at_timezone(t.t, '-00:00') as "(UTC)-00:00" --undocumented, but valid
    ,at_timezone(t.t, '+00:00') as "(UTC)+00:00" --undocumented, but valid
    ,at_timezone(t.t, '+0') as "(UTC)+0" --valid, not documented
    ,at_timezone(t.t, '-0') as "(UTC)-0" --valid, not documented
    ,at_timezone(t.t, 'UTC') as "UTC" --valid, not documented
    ,at_timezone(t.t, 'UTC+0') as "UTC+0" --valid, not documented
 -- ,at_timezone(t.t, 'UTC0') as "UTC0" --invalid
    ,at_timezone(t.t, 'Etc/UTC') as "Etc/UTC" --valid, not documented
 -- ,at_timezone(t.t, 'Etc/UTC0') as "Etc/UTC0" --invalid
    ,at_timezone(t.t, 'GMT') as "GMT" --valid, not documented
    ,at_timezone(t.t, 'GMT+0') as "GMT+0" --valid, not documented
    ,at_timezone(t.t, 'GMT0') as "GMT0" --valid, not documented
    ,at_timezone(t.t, 'Etc/GMT') as "ETC/GMT" --valid, not documented
    ,at_timezone(t.t, 'Etc/GMT+0') as "ETC/GMT+0" --valid, not documented
    ,at_timezone(t.t, 'Etc/GMT0') as "ETC/GMT0" --valid, not documented
    ,at_timezone(t.t0, 'UTC') as "just after midnight, NY (DST active)"
    ,at_timezone(t.t1, 'UTC') as "UTC for 1 AM in new york (daylight savings time ambiguous)"
    ,at_timezone(t.t2, 'UTC') as "UTC for 2 AM EST (non-DST)"
    ,at_timezone(t.t, 'UTC') as "UTC for 3 AM EST (non-DST)"
    ,at_timezone(t.t0, 'America/New_York') as "just after midnight, EDT"
    ,at_timezone(t.t1, 'America/New_York') as "1 AM ET (ambiguous)"
    ,at_timezone(t.t2, 'America/New_York') as "2 AM EST"
    ,at_timezone(t.t, 'America/New_York') as "3 AM EST"
    ,at_timezone(t.t0, 'America/Los_Angeles') as "LA time (PDT) for just after midnight EDT"
    ,at_timezone(t.t1, 'America/Los_Angeles') as "LA time (PDT) for 1 AM ET (ambiguous)"
    ,at_timezone(t.t2, 'America/Los_Angeles') as "LA time (PDT) for 2 AM EST"
    ,at_timezone(t.t, 'America/Los_Angeles') as "LA time (PDT) for 3 AM EST"
from timepoint as t

Here "documented" means specifically existing in the presto source. It does not necessarily actually mean that it is present in the Readme.md or any other official help resource file. Various (but not all) "undocumented" values may be documented and/or present in source of either joda or java.util.DateTimeZone.

MPagel commented 1 year ago

Joda does list the following zones at https://www.joda.org/joda-time/timezones.html

Etc/GMT, Etc/GMT+0, Etc/GMT-0, Etc/GMT0, Etc/Greenwich, GMT, GMT+0, GMT-0, GMT0, Greenwich Etc/UCT, UCT Etc/UTC, Etc/Universal, Etc/Zulu, Universal, Zulu

There is no official timezone list associated with any official java.util.DateTimeZone help posted online, but from https://garygregory.wordpress.com/2013/06/18/what-are-the-java-timezone-ids/ I see a similar list as for joda (minus the non-etc GMT zones).

Some zones -- such as UTC+0 -- work, but do not appear to be documented in Presto, Joda or the java.util.DateTimeZone (garygregory) list.