Closed chenziliang closed 1 month ago
Internally if users don't specify timezone for datetime/datetime64 column, we use UTC timezone by default
Experimentally, I don't believe that's true (of course, the column stores utime internally always).
Demonstration: my system timezone is Europe/London and the current time is approx 17:51 BST (UTC+1). I define a column without timezone:
create stream foo ( id int32, t datetime )
select now()
ββββββββββββββββnow()ββ
β 2024-09-10 17:51:34 β
βββββββββββββββββββββββ
insert into foo (id, t) values ( 1, now() )
select * from table(foo)
ββidββ¬βββββββββββββββββββtββ¬ββββββββββββββββ_tp_timeββ
β 1 β 2024-09-10 17:51:53 β 2024-09-10 16:51:53.608 β
ββββββ΄ββββββββββββββββββββββ΄ββββββββββββββββββββββββββ
select t,t::int,to_type_name(t) from table(foo);
ββββββββββββββββββββtββ¬βcast(t, 'int')ββ¬βto_type_name(t)ββ
β 2024-09-10 17:51:53 β 1725987113 β datetime β
βββββββββββββββββββββββ΄βββββββββββββββββ΄ββββββββββββββββββ
And from command line:
% date -r 1725987113
Tue 10 Sep 2024 17:51:53 BST
This shows that:
datetime('Europe/London')
even though the type is datetime
.Conversely, the _tp_time
column is defined with a timezone of UTC, and therefore renders to UTC:
select _tp_time,to_type_name(_tp_time) from table(foo)
βββββββββββββββββ_tp_timeββ¬βto_type_name(_tp_time)ββ
β 2024-09-10 16:51:53.608 β datetime64(3, 'UTC') β
βββββββββββββββββββββββββββ΄βββββββββββββββββββββββββ
This is all with macOS as the platform.
For now, when we select a timestamp, it doesn't tell you what timezone it is used. Internally if users don't specify timezone for datetime/datetime64 column, we use UTC timezone by default, but sometimes, users will specify timezone for a datetime column, without displaying timezone, it could be confusing.
To be more precise, is your suggestion that the "Pretty*" output formats rendering of a datetime(zone)
value should display the timezone? That seems like a good idea. To be backwards-compatible I don't think it could affect any other output format.
That would at least show clearly that a _tp_time
value is UTC.
What about a datetime
value without zone? I guess that should remain rendered without a zone (the other option is to show the implied system timezone, but that may use a lot of column space unnecessarily in the normal case)
More details https://github.com/timeplus-io/proton/discussions/833.
Also relevant: https://github.com/timeplus-io/docs/issues/218
Comparing the behaviour with Clickhouse 24.8.4.13 (on Linux this time):
create table foo ( t1 datetime, t2 datetime('UTC') ) primary key t1
select now()
ββββββββββββββββnow()ββ
1. β 2024-09-10 18:21:39 β
βββββββββββββββββββββββ
insert into foo (t1, t2) values (now(), now())
select t1,t2 from foo
βββββββββββββββββββt1ββ¬ββββββββββββββββββt2ββ
1. β 2024-09-10 18:21:46 β 2024-09-10 17:21:46 β
βββββββββββββββββββββββ΄ββββββββββββββββββββββ
select t1,toTypeName(t1),t2,toTypeName(t2) from foo
βββββββββββββββββββt1ββ¬βtoTypeName(t1)ββ¬ββββββββββββββββββt2ββ¬βtoTypeName(t2)βββ
1. β 2024-09-10 18:21:46 β DateTime β 2024-09-10 17:21:46 β DateTime('UTC') β
βββββββββββββββββββββββ΄βββββββββββββββββ΄ββββββββββββββββββββββ΄ββββββββββββββββββ
Therefore the same issue occurs: if in the same query you're rendering a column with a timezone plus a column without timezone, then it's confusing because the timezone is hidden in the type.
In Clickhouse though, you declare all your own time columns explicitly, so they are generally consistent with each other. In Proton you get the auto-created _tp_time
.
There are two solutions:
1) Usually in the user experience, only the local time zone and the UTC time zone (such as __tptime) are present, so we can always display the UTC datetime with format %Y-%m-%d %H:%M:%SZ
via writeDateTimeTextISO()
, and other date time are the same as now.
for example: 2024-09-10 16:21:46Z
and 2024-09-10 17:21:46
2) datetime with time zone : %Y-%m-%d %H:%M:%S%Z
(e.g. 2024-09-10 16:21:46+00:00
and 2024-09-10 17:21:46+01:00
)
datetime without time zone: %Y-%m-%d %H:%M:%S
(e.g. 2024-09-10 17:21:46
same as now)
Agreed. Or combine them for option 3: omit timezone if type doesn't specify it; use the "Z" shortcut if it's UTC; and use the full offset otherwise.
2024-09-10 18:21:46
(same as now)2024-09-10 17:21:46Z
2024-09-10 18:21:46+01:00
This saves horizontal screen space for the common case of _tp_time
, but is also unambiguous for columns which specify some other timezone.
EDIT: Also raised as a discussion for ClickHouse: https://github.com/ClickHouse/ClickHouse/discussions/69923
Hi @candlerb , a PR is being reviewed. Feel free to share your comment to refine the behavior. Thanks.
Describe what enhancement you'd like to have
For now, when we select a timestamp, it doesn't tell you what timezone it is used. Internally if users don't specify timezone for datetime/datetime64 column, we use UTC timezone by default, but sometimes, users will specify timezone for a datetime column, without displaying timezone, it could be confusing.
More details https://github.com/timeplus-io/proton/discussions/833. Thanks @candlerb to bring up this issue.