timeplus-io / proton

A stream processing engine and database, and a fast and lightweight alternative to ksqlDB and Apache Flink, πŸš€ powered by ClickHouse
https://timeplus.com
Apache License 2.0
1.58k stars 69 forks source link

Display timezone information for datetime #835

Closed chenziliang closed 1 month ago

chenziliang commented 2 months ago

Describe what enhancement you'd like to have

For now, when we select a timestamp, it doesn't tell you what timezone it is used. Internally if users don't specify timezone for datetime/datetime64 column, we use UTC timezone by default, but sometimes, users will specify timezone for a datetime column, without displaying timezone, it could be confusing.

SELECT _tp_time FROM stream;

More details https://github.com/timeplus-io/proton/discussions/833. Thanks @candlerb to bring up this issue.

candlerb commented 2 months ago

Internally if users don't specify timezone for datetime/datetime64 column, we use UTC timezone by default

Experimentally, I don't believe that's true (of course, the column stores utime internally always).

Demonstration: my system timezone is Europe/London and the current time is approx 17:51 BST (UTC+1). I define a column without timezone:

create stream foo ( id int32, t datetime )

select now()
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€now()─┐
β”‚ 2024-09-10 17:51:34 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

insert into foo (id, t) values ( 1, now() )

select * from table(foo)
β”Œβ”€id─┬───────────────────t─┬────────────────_tp_time─┐
β”‚  1 β”‚ 2024-09-10 17:51:53 β”‚ 2024-09-10 16:51:53.608 β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

select t,t::int,to_type_name(t) from table(foo);
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€t─┬─cast(t, 'int')─┬─to_type_name(t)─┐
β”‚ 2024-09-10 17:51:53 β”‚     1725987113 β”‚ datetime        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

And from command line:

% date -r 1725987113
Tue 10 Sep 2024 17:51:53 BST

This shows that:

  1. The UTC time is stored in the database
  2. The datetime value that's retrieved is converted to the local system time. AFAICS, this is the default if the column type doesn't have a timezone defined. That is, the column behaves as if it were datetime('Europe/London') even though the type is datetime.

Conversely, the _tp_time column is defined with a timezone of UTC, and therefore renders to UTC:

select _tp_time,to_type_name(_tp_time) from table(foo)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€_tp_time─┬─to_type_name(_tp_time)─┐
β”‚ 2024-09-10 16:51:53.608 β”‚ datetime64(3, 'UTC')   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

This is all with macOS as the platform.

candlerb commented 2 months ago

For now, when we select a timestamp, it doesn't tell you what timezone it is used. Internally if users don't specify timezone for datetime/datetime64 column, we use UTC timezone by default, but sometimes, users will specify timezone for a datetime column, without displaying timezone, it could be confusing.

To be more precise, is your suggestion that the "Pretty*" output formats rendering of a datetime(zone) value should display the timezone? That seems like a good idea. To be backwards-compatible I don't think it could affect any other output format.

That would at least show clearly that a _tp_time value is UTC.

What about a datetime value without zone? I guess that should remain rendered without a zone (the other option is to show the implied system timezone, but that may use a lot of column space unnecessarily in the normal case)

More details https://github.com/timeplus-io/proton/discussions/833.

Also relevant: https://github.com/timeplus-io/docs/issues/218

candlerb commented 2 months ago

Comparing the behaviour with Clickhouse 24.8.4.13 (on Linux this time):

create table foo ( t1 datetime, t2 datetime('UTC') ) primary key t1

select now()
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€now()─┐
1. β”‚ 2024-09-10 18:21:39 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

insert into foo (t1, t2) values (now(), now())

select t1,t2 from foo
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€t1─┬──────────────────t2─┐
1. β”‚ 2024-09-10 18:21:46 β”‚ 2024-09-10 17:21:46 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

select t1,toTypeName(t1),t2,toTypeName(t2) from foo
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€t1─┬─toTypeName(t1)─┬──────────────────t2─┬─toTypeName(t2)──┐
1. β”‚ 2024-09-10 18:21:46 β”‚ DateTime       β”‚ 2024-09-10 17:21:46 β”‚ DateTime('UTC') β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Therefore the same issue occurs: if in the same query you're rendering a column with a timezone plus a column without timezone, then it's confusing because the timezone is hidden in the type.

In Clickhouse though, you declare all your own time columns explicitly, so they are generally consistent with each other. In Proton you get the auto-created _tp_time.

yl-lisen commented 2 months ago

There are two solutions: 1) Usually in the user experience, only the local time zone and the UTC time zone (such as __tptime) are present, so we can always display the UTC datetime with format %Y-%m-%d %H:%M:%SZ via writeDateTimeTextISO(), and other date time are the same as now. for example: 2024-09-10 16:21:46Z and 2024-09-10 17:21:46

2) datetime with time zone : %Y-%m-%d %H:%M:%S%Z (e.g. 2024-09-10 16:21:46+00:00 and 2024-09-10 17:21:46+01:00) datetime without time zone: %Y-%m-%d %H:%M:%S (e.g. 2024-09-10 17:21:46 same as now)

candlerb commented 2 months ago

Agreed. Or combine them for option 3: omit timezone if type doesn't specify it; use the "Z" shortcut if it's UTC; and use the full offset otherwise.

This saves horizontal screen space for the common case of _tp_time, but is also unambiguous for columns which specify some other timezone.

EDIT: Also raised as a discussion for ClickHouse: https://github.com/ClickHouse/ClickHouse/discussions/69923

jovezhong commented 1 month ago

Hi @candlerb , a PR is being reviewed. Feel free to share your comment to refine the behavior. Thanks.