Since we migrated to timescale, the way we treat time zones is incorrect and can result in corruption to the data.
Introduction to PostgreSQL's TIMESTAMP WITH TIME ZONE
There is no difference in how TIMESTAMP WITH TIME ZONE and TIMESTAMP WITHOUT TIME ZONE are stored. Both are 8 bytes, an int64 with the number of microseconds since 2000-01-01 00:00:00. The time zone is not stored anywhere.
The only difference is in what the timestamp represents. In TIMESTAMP WITHOUT TIME ZONE, the stored timestamp represents a date and time in an unknown time zone. In TIMESTAMP WITH TIME ZONE, the stored timestamp represents a date and time in UTC. When retrieving a TIMESTAMP WITH TIME ZONE (e.g. with psql), we can use AT TIME ZONE to convert it to some other time zone, otherwise it is converted to a default time zone, such as that specified by the TZ environment variable. Likewise, when saving a TIMESTAMP WITH TIME ZONE, it is converted to UTC; the time zone from which it is converted, if not specified in the input (e.g. in a literal), is, again, some default time zone.
When a TIMESTAMP WITH TIME ZONE is retrieved into a Django DateTimeField, it is not clear which time zone it's using (maybe always UTC, or it may be affected by PostgreSQL settings and/or Django settings), but it doesn't matter anyway because, for the Enhydris use case, it has to be converted to another time zone anyway, which can be done with the .astimezone() method.
What Enhydris does
TimeseriesGroup has a time_zone attribute. When a time series is being saved from a htimeseries object (whose timestamps are naive), the timestamps are considered to be in in the timeseries group's time zone. Conversely, when a time series is being retrieved into a htimeseries object, the timestamps are converted to the timeseries group's time zone.
While this behaviour looks reasonable, it is not what was intended. TimeseriesGroup.time_zone was intended to be informational. The intention was that naive timestamps would be stored naively, and that if you stored 2022-10-10 19:30, then if you retrieved that it would be 2022-10-10 19:30. But this is not how it works. If the group's time zone is EET and you store that timestamp and then you change the group's time zone to CET and you retrieve the timestamp again, you'll get 18:30 instead. If you have an automatically updated time series with time zone MSK and one day you see it and say "hey, that's wrong!" and you change it to CET, it will continue to update the data but now the dates will mean something different.
How to fix these problems
Fixing these problems will also simplify the system much:
The TimeseriesGroup.time_zone attribute must go away, and likewise the TimeZone model.
A station.display_time_zone attribute must be added, which will only affect how the timestamps are displayed (e.g. on the chart or whenever the time series is downloaded in CSV). This must be in a format like the Olson database, and initially only the Etc/GMT* zones will be supported. There is no need to have a database table for that; we have Python's zoneinfo.
When storing a time series, the time zone has to be provided. Either a compulsory parameter must be specified to the API, or the uploaded time series must be in file format with the TimeZone parameter specified.
Since we migrated to timescale, the way we treat time zones is incorrect and can result in corruption to the data.
Introduction to PostgreSQL's TIMESTAMP WITH TIME ZONE
There is no difference in how
TIMESTAMP WITH TIME ZONE
andTIMESTAMP WITHOUT TIME ZONE
are stored. Both are 8 bytes, anint64
with the number of microseconds since 2000-01-01 00:00:00. The time zone is not stored anywhere.The only difference is in what the timestamp represents. In
TIMESTAMP WITHOUT TIME ZONE
, the stored timestamp represents a date and time in an unknown time zone. InTIMESTAMP WITH TIME ZONE
, the stored timestamp represents a date and time in UTC. When retrieving aTIMESTAMP WITH TIME ZONE
(e.g. withpsql
), we can useAT TIME ZONE
to convert it to some other time zone, otherwise it is converted to a default time zone, such as that specified by theTZ
environment variable. Likewise, when saving aTIMESTAMP WITH TIME ZONE
, it is converted to UTC; the time zone from which it is converted, if not specified in the input (e.g. in a literal), is, again, some default time zone.When a
TIMESTAMP WITH TIME ZONE
is retrieved into a DjangoDateTimeField
, it is not clear which time zone it's using (maybe always UTC, or it may be affected by PostgreSQL settings and/or Django settings), but it doesn't matter anyway because, for the Enhydris use case, it has to be converted to another time zone anyway, which can be done with the.astimezone()
method.What Enhydris does
TimeseriesGroup
has atime_zone
attribute. When a time series is being saved from ahtimeseries
object (whose timestamps are naive), the timestamps are considered to be in in the timeseries group's time zone. Conversely, when a time series is being retrieved into ahtimeseries
object, the timestamps are converted to the timeseries group's time zone.While this behaviour looks reasonable, it is not what was intended.
TimeseriesGroup.time_zone
was intended to be informational. The intention was that naive timestamps would be stored naively, and that if you stored 2022-10-10 19:30, then if you retrieved that it would be 2022-10-10 19:30. But this is not how it works. If the group's time zone is EET and you store that timestamp and then you change the group's time zone to CET and you retrieve the timestamp again, you'll get 18:30 instead. If you have an automatically updated time series with time zone MSK and one day you see it and say "hey, that's wrong!" and you change it to CET, it will continue to update the data but now the dates will mean something different.How to fix these problems
Fixing these problems will also simplify the system much:
TimeseriesGroup.time_zone
attribute must go away, and likewise theTimeZone
model.station.display_time_zone
attribute must be added, which will only affect how the timestamps are displayed (e.g. on the chart or whenever the time series is downloaded in CSV). This must be in a format like the Olson database, and initially only theEtc/GMT*
zones will be supported. There is no need to have a database table for that; we have Python'szoneinfo
.