timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.8k stars 852 forks source link

[Bug]: time_bucket function doesn't account for time zone when applying offset #7059

Open eyadmba opened 1 week ago

eyadmba commented 1 week ago

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Other

What happened?

Description

I'm using the time_bucket function with both timezone and offset arguments, and I expected the offset to be applied after the time zone so that my 15 minute offset jumps ahead an hour and 15 minutes when DST occurs.

To clarify, in the screenshot below the "local_time" is in the Asia/Amman time zone, and when adding a 15 minute interval to "2021-03-25 23:45", the result should be "2021-03-26 01:00" instead of "2021-03-26 00:00" (an hour was skipped because of DST) image

The time_bucket function doesn't take that into consideration because when running the below query I'd expect it to sum up the first 4 rows (all of March 25th, and the first row of March 26th because it's been offset 15 minutes).

SELECT
    time_bucket('1 day'::interval, time,
        "timezone" := 'Asia/Amman',
        "offset" := '15 minutes'::interval
    ) as day_bucket,
    metric_id,
    sum(value) as sum,
    string_agg(value::text, ', ') as grouped_values
FROM metric_data
GROUP BY day_bucket, metric_id;

But instead the actual output is this: image

Workaround

A workaround for this issue would be to actually not use the offset parameter on the time_bucket function and instead apply the offset directly on your timestamp column since that column is (hopefully) in UTC:

SELECT
    time_bucket(
        '1 day'::interval,
        time - '15 minutes'::interval,
        "timezone" := 'Asia/Amman',
    ) as day_bucket,
    metric_id,
    sum(value) as sum,
    string_agg(value::text, ', ') as grouped_values
FROM metric_data
GROUP BY day_bucket, metric_id;

output: image

TimescaleDB version affected

2.5.1

PostgreSQL version used

14.2

What operating system did you use?

Windows 10 x64

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

Step 1. Setup

CREATE TABLE metric_data (
    time timestamp with time zone,
    metric_id varchar(20),
    value numeric
);

INSERT INTO metric_data(time, metric_id, value)
VALUES ('2021-03-25 21:15:00+00:00', 'm1', 2),
       ('2021-03-25 21:30:00+00:00', 'm1', 6),
       ('2021-03-25 21:45:00+00:00', 'm1', 15),
       ('2021-03-25 22:00:00+00:00', 'm1', 5),
       ('2021-03-25 22:15:00+00:00', 'm1', 1),
       ('2021-03-25 22:30:00+00:00', 'm1', 9)
;

Step 2. Run this query and notice how its output is not accurate

SELECT
    time_bucket(
            "bucket_width" := '1 day'::interval,
            "ts" := time,
            "timezone" := 'Asia/Amman',
            "offset" := '15 minutes'::interval
    ) as day_bucket,
    metric_id,
    sum(value) as sum
FROM metric_data
GROUP BY day_bucket, metric_id;