timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.97k stars 858 forks source link

Error creating continuous aggregate view with time_bucket origin/offset parameter #2265

Closed grafolean closed 3 months ago

grafolean commented 3 years ago

Relevant system information:

Describe the bug I am trying to create a continuous aggregate view using time_bucket with origin parameter set, but I am getting error. It works without origin or offset.

To Reproduce Steps to reproduce the behavior:

  1. Try to create a continuous aggregate view on a table using origin parameter of time_bucket:

    # CREATE VIEW test_hourly WITH (timescaledb.continuous) AS SELECT TIME_BUCKET('1 hours'::interval, ts::timestamp, '1970-01-01'::timestamp) AS period, AVG(value), MIN(value), MAX(value) FROM measurements GROUP BY period;
  2. See error:

    ERROR:  no valid bucketing function found for continuous aggregate query
  3. Observe that it works when origin is not specified:

    # CREATE VIEW test_hourly WITH (timescaledb.continuous) AS SELECT TIME_BUCKET('1 hours'::interval, ts::timestamp) AS period, AVG(value), MIN(value), MAX(value) FROM measurements GROUP BY period;
    CREATE VIEW

Expected behavior Parameter origin should be allowed (and obeyed).

Actual behavior Creating a continuous aggregate view fails.

Screenshots /

Additional context

fredericgermain commented 3 years ago

I upvote this. It'd be nice to have aggregation by day not aligned at midnight UTC.

alefebvre-berner commented 3 years ago

I also upvote this. It would be very helpfull to me. Also please consider having the offset being pass as a column of the table as the timestamp is, not only a fixed value.

DavidPavlicek commented 3 years ago

This is a major feature... definitely needs to be implemented... +1

erimatnor commented 3 years ago

Related to https://github.com/timescale/timescaledb/issues/414

mabhijithn commented 2 years ago

I am not sure this is related to #414. Even if I use time_bucket_ng, I cannot create materialized view for a continuous aggregate if I have to use origin parameter. Please do point what I am doing wrong. time_bucket also does not work. Though, outside MATERIALIZED VIEW, I am able to use time_bucket individually with origin and offsets.

CREATE MATERIALIZED VIEW onehour_candle 
WITH (timescaledb.continuous) AS 
SELECT symbol, 
timescaledb_experimental.time_bucket_ng('1 day', time, origin => TIMESTAMP '2022-01-03') AS date,
FIRST(open, time) as open,
MAX(high) as high,
MIN(low) as low,
LAST(close, time) as close
FROM stockdata
GROUP BY symbol,date;
ERROR:  continuous aggregate view must include a valid time bucket function

Timescaledb version: 2.7.0 Self-hosted on Ubuntu 20.04. Postgresql version info:

PostgreSQL 12.11 (Ubuntu 12.11-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

afiskon commented 2 years ago

Please do point what I am doing wrong

The problem is that named arguments are currently not supported when creating a CAGG. Just don't specify the name of the argument.

mabhijithn commented 2 years ago

I apologize. I should have been more clear. I am unable to create a MATERIALIZED VIEW continuous aggregate using just time_bucket and having an origin parameter. For example, I get error for the command as shown below:

CREATE MATERIALIZED VIEW onehour_candle 
WITH (timescaledb.continuous) AS 
SELECT symbol, 
time_bucket('1 hour', time, TIMESTAMP '2022-01-03') AS date,
FIRST(open, time) as open,
MAX(high) as high,
MIN(low) as low,
LAST(close, time) as close
FROM stockdata
GROUP BY symbol,date;
ERROR:  continuous aggregate view must include a valid time bucket function

@afiskon Is it possible to have a continuous aggregate with ORIGIN parameter? If so how will I provide the origin timestamp?

konskov commented 2 years ago

The following should work for you:

CREATE MATERIALIZED VIEW onehour_candle 
WITH (timescaledb.continuous) AS 
SELECT symbol, 
timescaledb_experimental.time_bucket_ng('1 day', time, TIMESTAMP '2022-01-03') AS date,
FIRST(open, time) as open,
MAX(high) as high,
MIN(low) as low,
LAST(close, time) as close
FROM stockdata
GROUP BY symbol,date;
mabhijithn commented 2 years ago

Unfortunately, no. I get the same error as given below. I also tried TIMESTAMPTZ instead of TIMESTAMP (since my time column contains timestamp)

CREATE MATERIALIZED VIEW onehour_candle_new 
WITH (timescaledb.continuous) AS 
SELECT symbol, 
timescaledb_experimental.time_bucket_ng('1 day', time, TIMESTAMP '2022-01-03') AS date,
FIRST(open, time) as open,
MAX(high) as high,
MIN(low) as low,
LAST(close, time) as close
FROM stockdata
GROUP BY symbol,date;
ERROR:  continuous aggregate view must include a valid time bucket function

Addition: The SELECT itself will work if not under a MATERIALIZED VIEW as shown below:

nseonemintest=# SELECT symbol, 
timescaledb_experimental.time_bucket_ng('1 day', time, TIMESTAMP '2022-01-03') AS date,
FIRST(open, time) as open,                                                             
MAX(high) as high,                                                             
MIN(low) as low,
LAST(close, time) as close
FROM stockdata
GROUP BY symbol,date ORDER BY date desc LIMIT 10;
    symbol     |          date          |  open   |  high   |   low   |  close  
---------------+------------------------+---------+---------+---------+---------
 ACC           | 2022-04-27 00:00:00+02 |  2306.2 | 2361.75 | 2263.05 |  2349.9
 ACC_F1        | 2022-04-27 00:00:00+02 |  2266.6 |  2358.5 |  2264.1 |  2348.3
 ADANIENT      | 2022-04-27 00:00:00+02 | 2404.95 | 2420.95 |    2325 |  2334.1
 ADANIENT_F1   | 2022-04-27 00:00:00+02 |  2395.1 |  2417.1 | 2325.65 | 2332.45
 ADANIPORTS    | 2022-04-27 00:00:00+02 |  914.05 |  924.65 |     880 |   887.3
konskov commented 2 years ago

If your time field is timestamp, the above should work. If your time field is timestamptz, then you must specify the timezone too as an argument, like this: timescaledb_experimental.time_bucket_ng('1 day', time, '2022-01-03', 'Europe/Moscow') AS date as is also noted in https://github.com/timescale/timescaledb/issues/414#issuecomment-1063049999 and the related documentation

mabhijithn commented 2 years ago

@konskov Excellent! Thank you very much! So the crux is:

  1. If timestamp in column and you want ORIGIN in your CAGG, one should use time_bucket_ng. This is documented yes.
  2. Casting a string with timestamptz is not required. You can specify time and the timezone as a string. For me, the confusion came when examples with SELECT command specified the casting of ORIGIN parameter using TIMESTAMPTZ.
FlorianWendelborn commented 1 year ago

ERROR: continuous aggregate view must include a valid time bucket function
— @mabhijithn

For me, this was fixed by using GROUP BY 1, 2 instead of GROUP BY "time", "shortId"

Search tags: fix, workaround, resolved, solution

jkittner commented 1 year ago

For my use case this is also an issue. With the following table:

CREATE TABLE temperatures (
    time TIMESTAMP WITH TIME ZONE PRIMARY KEY,
    temperature NUMERIC
);
SELECT create_hypertable('temperatures', 'time');

And some data which is right labelled meaning at the end of the interval! -> 15:00 is the interval from 14:50 to 15:00!

INSERT INTO temperatures(time, temperature) VALUES
('2022-01-01 14:50', 10.0),
('2022-01-01 15:00', 5.0),
('2022-01-01 15:10', 10.0),
('2022-01-01 15:20', 20.0),
('2022-01-01 16:00', 16.0);

When using this in a regular query I can get there with specifying an offset to get the averaging correct and then subtracting it again, to go back to the full hour. Exactly as described in the docs.

SELECT
    time_bucket('1 hour', time, '+1 second'::INTERVAL) + '-1 second'::INTERVAL AS hour,
    avg(temperature) AS temp_mean
FROM temperatures
GROUP BY hour;
          hour          |      temp_mean      
------------------------+---------------------
 2022-01-01 15:00:01+00 | 15.3333333333333333
 2022-01-01 14:00:01+00 |  7.5000000000000000
(2 rows)

However, this cannot be transferred to continuous aggregates. When trying:

CREATE MATERIALIZED VIEW hourly_average
WITH (timescaledb.continuous) AS 
    SELECT
        time_bucket('1 hour', time, '+1 second'::INTERVAL) + '-1 second'::INTERVAL AS hour,
        avg(temperature) AS temp_mean
    FROM temperatures
    GROUP BY hour;

This results in ERROR: continuous aggregate view must include a valid time bucket function.

Things I tried:

  1. removing the + '-1 second'::INTERVAL part -> same error!
  2. also removing the offset parameter -> works, but not the desired result!
  3. adding back the + '-1 second'::INTERVAL part -> back to the same error!
  4. trying this to get at least the side of the label right, knowing that the averaging is incorrect. However this does not work if time is of type TIMESTAMPTZ, only for TIMESTAMP - why is this? I have no idea...
    CREATE MATERIALIZED VIEW hourly_average
    WITH (timescaledb.continuous) AS 
        SELECT
            time_bucket('1 hour', time)  + '1 hour'::INTERVAL AS hour,
            avg(temperature) AS temp_mean
        FROM temperatures
        GROUP BY time_bucket('1 hour', time);

    NOTE: The above only works when specifying GROUP BY time_bucket('1 hour', time). You cannot use GROUP BY hour, since, yet again, we get the ERROR: continuous aggregate view must include a valid time bucket function...

Any ideas for a workaround or maybe when this feature, that works perfectly fine for time_bucket in regular queries, will be implemented for MATERIALIZED VIEW?

jkittner commented 1 year ago

The error message is also not quite right in the context. When looking at the full function signature of time_bucket

Schema              | public
Name                | time_bucket
Result data type    | timestamp with time zone
Argument data types | bucket_width interval, ts timestamp with time zone, timezone text, origin timestamp with time zone DEFAULT NULL::timestamp with time zone, "offset" interval DEFAULT NULL::interval
Type                | func

calling this of course succeeds, hence this is a "valid time bucket function".

SELECT time_bucket('1 hour'::INTERVAL, '2022-01-10 15:30'::TIMESTAMPTZ, 'UTC', NULL, NULL);

But in the context of a materialized view this function is not valid anymore? How come?

fcovatti commented 1 year ago

@jkittner when I tried similar workaround I get the following error make sure all functions in the continuous aggregate definition have IMMUTABLE volatility. Note that functions or expressions may be IMMUTABLE for one data type, but STABLE or VOLATILE for another

The support to offset functionality to me would be very interesting to have on the continuous aggregates.

jkittner commented 1 year ago

@fcovatti, you probably have TIMESTAMPTZ as a column data type. This doesn't work. For some reason only TIMESTAMP works (that may be on accident?).

I'm still hoping that this will be implemented some day. That would make working with inconsistent labeling across different time zones so much easier...

searchingforcode commented 1 year ago

@fcovatti, you probably have TIMESTAMPTZ as a column data type. This doesn't work. For some reason only TIMESTAMP works (that may be on accident?).

I'm still hoping that this will be implemented some day. That would make working with inconsistent labeling across different time zones so much easier... When we will get a update?

u3Izx9ql7vW4 commented 1 year ago

What's the status on this ticket? I keep seeing members closing other tickets saying it's a duplicate of this one, and it's been 3 years since it's been opened. If I need to aggregate based on the end of the timestamp, it's kind of a critical feature and it's kind of useless without materialized view support

Petwag commented 6 months ago

Any advancement on that ticket?

vonnue-yehya commented 5 months ago

Any updates on this ? I am trying to pass in the offset parameter to the time_bucket function when used as a cagg, looks like this is not allowed right now.