timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.83k stars 852 forks source link

[Enhancement]: Add WITH option to set chunk size in continuous aggregate #6923

Open RobAtticus opened 1 month ago

RobAtticus commented 1 month ago

What type of enhancement is this?

API improvement, User experience

What subsystems and features will be improved?

Continuous aggregate

What does the enhancement do?

When creating a continuous aggregate, I frequently have to change the chunk interval for a continuous aggregate because the default of 10x the underlying hypertable is too small. I'll end up with a continuous aggregate that has very small chunks because the data density increase by more than 10.

The current best way to avoid this is to create a continuous aggregate with WITH NO DATA so no chunks are made, and then run something like this:

SELECT set_chunk_time_interval(
  (SELECT
    format('%I.%I', materialization_hypertable_schema, materialization_hypertable_name)
  FROM timescaledb_information.continuous_aggregates
  WHERE
    view_schema = 'public'
    AND view_name = 'my_wonderful_agg'
), INTERVAL '24h');

Instead, I'd propose something like this:

CREATE MATERIALIZED VIEW public.my_wonderful_agg WITH (
    timescaledb.continuous,
    timescaledb.materialized_only = true,
    timescaledb.chunk_time_interval = '24h'
) AS
...

This would allow me to 1) avoid using WITH NO DATA (though I still might for other reasons) and 2) not have to include that not-very-obvious SQL statement to change the chunk size.

Implementation challenges

No response

mkindahl commented 1 month ago

This seems to be a duplicate of #1775

RobAtticus commented 1 month ago

Seems so, I'm fine with closing mine in that case, but let this serve as my emphatic +1 :)

fabriziomello commented 1 month ago

And maybe we should make set_chunk_time_interval to accept a cagg as parameter as well

RobAtticus commented 1 month ago

That would be helpful too, since a lot of other functions do allow caggs/HTs interchangeably

mkindahl commented 1 month ago

Seems so, I'm fine with closing mine in that case, but let this serve as my emphatic +1 :)

This provides a good example and API description compared to #1775 so I suggest to not close it. Just want to make sure that we close both once we fix it.

TomoBossi commented 1 month ago

I've been using set_chunk_time_interval() directly on caggs (instead of doing a subquery for the materialization hypertable schema and name) without getting errors, and it seems to do the job. Am I wrong?