timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.83k stars 852 forks source link

[Enhancement]: Delete rows from CAggs using DELETE FROM #6965

Open TomoBossi opened 1 month ago

TomoBossi commented 1 month ago

What type of enhancement is this?

Other, User experience

What subsystems and features will be improved?

Continuous aggregate

What does the enhancement do?

Currently, DELETE FROM ... WHERE ... statements are supported for raw Hypertables but not Caggs.

When attempting to delete from a Cagg an error is returned:

ERROR:  cannot delete from view "<cagg_name>"
DETAIL:  Views containing UNION, INTERSECT, or EXCEPT are not automatically updatable.
HINT:  To enable deleting from the view, provide an INSTEAD OF DELETE trigger or an unconditional ON DELETE DO INSTEAD rule.

Which to my understanding is due to the Cagg being constructed from the UNION of its underlying chunks. It is however possible to delete rows using DELETE FROM on the individual chunks themselves. I'm currently doing this to get rid of old and entirely useless data that would otherwise remain in my Caggs.

I'd like to request that deleting specific rows from a Cagg is added as a feature, like it has been for raw Hypertables, and would also like to know whether there are any dangerous side effects to deleting from the underlying chunks directly, as I suspect there are (and are the reason why this is not supported).

Thank you!

Implementation challenges

No response

RobAtticus commented 1 month ago

If the deletion is mostly for older data, I believe you can add retention policies to a cagg: https://docs.timescale.com/use-timescale/latest/data-retention/create-a-retention-policy/

A CAgg is more analogous to Postgres' materialized views, which also don't allow deleting/updating/etc directly on them, because it won't know how to manage that data when it is next refreshed. So for removing older data that is no longer relevant, I think the retention policy is your best bet. Although that assumes you are trying to delete all data older than a certain point, rather than a subset of it.