timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

Optimize UPDATE/DELETE on distributed hypertables #3795

Closed erimatnor closed 4 months ago

erimatnor commented 2 years ago

Estimation

O:1W P:3W

Currently, UPDATE and DELETE are executed per chunk and require fetching each row to be modified. This should be optimized so that only one DELETE/UPDATE is sent per remote data node, executed against its local hypertable.
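The contrast can be sketched in SQL (table and chunk names below are illustrative, not actual identifiers from a deployment):

```sql
-- Today (simplified): for each chunk replica, the access node first
-- fetches the matching tuples, then modifies them one by one via ctid:
SELECT ctid FROM _timescaledb_internal._dist_hyper_1_1_chunk
  WHERE ts < '2021-09-19';
DELETE FROM _timescaledb_internal._dist_hyper_1_1_chunk
  WHERE ctid = '(0,1)';

-- Desired: one statement shipped per data node, executed there against
-- its local hypertable with no tuple round-trips to the access node:
DELETE FROM metrics WHERE ts < '2021-09-19';
```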

Ideally, we should only do direct deletes, i.e., use the direct-modify API in the foreign data wrapper, which avoids fetching tuples first. This only works if there are no triggers on the frontend, which will probably be the case (although we'd have to decide to block triggers permanently), and no local joins (e.g., with tables on the AN).
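To illustrate the restriction (hypothetical table names; `blacklisted_devices` is assumed to exist only on the access node):

```sql
-- Eligible for direct modification: the whole statement can be
-- evaluated on the data node, no AN-side state is needed.
DELETE FROM metrics WHERE device_id = 7;

-- Not eligible: the join partner lives only on the access node, so
-- matching rows must first be fetched and joined locally before the
-- per-row deletes can be issued.
DELETE FROM metrics m
  USING blacklisted_devices b
  WHERE m.device_id = b.device_id;
```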

Another point of optimization is replication. Currently, we update/delete by first fetching tuples via a SELECT and then deleting/updating via the ctid. This is done per chunk replica. One could optimize this to select from only one replica and then update/delete on all replicas. This could, however, be a problem if the ctid does not match across replica chunks.
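The per-replica pattern, and why ctids cannot be assumed stable, in a sketch (names illustrative):

```sql
-- Current per-replica pattern: fetch, then modify by physical pointer.
SELECT ctid, * FROM chunk_replica WHERE ts < '2021-09-19';
UPDATE chunk_replica SET val = 0 WHERE ctid = '(42,3)';

-- A ctid is a physical (block, offset) tuple pointer. VACUUM, HOT
-- updates, and similar processes can relocate tuples independently on
-- each replica, so '(42,3)' fetched from one replica may point at a
-- different row (or nothing) on another.
```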

chennailong commented 2 years ago

That's true. When I run drop_chunks on a hypertable, it takes a lock (AccessExclusiveLock). How can I handle that? I want to drop chunks. (SELECT drop_chunks('subs060215280005_m', older_than => DATE '2021-09-19');)
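One common mitigation (a sketch, not TimescaleDB-specific advice from this thread): dropping a chunk does need an exclusive lock, but a session-level `lock_timeout` bounds how long the call blocks behind concurrent readers/writers, so it can be retried later instead of stalling:

```sql
-- Bound the wait for the AccessExclusiveLock; on timeout the statement
-- errors out and can simply be retried at a quieter moment.
SET lock_timeout = '5s';
SELECT drop_chunks('subs060215280005_m', older_than => DATE '2021-09-19');
```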

nikkhils commented 1 year ago

Because VACUUM and other processes run independently on the DNs, we cannot practically assume that ctids are the same across DNs.

I guess identifying the target DNs and then shipping the DELETE/UPDATE commands directly to them, instead of first fetching the ctids from each DN, might be better.

svenklemm commented 4 months ago

Multinode has been removed.