timescale / timescaledb-backfill

Backfill hypertable data from one timescale instance to another
Apache License 2.0

Investigate collapsing space dimensions #59

Open JamesGuthrie opened 1 year ago

JamesGuthrie commented 1 year ago

Users may wish to remove space dimensions that they previously added. The only way to do this today is to rewrite the whole hypertable. If the user is already migrating the data, the migration is a convenient opportunity to remove a space dimension.

JamesGuthrie commented 1 year ago

Put together a PoC: https://github.com/timescale/timescaledb-backfill/pull/58.

There are still some open issues, which are rooted in our assumption of a 1:1 source:target chunk mapping:

Removing rows in a target chunk before copying into it

Currently we delete all rows from a target chunk before copying rows into it. This is because, when backfill is combined with dual writes, the source is "authoritative" up to the "until" point. When multiple source chunks are written to one target chunk, the delete must happen once per target chunk, before the first of those source chunks is copied. Deleting once per source chunk would wipe rows already copied from an earlier source chunk.

In the PoC implementation we don't track which source chunks will map to the same target chunk, which would be necessary in order to implement this.
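To illustrate the kind of tracking this would need, here is a minimal sketch in Python (not the tool's actual code). It assumes we can produce a list of (source chunk, target chunk) pairs up front; the names `group_by_target` and `plan_copy` and the chunk names are hypothetical. The plan emits exactly one delete per target chunk, followed by all copies into it.

```python
from collections import defaultdict


def group_by_target(mappings):
    """Group source chunks by the target chunk they map to.

    `mappings` is an iterable of (source_chunk, target_chunk) pairs.
    (Hypothetical input shape, assumed for illustration.)
    """
    groups = defaultdict(list)
    for source, target in mappings:
        groups[target].append(source)
    return dict(groups)


def plan_copy(mappings):
    """Build an ordered plan: one delete per target, then its copies."""
    steps = []
    for target, sources in group_by_target(mappings).items():
        # Delete once, before any source chunk is copied into this target.
        steps.append(("delete", target))
        steps.extend(("copy", source, target) for source in sources)
    return steps


if __name__ == "__main__":
    example = [("src_1", "tgt_a"), ("src_2", "tgt_a"), ("src_3", "tgt_b")]
    for step in plan_copy(example):
        print(step)
```

With a 1:1 mapping this degenerates to the current behaviour (delete, then a single copy); with N:1 the delete no longer runs between copies into the same target.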

Serializing copies into target chunks

It's unclear whether it's safe or sensible to copy from multiple source chunks into one target chunk concurrently. One argument against is that we drop the invalidation triggers on the chunk before copying data into it. If there are multiple parallel writers to the same target chunk, these operations may interleave in unpredictable ways.

As above, because we don't track which source chunks map to the same target, we can't serialize these copies.
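If the mapping were tracked, one possible way to serialize would be a lock per target chunk, so that copies into different targets still run in parallel. The following is a rough Python sketch under that assumption, not a proposal for the actual implementation; `TargetLocks`, `copy_all` and the `copy_chunk` callback are hypothetical names.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class TargetLocks:
    """Hands out one lock per target chunk (hypothetical helper)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def for_target(self, target):
        # Guard the dict so two workers never create two locks for one target.
        with self._guard:
            return self._locks[target]


def copy_all(mappings, copy_chunk, workers=4):
    """Run copy_chunk(source, target) for each pair.

    Copies into the same target are serialized by its lock;
    copies into different targets may run concurrently.
    """
    locks = TargetLocks()

    def task(source, target):
        with locks.for_target(target):
            copy_chunk(source, target)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task, s, t) for s, t in mappings]
        for future in futures:
            future.result()  # re-raise any error from a worker
```

A worker waiting on a lock is idle, so grouping the work by target and handing each group to a single worker might be the better design; the lock version is only the smallest change from a per-source-chunk work queue.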