Open JamesGuthrie opened 1 year ago
Put together a PoC: https://github.com/timescale/timescaledb-backfill/pull/58.
There are still some open issues, which are rooted in our assumption of a 1:1 source:target chunk mapping:
Currently we delete all rows from a target chunk before copying rows into it. This is because if we're doing dual-writes with backfill, the source is "authoritative" until the `until` point. When multiple source chunks are written to one target chunk, we must remove the rows once per target chunk, not once per source chunk.
In the PoC implementation we don't track which source chunks will map to the same target chunk, which would be necessary in order to implement this.
It's unclear whether it's safe/smart to concurrently copy from multiple source chunks into a target chunk. One argument against it is that we drop the invalidation triggers on the chunk before copying data into it. If there are multiple parallel writes into the same target chunk, these operations may become weirdly interleaved.
As above, because we don't track which source chunks map to the same target, we can't serialize these copies.
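To make the missing tracking concrete, here is a minimal sketch of how source chunks could be grouped by the target chunk they overlap, so that the delete happens once per target chunk and the copies into it run one after another. It is not taken from the PoC: the `Chunk` type, the integer range bounds, and the `plan_copy` helper are all hypothetical names used for illustration, and it assumes chunk ranges are half-open `[start, end)` intervals on the time dimension only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical stand-in for a chunk's time range; not the tool's real type.
    name: str
    start: int  # inclusive range start (e.g. microseconds since epoch)
    end: int    # exclusive range end


def group_sources_by_target(sources, targets):
    """Map each target chunk name to the source chunks whose range overlaps it."""
    groups = defaultdict(list)
    for target in targets:
        for source in sources:
            # Two half-open ranges overlap when each starts before the other ends.
            if source.start < target.end and target.start < source.end:
                groups[target.name].append(source)
    return dict(groups)


def plan_copy(sources, targets):
    """Build one ordered step list per target chunk.

    Each list starts with a single delete of the target chunk's rows and is
    followed by the copies from every overlapping source chunk, in time order.
    Steps within one list are meant to run serially; separate lists could
    still run in parallel, since they touch different target chunks.
    """
    plan = {}
    for target_name, group in group_sources_by_target(sources, targets).items():
        steps = [("delete", target_name)]
        for source in sorted(group, key=lambda c: c.start):
            steps.append(("copy", source.name, target_name))
        plan[target_name] = steps
    return plan


sources = [Chunk("src_1", 0, 10), Chunk("src_2", 10, 20), Chunk("src_3", 20, 30)]
targets = [Chunk("tgt_1", 0, 20), Chunk("tgt_2", 20, 40)]
plan = plan_copy(sources, targets)
```

With these inputs, `src_1` and `src_2` both land in `tgt_1`, so `plan["tgt_1"]` holds one delete followed by two copies, while `plan["tgt_2"]` holds one delete and the single copy from `src_3`. Serializing within a group also sidesteps the interleaving concern around dropping the invalidation triggers, at the cost of less parallelism when many source chunks map to one target.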
Users may wish to remove space dimensions which they added. The only way to do this today is to rewrite a whole hypertable. If the user is already migrating the data, it might be opportune to remove a space dimension during the migration.