timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.61k stars 883 forks source link

Rework compression activity wal markers #6920

Closed JamesGuthrie closed 4 months ago

JamesGuthrie commented 4 months ago

This allows to distinguish between "user-driven" and "compression-driven" DML on uncompressed chunks. This is a requirement to be able to support DML on compressed chunks in live migration.

Note: A previous commit 1 added wal markers before and after inserts which were part of "transparent decompression". Transparent decompression is triggered when an UPDATE or DELETE statement affects compressed data, or an INSERT statment inserts into a range of compressed data which has a unique or primary key constraint. In these cases, the data is first moved from the compressed chunk to the uncompressed chunk, and then the DML is applied.

This change extends the existing behaviour on two fronts:

  1. It adds WAL markers for both chunk compression and decompression events.
  2. It extends the WAL markers for transparent decompression to include not only INSERTs into the compressed chunk, but also to TimescaleDB catalog operations which were part of the decompression.
noctarius commented 4 months ago

If I understand the PR correctly, your patch wouldn't send the catalog updates for the chunk tables itself, would it? That would actually break functionality on my tool, since I completely replicate the catalog in memory and need that information.

I'd probably define a second set of markers with slightly different prefix names. In this case everybody would be able to decide to either ignore them or act on them. Alternatively, we could use the message body, like adding the table names, but I think the second prefix is easier and more obvious.

Changing the GUC name is ok, I can let people know in the docs :)

JamesGuthrie commented 4 months ago

If I understand the PR correctly, your patch wouldn't send the catalog updates for the chunk tables itself, would it?

This PR doesn't change which catalog updates are emitted. One thing that has changed is where exactly the "end decompression marker" is in the WAL. It now surrounds both inserts into the uncompressed chunk, and the associated catalog changes. To visualise this:

Previously the WAL would look like this for a "transparent decompression" event (e.g. UPDATE to compressed data):

BEGIN
message: transactional: 1 prefix: ::timescaledb-decompression-start, sz: 0 content:
table _timescaledb_internal.compress_hyper_2_2_chunk: DELETE: (no-tuple-data)
table _timescaledb_internal._hyper_1_1_chunk: INSERT: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:1
table _timescaledb_internal._hyper_1_1_chunk: INSERT: "time"[timestamp with time zone]:'2023-06-30 18:00:00-07' device_id[bigint]:1 value[double precision]:1
message: transactional: 1 prefix: ::timescaledb-decompression-end, sz: 0 content:
table _timescaledb_catalog.chunk: UPDATE: id[integer]:1 hypertable_id[integer]:1 schema_name[name]:'_timescaledb_internal' table_name[name]:'_hyper_1_1_chunk' compressed_chunk_id[integer]:2 dropped[boolean]:false status[integer]:9 osm_chunk[boolean]:false
table _timescaledb_internal._hyper_1_1_chunk: UPDATE: old-key: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:1 new-tuple: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:22
COMMIT

Now, it looks like this:

BEGIN
message: transactional: 1 prefix: ::timescaledb-decompression-start, sz: 0 content:
table _timescaledb_internal.compress_hyper_2_2_chunk: DELETE: (no-tuple-data)
table _timescaledb_internal._hyper_1_1_chunk: INSERT: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:1
table _timescaledb_internal._hyper_1_1_chunk: INSERT: "time"[timestamp with time zone]:'2023-06-30 18:00:00-07' device_id[bigint]:1 value[double precision]:1
table _timescaledb_catalog.chunk: UPDATE: id[integer]:1 hypertable_id[integer]:1 schema_name[name]:'_timescaledb_internal' table_name[name]:'_hyper_1_1_chunk' compressed_chunk_id[integer]:2 dropped[boolean]:false status[integer]:9 osm_chunk[boolean]:false
message: transactional: 1 prefix: ::timescaledb-decompression-end, sz: 0 content:
table _timescaledb_internal._hyper_1_1_chunk: UPDATE: old-key: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:1 new-tuple: "time"[timestamp with time zone]:'2023-06-30 17:00:00-07' device_id[bigint]:1 value[double precision]:22
COMMIT

Note that the timescaledb-decompression-end message now comes after the UPDATE to the _timescaledb_catalog.chunk table.

IMHO this is "more correct", as the decompression markers now surround all decompression activities. Your tool can decide which kinds of activities it does or does not ignore in the context of those markers.

noctarius commented 4 months ago

Ah I see. Yeah that is totally fine, since I can still act on all catalog entries, while ignoring anything outside the catalog 👍