timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.83k stars 852 forks source link

Live migration: Failed to find hypertable from map #6972

Closed john9x closed 1 month ago

john9x commented 1 month ago

What type of bug is this?

Other

What subsystems and features are affected?

Restore

What happened?

We want to upgrade our PG12 + TimescaleDB to newest PG major version (at least 15) with low downtime using live migration tool. Snapshotting works fine

10:16:08.047 13406 INFO   Running pgcopydb version 0.14 from "/usr/local/bin/pgcopydb"
10:16:08.135 13406 INFO   Created logical replication slot "pgcopydb" with plugin "wal2json" at B5/87224238 and exported snapshot 00000018-00000038-1
2024-05-28T10:16:08.136 INFO: Snapshot 00000018-00000038-1 created successfully.
You can now start the migration process by running the following command:

            docker run -it --rm --name live-migration-migrate \
            -e PGCOPYDB_SOURCE_PGURI=$SOURCE \
            -e PGCOPYDB_TARGET_PGURI=$TARGET \
            --pid=host \
            -v ~/live-migration:/opt/timescale/ts_cdc \
            timescale/live-migration:v0.0.16 \
            migrate

Next I try to migrate and snapshot restoring well but then I receive tons of errors for buffered messages

2024-05-28 10:23:39.821 17284 ERROR  timescale.c:161           Failed to find hypertable from map for _timescaledb_internal._ifdb_1_hyper_16796_chunk
2024-05-28 10:23:39.821 17284 ERROR  ld_wal2json.c:337         Failed to map chunk _timescaledb_internal._ifdb_1_hyper_16796_chunk to hypertable
2024-05-28 10:23:39.821 17284 ERROR  ld_wal2json.c:161         Failed to parse truncated message missing schema or table property

given hypertables/chunks are exist at target server.

Tested on target servers installed from docker with images timescale/timescaledb:2.11.2-pg14 and timescale/timescaledb:2.11.2-pg15

TimescaleDB version affected

2.11.2

PostgreSQL version used

12

What operating system did you use?

Ubuntu 20.04

What installation method did you use?

Deb/Apt

What platform did you run on?

Amazon Web Services (AWS)

Relevant log output and stack trace

No response

How can we reproduce the bug?

see above
alejandrodnm commented 1 month ago

The issue is that the live-migration tools relies on the chunk names to have the format _hyper_<hypertable_id>_<chunk_id>_chunk, while yours are _timescaledb_internal._ifdb_1_hyper_16796_chunk.

When a chunk is replicated, the hypertable id is taken from the chunk name and then added to a map that identifies which chunks belong to which tables. This is because DDL statements are not replicated, so if a new chunk is created in source when the data is replicated to the target there will be an error inserting the data because the chunk doesn't exists.

Instead of trying to insert directly into the chunk, the tool finds the parent table from the hypertable id and applies the statement directly to the parent table. This way we rely on the hypertable to handle DML to the correct chunk, and creating the chunk if necessary.

I'll bring this up with the team and see if there's something that can be done.

Do you know why your chunk names have that format? Did you set associated_table_prefix when creating the hypertable?

john9x commented 1 month ago

Do you know why your chunk names have that format?

Hi! Sure I know it! I have set it at the hypertable creation.

select * from create_hypertable(... associated_table_prefix => '_ifdb_1_hyper');

I have more than one hypertables and I supposed thas is good idea to give readable name to chunks.

What can I do with this now? Some workaround?

alejandrodnm commented 1 month ago

Some workaround

Not currently, sorry.

The tool is still experimental and we're still finding out some of the edge cases. Sadly this is a scenario that wasn't taken into account.

I'm going to open an internal ticket with this, since this is the repository of the database extension and it's a separate team from the one developing the live-migrations. I'm going to close this ticket, but I'll reply here nonetheless if I have an update for you.

john9x commented 1 month ago

Oh, sadly. I hope it will be fixed soon. Thanks!

danielabelski commented 1 month ago

@alejandrodnm Hi! maybe there is a time estimate? Maybe there is a bounty for this?