timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.95k stars 884 forks source link

[Bug]: Segmentation fault when inserting into hypertable after import of old database #7431

Open luveti opened 2 weeks ago

luveti commented 2 weeks ago

What type of bug is this?

Crash

What subsystems and features are affected?

Other

What happened?

I'm working on upgrading an old server.

The old server is running:

The new server is running:

On the old server we've been taking daily database dumps using:

pg_dump -U "<user>" "<database>" | gzip > $PROD_DUMP_GZ

To restore this on the new server I run:

psql -d postgres -c "select pg_terminate_backend(pid) from pg_stat_activity where datname = 'database_new';"
psql -d postgres -c 'drop database if exists database_new;'
psql -d postgres -c 'create database database_new;'
psql -d database_new -c 'create extension if not exists timescaledb;'
psql -d database_new -c 'select timescaledb_pre_restore();'
gzip -d $PROD_DUMP_GZ | psql -d database_new
psql -d database_new -c 'select timescaledb_post_restore();'
psql -d postgres -c "select pg_terminate_backend(pid) from pg_stat_activity where datname = 'database_old';"
psql -d postgres -c 'drop database if exists database_old;'
psql -d postgres -c "select pg_terminate_backend(pid) from pg_stat_activity where datname = 'database';"
psql -d postgres -c 'alter database database rename to database_old;'
psql -d postgres -c 'alter database database_new rename to database;'

The import runs fine (as far as I can tell) and I'm able to load up the related web application and navigate around just like on the production system. Many things work. But appears inserting into a hypertable does not.

When inserting into a hypertable the following gets logged by postgresql:

Nov 08 15:47:07 hostname postgres[183725]: [183725] WARNING:  function _timescaledb_internal.insert_blocker() is deprecated and has been moved to _timescaledb_functions schema. this compatibility function will be removed in a future version.
Nov 08 15:47:07 hostname postgres[183725]: [183725] CONTEXT:  PL/pgSQL function _timescaledb_internal.insert_blocker() line 4 at RAISE
Nov 08 15:47:28 hostname postgres[902]: [902] LOG:  server process (PID 183725) was terminated by signal 11: Segmentation fault
Nov 08 15:47:28 hostname postgres[902]: [902] DETAIL:  Failed process was running: INSERT INTO "hypertable" ("foo") VALUES ('bar') RETURNING "foo"
Nov 08 15:47:28 hostname postgres[902]: [902] LOG:  terminating any other active server processes
Nov 08 15:47:28 hostname postgres[902]: [902] LOG:  all server processes terminated; reinitializing
Nov 08 15:47:28 hostname postgres[183746]: [183746] LOG:  database system was interrupted; last known up at 2024-11-08 15:44:12 GMT
Nov 08 15:47:28 hostname postgres[183749]: [183749] FATAL:  the database system is in recovery mode
Nov 08 15:47:28 hostname postgres[183750]: [183750] FATAL:  the database system is in recovery mode
Nov 08 15:47:28 hostname postgres[183751]: [183751] FATAL:  the database system is in recovery mode
Nov 08 15:47:29 hostname postgres[183746]: [183746] LOG:  database system was not properly shut down; automatic recovery in progress
Nov 08 15:47:29 hostname postgres[183746]: [183746] LOG:  redo starts at 3/71F9F240
Nov 08 15:47:29 hostname postgres[183746]: [183746] LOG:  invalid record length at 3/71F9F7A0: expected at least 24, got 0
Nov 08 15:47:29 hostname postgres[183746]: [183746] LOG:  redo done at 3/71F9F768 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
Nov 08 15:47:29 hostname postgres[183747]: [183747] LOG:  checkpoint starting: end-of-recovery immediate wait
Nov 08 15:47:29 hostname postgres[183747]: [183747] LOG:  checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.010 s, sync=0.004 s, total=0.022 s; sync files=4, longest=0.002 s, average=0.001 s; distance=1 kB, estimate=1 kB; lsn=3/71F9F7A0, redo lsn=3/71F9F7A0
Nov 08 15:47:29 hostname postgres[902]: [902] LOG:  database system is ready to accept connections
Nov 08 15:47:29 hostname postgres[183754]: [183754] LOG:  TimescaleDB background worker launcher connected to shared catalogs

This commenter appears to have a similar issue.

TimescaleDB version affected

2.14.2

PostgreSQL version used

16.4

What operating system did you use?

NixOS 24.11 x64

What installation method did you use?

Other

What platform did you run on?

Other

Relevant log output and stack trace

No response

How can we reproduce the bug?

Details are provided in the What happened? section.

akuzm commented 2 weeks ago

Does it reproduce on 2.17.2? Would be good to see a stacktrace.

luveti commented 2 weeks ago

Here's a stack trace from systemd-coredump:

Module timescaledb-tsl-2.14.2.so without build-id.
Module timescaledb-2.14.2.so without build-id.
Module timescaledb.so without build-id.
Module libcap.so.2 without build-id.
Module libgcc_s.so.1 without build-id.
Module libstdc++.so.6 without build-id.
Module libicudata.so.74 without build-id.
Module libkeyutils.so.1 without build-id.
Module libkrb5support.so.0 without build-id.
Module libcom_err.so.3 without build-id.
Module libk5crypto.so.3 without build-id.
Module libkrb5.so.3 without build-id.
Module libaudit.so.1 without build-id.
Module libicuuc.so.74 without build-id.
Module libicui18n.so.74 without build-id.
Module libz.so.1 without build-id.
Module libgssapi_krb5.so.2 without build-id.
Module libpam.so.0 without build-id.
Module libxml2.so.2 without build-id.
Module liblz4.so.1 without build-id.
Module libzstd.so.1 without build-id.
Stack trace of thread 183725:
#0  0x00007f8bf379e6ce ts_hypertable_insert_blocker (timescaledb-2.14.2.so + 0x4b6ce)
#1  0x00005580c9adf1e7 ExecInterpExpr (.postgres-wrapped + 0x3031e7)
#2  0x00007f8bf3aded93 exec_eval_expr (plpgsql.so + 0x10d93)
#3  0x00007f8bf3ae4033 exec_stmts (plpgsql.so + 0x16033)
#4  0x00007f8bf3ae4dbe exec_stmt_block (plpgsql.so + 0x16dbe)
#5  0x00007f8bf3ae4e9d exec_toplevel_block.constprop.0 (plpgsql.so + 0x16e9d)
#6  0x00007f8bf3ae5215 plpgsql_exec_trigger (plpgsql.so + 0x17215)
#7  0x00007f8bf3aefcaa plpgsql_call_handler (plpgsql.so + 0x21caa)
#8  0x00005580c9dc53e5 fmgr_security_definer (.postgres-wrapped + 0x5e93e5)
#9  0x00005580c9abd993 ExecCallTriggerFunc (.postgres-wrapped + 0x2e1993)
#10 0x00005580c9ac2705 ExecBRInsertTriggers (.postgres-wrapped + 0x2e6705)
#11 0x00005580c9b0e055 ExecInsert (.postgres-wrapped + 0x332055)
#12 0x00005580c9b0ef30 ExecModifyTable (.postgres-wrapped + 0x332f30)
#13 0x00005580c9ae3002 standard_ExecutorRun (.postgres-wrapped + 0x307002)
#14 0x00005580c9c86450 ProcessQuery (.postgres-wrapped + 0x4aa450)
#15 0x00005580c9c86ec2 PortalRunMulti (.postgres-wrapped + 0x4aaec2)
#16 0x00005580c9c8704f FillPortalStore (.postgres-wrapped + 0x4ab04f)
#17 0x00005580c9c8740d PortalRun (.postgres-wrapped + 0x4ab40d)
#18 0x00005580c9c83802 exec_simple_query (.postgres-wrapped + 0x4a7802)
#19 0x00005580c9c85b04 PostgresMain (.postgres-wrapped + 0x4a9b04)
#20 0x00005580c9bfb55b ServerLoop.isra.0 (.postgres-wrapped + 0x41f55b)
#21 0x00005580c9bfc561 PostmasterMain (.postgres-wrapped + 0x420561)
#22 0x00005580c9942bbc main (.postgres-wrapped + 0x166bbc)
#23 0x00007f8bf5a3314e __libc_start_call_main (libc.so.6 + 0x2a14e)
#24 0x00007f8bf5a33209 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a209)
#25 0x00005580c9942f05 _start (.postgres-wrapped + 0x166f05)
ELF object binary architecture: AMD x86-64

2.14.2 is the latest packaged for Nix. But I'll look into creating an overlay that uses 2.17.2.

luveti commented 2 weeks ago

Could this be related? https://github.com/timescale/timescaledb/commit/8347621016643edf4fcba602f8be54ef94e9c5fd. The relevant issue is https://github.com/timescale/timescaledb/issues/6819

fabriziomello commented 2 weeks ago

Looks like it was fixed on 2.15.0 (https://github.com/timescale/timescaledb/releases/tag/2.15.0). Can u please try to update it.

luveti commented 2 weeks ago

I'm now running: PostgreSQL: 16.4 TimescaleDB: 2.16.0

PostgreSQL no longer segmentation faults, but I get the following error when inserting into a hypertable:

Nov 13 19:49:21 hostname postgres[16707]: [16707] STATEMENT:  INSERT INTO "hypertable" ("foo") VALUES ('bar') RETURNING "foo"
Nov 13 19:49:22 hostname postgres[16707]: [16707] WARNING:  function _timescaledb_internal.insert_blocker() is deprecated and has been moved to _timescaledb_functions schema. this compatibility function will be removed in a future version.
Nov 13 19:49:22 hostname postgres[16707]: [16707] CONTEXT:  PL/pgSQL function _timescaledb_internal.insert_blocker() line 4 at RAISE
Nov 13 19:49:22 hostname postgres[16707]: [16707] ERROR:  insert_blocker: not called by trigger manager
Nov 13 19:49:22 hostname postgres[16707]: [16707] CONTEXT:  PL/pgSQL function _timescaledb_internal.insert_blocker() line 6 at RETURN
luveti commented 2 weeks ago

It looks like the error I'm running into is similar to https://github.com/timescale/timescaledb/issues/7383.

SELECT * FROM timescaledb_information.hypertables; shows 0 rows.

Is pg_dump -U "<user>" "<database>" | gzip > $PROD_DUMP_GZ not a valid way to backup an entire database?

https://docs.timescale.com/self-hosted/latest/migration/entire-database/#migrating-the-entire-database-at-once creates a .bak instead of a .sql.

fabriziomello commented 2 weeks ago

It looks like the error I'm running into is similar to #7383.

SELECT * FROM timescaledb_information.hypertables; shows 0 rows.

Is pg_dump -U "<user>" "<database>" | gzip > $PROD_DUMP_GZ not a valid way to backup an entire database?

https://docs.timescale.com/self-hosted/latest/migration/entire-database/#migrating-the-entire-database-at-once creates a .bak instead of a .sql.

It is a valid way, but when restoring either using the pg_restore or running the .sql script produced by dump you should execute SELECT timescaledb_pre_restore(); before and SELECT timescaledb_post_restore(); after.

One question, are u sure you're dumping and restoring your database using the same TimescaleDB version on source and target?

luveti commented 1 week ago

My original comment includes these details!

I'm not importing into a database with the same versions, but instead newer versions. I have a legacy server that produces daily database backups that I would like to import into newer PostgreSQL and TimescaleDB versions running on a new server.

fabriziomello commented 1 week ago

My original comment includes these details!

Ohh sorry I've missed that part (so many things happening at same time).

I'm not importing into a database with the same versions, but instead newer versions. I have a legacy server that produces daily database backups that I would like to import into newer PostgreSQL and TimescaleDB versions running on a new server.

Have a look to this documentation about upgrading TimescaleDB that we mention that you "You cannot upgrade TimescaleDB and PostgreSQL at the same time".

This shoudl be done in this way (and sometimes you need to execute multi-step upgrade on Postgres versions due to compatibility confligs) because we can include tsdb metadata changes between versions and also other kind of fixes that is executed during the extension update (aka ALTER EXTENSION timescaledb UPDATE).