timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

[Bug]: Memory leak with distributed tables #4129

Closed tatzlwurm2 closed 9 months ago

tatzlwurm2 commented 2 years ago

What type of bug is this?

Crash

What subsystems and features are affected?

Access node

What happened?

We have an application talking to PostgreSQL 13.5 (on Red Hat Enterprise Linux Server release 7.9 (Maipo)) with the following extensions:

                                         List of installed extensions
        Name        | Version |   Schema   |                            Description
--------------------+---------+------------+-------------------------------------------------------------------
 btree_gist         | 1.5     | public     | support for indexing common datatypes in GiST
 pg_prewarm         | 1.2     | public     | prewarm relation data
 pg_repack          | 1.4.7   | public     | Reorganize tables in PostgreSQL databases with minimal locks
 pg_stat_kcache     | 2.2.0   | public     | Kernel statistics gathering
 pg_stat_statements | 1.7     | public     | track execution statistics of all SQL statements executed
 pg_trgm            | 1.4     | public     | text similarity measurement and index searching based on trigrams
 pgaudit            | 1.4.2   | public     | provides auditing functionality
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 timescaledb        | 2.5.2   | public     | Enables scalable inserts and complex queries for time-series data
 unaccent           | 1.1     | public     | text search dictionary that removes accents
(11 rows)

We have this configuration deployed to dev/nit/nat and to production (R1, R2, R3). In R1, R2, and R3 we are now seeing postgres processes being killed:

R1

dmesg -T|grep Killed
[Thu Feb 24 04:33:49 2022] Killed process 67257 (postgres), UID 57887, total-vm:21705156kB, anon-rss:14444612kB, file-rss:0kB, shmem-rss:27280kB
[Fri Feb 25 03:25:14 2022] Killed process 28293 (postgres), UID 57887, total-vm:21941872kB, anon-rss:14587816kB, file-rss:60kB, shmem-rss:13244kB

R2

[Thu Feb 10 05:07:07 2022] Killed process 19759 (postgres), UID 57887, total-vm:19661448kB, anon-rss:14380492kB, file-rss:0kB, shmem-rss:17456kB
[Fri Feb 11 04:16:54 2022] Killed process 92706 (postgres), UID 57887, total-vm:19726820kB, anon-rss:14376796kB, file-rss:0kB, shmem-rss:14560kB
[Sat Feb 12 04:16:39 2022] Killed process 62885 (postgres), UID 57887, total-vm:19850512kB, anon-rss:14514244kB, file-rss:0kB, shmem-rss:14740kB
[Sun Feb 13 04:17:01 2022] Killed process 21477 (postgres), UID 57887, total-vm:19838228kB, anon-rss:14529932kB, file-rss:0kB, shmem-rss:14380kB
[Mon Feb 14 04:26:28 2022] Killed process 130575 (postgres), UID 57887, total-vm:20147584kB, anon-rss:14831064kB, file-rss:0kB, shmem-rss:15000kB
[Tue Feb 15 04:16:27 2022] Killed process 95862 (postgres), UID 57887, total-vm:19567520kB, anon-rss:14220152kB, file-rss:0kB, shmem-rss:18860kB
[Wed Feb 16 04:26:08 2022] Killed process 58079 (postgres), UID 57887, total-vm:20150216kB, anon-rss:14829152kB, file-rss:0kB, shmem-rss:25160kB
[Thu Feb 17 04:20:20 2022] Killed process 20071 (postgres), UID 57887, total-vm:19717384kB, anon-rss:14373256kB, file-rss:0kB, shmem-rss:24752kB
[Fri Feb 18 04:16:22 2022] Killed process 106579 (postgres), UID 57887, total-vm:19388164kB, anon-rss:14055676kB, file-rss:0kB, shmem-rss:18712kB
[Fri Feb 18 19:16:12 2022] Killed process 69438 (postgres), UID 57887, total-vm:19529264kB, anon-rss:14243780kB, file-rss:24kB, shmem-rss:14332kB
[Sat Feb 19 04:16:06 2022] Killed process 75229 (postgres), UID 57887, total-vm:19361956kB, anon-rss:13983328kB, file-rss:0kB, shmem-rss:21944kB
[Sun Feb 20 04:16:07 2022] Killed process 29523 (postgres), UID 57887, total-vm:19531512kB, anon-rss:14259448kB, file-rss:0kB, shmem-rss:20396kB
[Mon Feb 21 04:25:46 2022] Killed process 837 (postgres), UID 57887, total-vm:20051852kB, anon-rss:14740816kB, file-rss:0kB, shmem-rss:25692kB
[Tue Feb 22 04:15:49 2022] Killed process 86831 (postgres), UID 57887, total-vm:19338472kB, anon-rss:14277928kB, file-rss:0kB, shmem-rss:21348kB
[Wed Feb 23 04:15:41 2022] Killed process 57688 (postgres), UID 57887, total-vm:19421312kB, anon-rss:14094148kB, file-rss:24kB, shmem-rss:19220kB
[Thu Feb 24 04:19:04 2022] Killed process 19800 (postgres), UID 57887, total-vm:19506544kB, anon-rss:14420172kB, file-rss:0kB, shmem-rss:5172kB
[Fri Feb 25 04:12:16 2022] Killed process 110640 (postgres), UID 57887, total-vm:19338936kB, anon-rss:14268400kB, file-rss:0kB, shmem-rss:10268kB

R3

dmesg -T|grep Killed
[Sat Feb 12 03:50:21 2022] Killed process 124865 (postgres), UID 57887, total-vm:22568688kB, anon-rss:14946796kB, file-rss:0kB, shmem-rss:5904kB
[Sun Feb 13 03:11:14 2022] Killed process 85657 (postgres), UID 57887, total-vm:22192904kB, anon-rss:14533476kB, file-rss:0kB, shmem-rss:9248kB
[Tue Feb 15 03:06:26 2022] Killed process 54890 (postgres), UID 57887, total-vm:21901504kB, anon-rss:14297928kB, file-rss:0kB, shmem-rss:13664kB
[Wed Feb 16 03:11:17 2022] Killed process 25890 (postgres), UID 57887, total-vm:22273496kB, anon-rss:14609512kB, file-rss:0kB, shmem-rss:13960kB
[Thu Feb 17 03:16:16 2022] Killed process 125560 (postgres), UID 57887, total-vm:22632260kB, anon-rss:14942308kB, file-rss:0kB, shmem-rss:12232kB
[Fri Feb 18 03:15:35 2022] Killed process 83781 (postgres), UID 57887, total-vm:22608092kB, anon-rss:14949052kB, file-rss:0kB, shmem-rss:14292kB
[Sat Feb 19 03:14:40 2022] Killed process 68004 (postgres), UID 57887, total-vm:22660264kB, anon-rss:14909464kB, file-rss:0kB, shmem-rss:9060kB
[Sun Feb 20 03:10:12 2022] Killed process 38396 (postgres), UID 57887, total-vm:22216668kB, anon-rss:14584096kB, file-rss:0kB, shmem-rss:14936kB
[Mon Feb 21 03:10:10 2022] Killed process 5070 (postgres), UID 57887, total-vm:22185132kB, anon-rss:14504492kB, file-rss:0kB, shmem-rss:13632kB
[Tue Feb 22 03:10:58 2022] Killed process 108623 (postgres), UID 57887, total-vm:22105352kB, anon-rss:14446612kB, file-rss:0kB, shmem-rss:23396kB
[Wed Feb 23 03:06:19 2022] Killed process 79361 (postgres), UID 57887, total-vm:21658604kB, anon-rss:13951348kB, file-rss:0kB, shmem-rss:24488kB
[Thu Feb 24 03:09:06 2022] Killed process 51058 (postgres), UID 57887, total-vm:21714656kB, anon-rss:14476408kB, file-rss:0kB, shmem-rss:13472kB
[Fri Feb 25 03:06:52 2022] Killed process 19213 (postgres), UID 57887, total-vm:21640344kB, anon-rss:14490564kB, file-rss:56kB, shmem-rss:14324kB

The last line in each of the outputs above is from after I upgraded the extension from 2.5.0 to 2.5.2.
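To make the pattern in these kill lines easier to compare across hosts, here is a minimal Python sketch that parses `dmesg -T | grep Killed` output and prints the kill time and anonymous RSS of each killed process. It is not part of the original report: the names `KILL_RE`, `parse_kills`, and `kb_to_gib` are made up for illustration, the regular expression assumes exactly the line format shown above, and the two sample lines are copied from the R1 output.

```python
import re
from datetime import datetime

# Two sample lines copied from the R1 output above. In practice, paste in
# the full output of `dmesg -T | grep Killed` for each host.
SAMPLE = """\
[Thu Feb 24 04:33:49 2022] Killed process 67257 (postgres), UID 57887, total-vm:21705156kB, anon-rss:14444612kB, file-rss:0kB, shmem-rss:27280kB
[Fri Feb 25 03:25:14 2022] Killed process 28293 (postgres), UID 57887, total-vm:21941872kB, anon-rss:14587816kB, file-rss:60kB, shmem-rss:13244kB
"""

# Assumed line format (taken from the output above, not from kernel docs):
# [<timestamp>] Killed process <pid> (<name>), ... total-vm:<n>kB, anon-rss:<n>kB, ...
KILL_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<vm>\d+)kB, anon-rss:(?P<anon>\d+)kB"
)


def parse_kills(text):
    """Return one dict per matching kill line; non-matching lines are skipped."""
    kills = []
    for line in text.splitlines():
        m = KILL_RE.search(line)
        if m is None:
            continue
        kills.append({
            # dmesg -T prints e.g. "Thu Feb 24 04:33:49 2022"
            "when": datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y"),
            "pid": int(m["pid"]),
            "name": m["name"],
            "total_vm_kb": int(m["vm"]),
            "anon_rss_kb": int(m["anon"]),
        })
    return kills


def kb_to_gib(kb):
    """Convert the kB figures reported by the kernel to GiB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    for k in parse_kills(SAMPLE):
        print(
            f"{k['when']:%Y-%m-%d %H:%M}  pid={k['pid']:<7} "
            f"anon-rss={kb_to_gib(k['anon_rss_kb']):.2f} GiB"
        )
```

Running the same parser over the R2 and R3 output should make it easy to check whether the kills cluster around the same time of day and the same anon-rss level on every host.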

TimescaleDB version affected

2.5.2

PostgreSQL version used

13.5

What operating system did you use?

Red Hat Enterprise Linux Server release 7.9 (Maipo)

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

An example of one of our postgresql.conf files:

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The "=" is optional.)  Whitespace may be used.  Comments are introduced with
# "#" anywhere on a line.  The complete list of parameter names and allowed
# values can be found in the PostgreSQL documentation.
#
# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, run "pg_ctl reload", or execute
# "SELECT pg_reload_conf()".  Some parameters, which are marked below,
# require a server shutdown and restart to take effect.
#
# Any parameter can also be given as a command-line option to the server, e.g.,
# "postgres -c log_connections=on".  Some parameters can be changed at run time
# with the "SET" SQL command.
#
# Memory units:  B  = bytes            Time units:  us  = microseconds
#                kB = kilobytes                     ms  = milliseconds
#                MB = megabytes                     s   = seconds
#                GB = gigabytes                     min = minutes
#                TB = terabytes                     h   = hours
#                                                   d   = days

#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'       # use data in another directory
                    # (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
                    # (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
                    # (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = ''         # write an extra PID file
                    # (change requires restart)

#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

# - Connection Settings -

#listen_addresses = 'localhost'     # what IP address(es) to listen on;
                    # comma-separated list of addresses;
                    # defaults to 'localhost'; use '*' for all
                    # (change requires restart)
#port = 5601                # (change requires restart)
max_connections = 100           # (change requires restart)
#superuser_reserved_connections = 3 # (change requires restart)
#unix_socket_directories = '/tmp'   # comma-separated list of directories
                    # (change requires restart)
#unix_socket_group = ''         # (change requires restart)
#unix_socket_permissions = 0777     # begin with 0 to use octal notation
                    # (change requires restart)
#bonjour = off              # advertise server via Bonjour
                    # (change requires restart)
#bonjour_name = ''          # defaults to the computer name
                    # (change requires restart)

# - TCP settings -
# see "man 7 tcp" for details

#tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;
                    # 0 selects the system default
#tcp_keepalives_interval = 0        # TCP_KEEPINTVL, in seconds;
                    # 0 selects the system default
#tcp_keepalives_count = 0       # TCP_KEEPCNT;
                    # 0 selects the system default
#tcp_user_timeout = 0           # TCP_USER_TIMEOUT, in milliseconds;
                    # 0 selects the system default

# - Authentication -

#authentication_timeout = 1min      # 1s-600s
#password_encryption = md5      # md5 or scram-sha-256
#db_user_namespace = off

# GSSAPI using Kerberos
#krb_server_keyfile = 'FILE:${sysconfdir}/krb5.keytab'
#krb_caseins_users = off

# - SSL -

#ssl = off
#ssl_ca_file = ''
#ssl_cert_file = 'server.crt'
#ssl_crl_file = ''
#ssl_key_file = 'server.key'
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers
#ssl_prefer_server_ciphers = on
#ssl_ecdh_curve = 'prime256v1'
#ssl_min_protocol_version = 'TLSv1'
#ssl_max_protocol_version = ''
#ssl_dh_params_file = ''
#ssl_passphrase_command = ''
#ssl_passphrase_command_supports_reload = off

#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------

# - Memory -

shared_buffers = 128MB          # min 128kB
                    # (change requires restart)
#huge_pages = try           # on, off, or try
                    # (change requires restart)
#temp_buffers = 8MB         # min 800kB
#max_prepared_transactions = 0      # zero disables the feature
                    # (change requires restart)
# Caution: it is not advisable to set max_prepared_transactions nonzero unless
# you actively intend to use prepared transactions.
#work_mem = 4MB             # min 64kB
#maintenance_work_mem = 64MB        # min 1MB
#autovacuum_work_mem = -1       # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB          # min 100kB
#shared_memory_type = mmap      # the default is the first option
                    # supported by the operating system:
                    #   mmap
                    #   sysv
                    #   windows
                    # (change requires restart)
dynamic_shared_memory_type = posix  # the default is the first option
                    # supported by the operating system:
                    #   posix
                    #   sysv
                    #   windows
                    #   mmap
                    # (change requires restart)

# - Disk -

#temp_file_limit = -1           # limits per-process temp file space
                    # in kB, or -1 for no limit

# - Kernel Resources -

#max_files_per_process = 1000       # min 25
                    # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0          # 0-100 milliseconds (0 disables)
#vacuum_cost_page_hit = 1       # 0-10000 credits
#vacuum_cost_page_miss = 10     # 0-10000 credits
#vacuum_cost_page_dirty = 20        # 0-10000 credits
#vacuum_cost_limit = 200        # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms         # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100        # max buffers written/round, 0 disables
#bgwriter_lru_multiplier = 2.0      # 0-10.0 multiplier on buffers scanned/round
#bgwriter_flush_after = 512kB       # measured in pages, 0 disables

# - Asynchronous Behavior -

#effective_io_concurrency = 1       # 1-1000; 0 disables prefetching
#max_worker_processes = 8       # (change requires restart)
#max_parallel_maintenance_workers = 2   # taken from max_parallel_workers
#max_parallel_workers_per_gather = 2    # taken from max_parallel_workers
#parallel_leader_participation = on
#max_parallel_workers = 8       # maximum number of max_worker_processes that
                    # can be used in parallel operations
#old_snapshot_threshold = -1        # 1min-60d; -1 disables; 0 is immediate
                    # (change requires restart)
#backend_flush_after = 0        # measured in pages, 0 disables

#------------------------------------------------------------------------------
# WRITE-AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

#wal_level = replica            # minimal, replica, or logical
                    # (change requires restart)
#fsync = on             # flush data to disk for crash safety
                    # (turning this off can cause
                    # unrecoverable data corruption)
#synchronous_commit = on        # synchronization level;
                    # off, local, remote_write, remote_apply, or on
#wal_sync_method = fsync        # the default is the first option
                    # supported by the operating system:
                    #   open_datasync
                    #   fdatasync (default on Linux and FreeBSD)
                    #   fsync
                    #   fsync_writethrough
                    #   open_sync
#full_page_writes = on          # recover from partial page writes
#wal_compression = off          # enable compression of full-page writes
#wal_log_hints = off            # also do full page writes of non-critical updates
                    # (change requires restart)
#wal_init_zero = on         # zero-fill new WAL files
#wal_recycle = on           # recycle WAL files
#wal_buffers = -1           # min 32kB, -1 sets based on shared_buffers
                    # (change requires restart)
#wal_writer_delay = 200ms       # 1-10000 milliseconds
#wal_writer_flush_after = 1MB       # measured in pages, 0 disables

#commit_delay = 0           # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Checkpoints -

#checkpoint_timeout = 5min      # range 30s-1d
max_wal_size = 1GB
min_wal_size = 80MB
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_flush_after = 256kB     # measured in pages, 0 disables
#checkpoint_warning = 30s       # 0 disables

# - Archiving -

#archive_mode = off     # enables archiving; off, on, or always
                # (change requires restart)
#archive_command = ''       # command to use to archive a logfile segment
                # placeholders: %p = path of file to archive
                #               %f = file name only
                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0        # force a logfile segment switch after this
                # number of seconds; 0 disables

# - Archive Recovery -

# These are only used in recovery mode.

#restore_command = ''       # command to use to restore an archived logfile segment
                # placeholders: %p = path of file to restore
                #               %f = file name only
                # e.g. 'cp /mnt/server/archivedir/%f %p'
                # (change requires restart)
#archive_cleanup_command = ''   # command to execute at every restartpoint
#recovery_end_command = ''  # command to execute at completion of recovery

# - Recovery Target -

# Set these only when performing a targeted recovery.

#recovery_target = ''       # 'immediate' to end recovery as soon as a
                                # consistent state is reached
                # (change requires restart)
#recovery_target_name = ''  # the named restore point to which recovery will proceed
                # (change requires restart)
#recovery_target_time = ''  # the time stamp up to which recovery will proceed
                # (change requires restart)
#recovery_target_xid = ''   # the transaction ID up to which recovery will proceed
                # (change requires restart)
#recovery_target_lsn = ''   # the WAL LSN up to which recovery will proceed
                # (change requires restart)
#recovery_target_inclusive = on # Specifies whether to stop:
                # just after the specified recovery target (on)
                # just before the recovery target (off)
                # (change requires restart)
#recovery_target_timeline = 'latest'    # 'current', 'latest', or timeline ID
                # (change requires restart)
#recovery_target_action = 'pause'   # 'pause', 'promote', 'shutdown'
                # (change requires restart)

#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------

# - Sending Servers -

# Set these on the master and on any standby that will send replication data.

#max_wal_senders = 10       # max number of walsender processes
                # (change requires restart)
#wal_keep_segments = 0      # in logfile segments; 0 disables
#wal_sender_timeout = 60s   # in milliseconds; 0 disables

#max_replication_slots = 10 # max number of replication slots
                # (change requires restart)
#track_commit_timestamp = off   # collect timestamp of transaction commit
                # (change requires restart)

# - Master Server -

# These settings are ignored on a standby server.

#synchronous_standby_names = '' # standby servers that provide sync rep
                # method to choose sync standbys, number of sync standbys,
                # and comma-separated list of application_name
                # from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0   # number of xacts by which cleanup is delayed

# - Standby Servers -

# These settings are ignored on a master server.

#primary_conninfo = ''          # connection string to sending server
                    # (change requires restart)
#primary_slot_name = ''         # replication slot on sending server
                    # (change requires restart)
#promote_trigger_file = ''      # file name whose presence ends recovery
#hot_standby = on           # "off" disallows queries during recovery
                    # (change requires restart)
#max_standby_archive_delay = 30s    # max delay before canceling queries
                    # when reading WAL from archive;
                    # -1 allows indefinite delay
#max_standby_streaming_delay = 30s  # max delay before canceling queries
                    # when reading streaming WAL;
                    # -1 allows indefinite delay
#wal_receiver_status_interval = 10s # send replies at least this often
                    # 0 disables
#hot_standby_feedback = off     # send info from standby to prevent
                    # query conflicts
#wal_receiver_timeout = 60s     # time that receiver waits for
                    # communication from master
                    # in milliseconds; 0 disables
#wal_retrieve_retry_interval = 5s   # time to wait before retrying to
                    # retrieve WAL after a failed attempt
#recovery_min_apply_delay = 0       # minimum delay for applying changes during recovery

# - Subscribers -

# These settings are ignored on a publisher.

#max_logical_replication_workers = 4    # taken from max_worker_processes
                    # (change requires restart)
#max_sync_workers_per_subscription = 2  # taken from max_logical_replication_workers

#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_parallel_append = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
#enable_partitionwise_join = off
#enable_partitionwise_aggregate = off
#enable_parallel_hash = on
#enable_partition_pruning = on

# - Planner Cost Constants -

#seq_page_cost = 1.0            # measured on an arbitrary scale
#random_page_cost = 4.0         # same scale as above
#cpu_tuple_cost = 0.01          # same scale as above
#cpu_index_tuple_cost = 0.005       # same scale as above
#cpu_operator_cost = 0.0025     # same scale as above
#parallel_tuple_cost = 0.1      # same scale as above
#parallel_setup_cost = 1000.0   # same scale as above

#jit_above_cost = 100000        # perform JIT compilation if available
                    # and query more expensive than this;
                    # -1 disables
#jit_inline_above_cost = 500000     # inline small functions if query is
                    # more expensive than this; -1 disables
#jit_optimize_above_cost = 500000   # use expensive JIT optimizations if
                    # query is more expensive than this;
                    # -1 disables

#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5            # range 1-10
#geqo_pool_size = 0         # selects default based on effort
#geqo_generations = 0           # selects default based on effort
#geqo_selection_bias = 2.0      # range 1.5-2.0
#geqo_seed = 0.0            # range 0.0-1.0

# - Other Planner Options -

#default_statistics_target = 100    # range 1-10000
#constraint_exclusion = partition   # on, off, or partition
#cursor_tuple_fraction = 0.1        # range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit
                    # JOIN clauses
#force_parallel_mode = off
#jit = on               # allow JIT compilation
#plan_cache_mode = auto         # auto, force_generic_plan or
                    # force_custom_plan

#------------------------------------------------------------------------------
# REPORTING AND LOGGING
#------------------------------------------------------------------------------

# - Where to Log -

#log_destination = 'stderr'     # Valid values are combinations of
                    # stderr, csvlog, syslog, and eventlog,
                    # depending on platform.  csvlog
                    # requires logging_collector to be on.

# This is used when logging to stderr:
#logging_collector = off        # Enable capturing of stderr and csvlog
                    # into log files. Required to be on for
                    # csvlogs.
                    # (change requires restart)

# These are only used if logging_collector is on:
#log_directory = 'log'          # directory where log files are written,
                    # can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'    # log file name pattern,
                    # can include strftime() escapes
#log_file_mode = 0600           # creation mode for log files,
                    # begin with 0 to use octal notation
#log_truncate_on_rotation = off     # If on, an existing log file with the
                    # same name as the new log file will be
                    # truncated rather than appended to.
                    # But such truncation only occurs on
                    # time-driven rotation, not on restarts
                    # or size-driven rotation.  Default is
                    # off, meaning append to existing files
                    # in all cases.
#log_rotation_age = 1d          # Automatic rotation of logfiles will
                    # happen after that time.  0 disables.
#log_rotation_size = 10MB       # Automatic rotation of logfiles will
                    # happen after that much log output.
                    # 0 disables.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#syslog_sequence_numbers = on
#syslog_split_messages = on

# This is only relevant when logging to eventlog (win32):
# (change requires restart)
#event_source = 'PostgreSQL'

# - When to Log -

#log_min_messages = warning     # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   info
                    #   notice
                    #   warning
                    #   error
                    #   log
                    #   fatal
                    #   panic

#log_min_error_statement = error    # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   info
                    #   notice
                    #   warning
                    #   error
                    #   log
                    #   fatal
                    #   panic (effectively off)

#log_min_duration_statement = -1    # -1 is disabled, 0 logs all statements
                    # and their durations, > 0 logs only
                    # statements running at least this number
                    # of milliseconds

#log_transaction_sample_rate = 0.0  # Fraction of transactions whose statements
                    # are logged regardless of their duration. 1.0 logs all
                    # statements from all transactions, 0.0 never logs.

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default      # terse, default, or verbose messages
#log_hostname = off
#log_line_prefix = '%m [%p] '       # special values:
                    #   %a = application name
                    #   %u = user name
                    #   %d = database name
                    #   %r = remote host and port
                    #   %h = remote host
                    #   %p = process ID
                    #   %t = timestamp without milliseconds
                    #   %m = timestamp with milliseconds
                    #   %n = timestamp with milliseconds (as a Unix epoch)
                    #   %i = command tag
                    #   %e = SQL state
                    #   %c = session ID
                    #   %l = session line number
                    #   %s = session start timestamp
                    #   %v = virtual transaction ID
                    #   %x = transaction ID (0 if none)
                    #   %q = stop here in non-session
                    #        processes
                    #   %% = '%'
                    # e.g. '<%u%%%d> '
#log_lock_waits = off           # log lock waits >= deadlock_timeout
#log_statement = 'none'         # none, ddl, mod, all
#log_replication_commands = off
#log_temp_files = -1            # log temporary files equal or larger
                    # than the specified size in kilobytes;
                    # -1 disables, 0 logs all temp files
log_timezone = 'America/New_York'

#------------------------------------------------------------------------------
# PROCESS TITLE
#------------------------------------------------------------------------------

#cluster_name = ''          # added to process titles if nonempty
                    # (change requires restart)
#update_process_title = on

#------------------------------------------------------------------------------
# STATISTICS
#------------------------------------------------------------------------------

# - Query and Index Statistics Collector -

#track_activities = on
#track_counts = on
#track_io_timing = off
#track_functions = none         # none, pl, all
#track_activity_query_size = 1024   # (change requires restart)
#stats_temp_directory = 'pg_stat_tmp'

# - Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off

#------------------------------------------------------------------------------
# AUTOVACUUM
#------------------------------------------------------------------------------

#autovacuum = on            # Enable autovacuum subprocess?  'on'
                    # requires track_counts to also be on.
#log_autovacuum_min_duration = -1   # -1 disables, 0 logs all actions and
                    # their durations, > 0 logs only
                    # actions running at least this number
                    # of milliseconds.
#autovacuum_max_workers = 3     # max number of autovacuum subprocesses
                    # (change requires restart)
#autovacuum_naptime = 1min      # time between autovacuum runs
#autovacuum_vacuum_threshold = 50   # min number of row updates before
                    # vacuum
#autovacuum_analyze_threshold = 50  # min number of row updates before
                    # analyze
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
                    # (change requires restart)
#autovacuum_multixact_freeze_max_age = 400000000    # maximum multixact age
                    # before forced vacuum
                    # (change requires restart)
#autovacuum_vacuum_cost_delay = 2ms # default vacuum cost delay for
                    # autovacuum, in milliseconds;
                    # -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1  # default vacuum cost limit for
                    # autovacuum, -1 means use
                    # vacuum_cost_limit

#------------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#------------------------------------------------------------------------------

# - Statement Behavior -

#client_min_messages = notice       # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   log
                    #   notice
                    #   warning
                    #   error
#search_path = '"$user", public'    # schema names
#row_security = on
#default_tablespace = ''        # a tablespace name, '' uses the default
#temp_tablespaces = ''          # a list of tablespace names, '' uses
                    # only default tablespace
#default_table_access_method = 'heap'
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#default_transaction_deferrable = off
#session_replication_role = 'origin'
#statement_timeout = 0          # in milliseconds, 0 is disabled
#lock_timeout = 0           # in milliseconds, 0 is disabled
#idle_in_transaction_session_timeout = 0    # in milliseconds, 0 is disabled
#vacuum_freeze_min_age = 50000000
#vacuum_freeze_table_age = 150000000
#vacuum_multixact_freeze_min_age = 5000000
#vacuum_multixact_freeze_table_age = 150000000
#vacuum_cleanup_index_scale_factor = 0.1    # fraction of total number of tuples
                        # before index cleanup, 0 always performs
                        # index cleanup
#bytea_output = 'hex'           # hex, escape
#xmlbinary = 'base64'
#xmloption = 'content'
#gin_fuzzy_search_limit = 0
#gin_pending_list_limit = 4MB

# - Locale and Formatting -

datestyle = 'iso, mdy'
#intervalstyle = 'postgres'
timezone = 'America/New_York'
#timezone_abbreviations = 'Default'     # Select the set of available time zone
                    # abbreviations.  Currently, there are
                    #   Default
                    #   Australia (historical usage)
                    #   India
                    # You can create your own file in
                    # share/timezonesets/.
#extra_float_digits = 1         # min -15, max 3; any value >0 actually
                    # selects precise output mode
#client_encoding = sql_ascii        # actually, defaults to database
                    # encoding

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.UTF-8'         # locale for system error message
                    # strings
lc_monetary = 'en_US.UTF-8'         # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'          # locale for number formatting
lc_time = 'en_US.UTF-8'             # locale for time formatting

# default configuration for text search
default_text_search_config = 'pg_catalog.english'

# - Shared Library Preloading -

#shared_preload_libraries = ''  # (change requires restart)
#local_preload_libraries = ''
#session_preload_libraries = ''
#jit_provider = 'llvmjit'       # JIT library to use

# - Other Defaults -

#dynamic_library_path = '$libdir'

#------------------------------------------------------------------------------
# LOCK MANAGEMENT
#------------------------------------------------------------------------------

#deadlock_timeout = 1s
#max_locks_per_transaction = 64     # min 10
                    # (change requires restart)
#max_pred_locks_per_transaction = 64    # min 10
                    # (change requires restart)
#max_pred_locks_per_relation = -2   # negative values mean
                    # (max_pred_locks_per_transaction
                    #  / -max_pred_locks_per_relation) - 1
#max_pred_locks_per_page = 2            # min 0

#------------------------------------------------------------------------------
# VERSION AND PLATFORM COMPATIBILITY
#------------------------------------------------------------------------------

# - Previous PostgreSQL Versions -

#array_nulls = on
#backslash_quote = safe_encoding    # on, off, or safe_encoding
#escape_string_warning = on
#lo_compat_privileges = off
#operator_precedence_warning = off
#quote_all_identifiers = off
#standard_conforming_strings = on
#synchronize_seqscans = on

# - Other Platforms and Clients -

#transform_null_equals = off

#------------------------------------------------------------------------------
# ERROR HANDLING
#------------------------------------------------------------------------------

#exit_on_error = off            # terminate session on any error?
#restart_after_crash = on       # reinitialize after backend crash?
#data_sync_retry = off          # retry or panic on failure to fsync
                    # data?
                    # (change requires restart)

#------------------------------------------------------------------------------
# CONFIG FILE INCLUDES
#------------------------------------------------------------------------------

# These options allow settings to be loaded from files other than the
# default postgresql.conf.  Note that these are directives, not variable
# assignments, so they can usefully be given more than once.

#include_dir = '...'            # include files ending in '.conf' from
                    # a directory, e.g., 'conf.d'
#include_if_exists = '...'      # include file only if it exists
#include = '...'            # include file

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here
port = 5601
listen_addresses = '*'
shared_preload_libraries = 'repmgr,pgaudit,pg_stat_statements,pg_stat_kcache,timescaledb'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
track_commit_timestamp=on
random_page_cost = 1.1
effective_io_concurrency = 200
checkpoint_completion_target = 0.9
min_wal_size = 2GB
max_wal_size = 8GB
temp_tablespaces = pgtemp
log_statement='ddl'
logging_collector='on'
log_rotation_age = 1d
log_rotation_size = 1000MB # Automatic rotation of logfiles will
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_directory = '/pglog/pg-data-tsaccdb/pg_log' # directory where log files are written
log_checkpoints = on
log_connections = on
log_disconnections = on
log_min_duration_statement = 400
log_error_verbosity = verbose # terse, default, or verbose messages
log_hostname = on
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u '
log_lock_waits = on
log_temp_files = 0 # log temporary files equal or larger
log_timezone = 'US/Eastern'
temp_tablespaces = 'pgtemp'
archive_mode= on
#archive_command='/bin/true'
archive_command = 'pgbackrest --stanza=tsaccdb archive-push %p'
# Dynamically generated Section

max_worker_processes = 2
max_parallel_workers = 2
shared_buffers = 4GB
effective_cache_size = 12GB
max_connections = 1000
work_mem = 5242kB
maintenance_work_mem = 1GB
## Timescaledb https://docs.timescale.com/timescaledb/latest/how-to-guides/multi-node-setup/required-configuration/#multi-node-configuration
max_prepared_transactions=150
enable_partitionwise_aggregate=on
jit=off
timescaledb.telemetry_level=off

How can we reproduce the bug?

Batch job run of deletes.

>egrep "\[43788\]" postgresql-2022-02-25_000000.log
2022-02-25 04:14:34 EST [43788]: [5-1] db=cdsdb,user=cds_u LOG:  00000: duration: 1413.569 ms  parse <unnamed>: select 1
2022-02-25 04:14:34 EST [43788]: [6-1] db=cdsdb,user=cds_u LOCATION:  exec_parse_message, postgres.c:1590
2022-02-25 04:15:36 EST [43788]: [7-1] db=cdsdb,user=cds_u LOG:  00000: duration: 1400.028 ms  parse <unnamed>: select 1
2022-02-25 04:15:36 EST [43788]: [8-1] db=cdsdb,user=cds_u LOCATION:  exec_parse_message, postgres.c:1590
2022-02-25 04:15:43 EST [43788]: [9-1] db=cdsdb,user=cds_u LOG:  00000: duration: 1690.938 ms  parse <unnamed>: select 1
2022-02-25 04:15:43 EST [43788]: [10-1] db=cdsdb,user=cds_u LOCATION:  exec_parse_message, postgres.c:1590
2022-02-25 04:15:44 EST [43788]: [11-1] db=cdsdb,user=cds_u LOG:  00000: duration: 537.528 ms  bind <unnamed>: select 1
2022-02-25 04:15:44 EST [43788]: [12-1] db=cdsdb,user=cds_u LOCATION:  exec_bind_message, postgres.c:2025
2022-02-25 04:15:51 EST [43788]: [13-1] db=cdsdb,user=cds_u LOG:  00000: duration: 1531.281 ms  parse <unnamed>: select 1
2022-02-25 04:15:52 EST [43788]: [14-1] db=cdsdb,user=cds_u LOCATION:  exec_parse_message, postgres.c:1590
2022-02-25 04:16:53 EST [43788]: [15-1] db=cdsdb,user=cds_u LOG:  00000: duration: 1071.391 ms  parse <unnamed>: select 1
2022-02-25 04:16:54 EST [43788]: [16-1] db=cdsdb,user=cds_u LOCATION:  exec_parse_message, postgres.c:1590
2022-02-25 04:16:57 EST [43788]: [17-1] db=cdsdb,user=cds_u LOG:  00000: duration: 2213.855 ms  bind <unnamed>: select 1
2022-02-25 04:16:57 EST [43788]: [18-1] db=cdsdb,user=cds_u LOCATION:  exec_bind_message, postgres.c:2025
2022-02-25 04:17:09 EST [43788]: [19-1] db=cdsdb,user=cds_u LOG:  00000: duration: 2154.015 ms  parse <unnamed>: select 1
2022-02-25 04:17:09 EST [43788]: [20-1] db=cdsdb,user=cds_u LOCATION:  exec_parse_message, postgres.c:1590
2022-02-25 04:17:11 EST [43788]: [21-1] db=cdsdb,user=cds_u LOG:  00000: duration: 624.396 ms  bind <unnamed>: select 1
2022-02-25 04:17:11 EST [43788]: [22-1] db=cdsdb,user=cds_u LOCATION:  exec_bind_message, postgres.c:2025
2022-02-25 04:17:12 EST [43788]: [23-1] db=cdsdb,user=cds_u WARNING:  57P02: terminating connection because of crash of another server process
2022-02-25 04:17:12 EST [43788]: [24-1] db=cdsdb,user=cds_u DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-02-25 04:17:12 EST [43788]: [25-1] db=cdsdb,user=cds_u HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-02-25 04:17:12 EST [43788]: [26-1] db=cdsdb,user=cds_u LOCATION:  quickdie, postgres.c:2802
2022-02-25 13:52:52 EST [43788]: [1-1] db=[unknown],user=[unknown] LOG:  00000: connection received: host=hostname.prod.oclc.org port=58842
2022-02-25 13:52:52 EST [43788]: [2-1] db=[unknown],user=[unknown] LOCATION:  BackendInitialize, postmaster.c:4409
2022-02-25 13:52:52 EST [43788]: [3-1] db=repmgr,user=repmgr LOG:  00000: connection authorized: user=repmgr database=repmgr
2022-02-25 13:52:52 EST [43788]: [4-1] db=repmgr,user=repmgr LOCATION:  PerformAuthentication, postinit.c:292
2022-02-25 13:52:52 EST [43788]: [5-1] db=repmgr,user=repmgr LOG:  00000: disconnection: session time: 0:00:00.003 user=repmgr database=repmgr host=hostname.prod.oclc.org port=58842
2022-02-25 13:52:52 EST [43788]: [6-1] db=repmgr,user=repmgr LOCATION:  log_disconnections, postgres.c:4767

svenklemm commented 2 years ago

So this bug did not happen in 2.5.0 but started with 2.5.2? Did you try 2.5.1 too? Can you show the configuration of your distributed hypertable(s)? What triggers this behaviour? How can this be reproduced?

tatzlwurm2 commented 2 years ago

It started in 2.5.0, and I tried 2.5.2 to see if it would fix the problem. It appears to be a batch of deletes that triggers this issue. The deletes look like they are completing normally, yet the system crashes. I haven't been able to reproduce this on my own.

 hypertable_schema | hypertable_name | owner | num_dimensions | num_chunks | compression_enabled | is_distributed | replication_factor |                  data_nodes                   | tablespaces
-------------------+-----------------+-------+----------------+------------+---------------------+----------------+--------------------+-----------------------------------------------+-------------
 cds_s             | incidentsdata   | cds_u |              2 |        469 | f                   | t              |                  2 | {cds_node_1,cds_node_2,cds_node_3,cds_node_4} |
 cds_s             | rtpmdata        | cds_u |              2 |        469 | f                   | t              |                  2 | {cds_node_1,cds_node_2,cds_node_3,cds_node_4} |
 cds_s             | statsdata       | cds_u |              2 |        743 | f                   | t              |                  2 | {cds_node_1,cds_node_2,cds_node_3,cds_node_4} |
 cds_s             | otherdata       | cds_u |              2 |         87 | f                   | t              |                  2 | {cds_node_1,cds_node_2,cds_node_3,cds_node_4} |
(4 rows)
cdsdb=# \d cds_s.statsdata
                                  Table "cds_s.statsdata"
      Column       |            Type             | Collation | Nullable |      Default
-------------------+-----------------------------+-----------+----------+-------------------
 environment_name  | character varying(64)       |           | not null |
 data_timestamp    | timestamp without time zone |           | not null |
 host_name         | character varying(64)       |           | not null |
 process_name      | character varying(32)       |           | not null |
 random_uniq_ifier | integer                     |           | not null |
 update_timestamp  | timestamp without time zone |           |          | CURRENT_TIMESTAMP
 data_value        | bytea                       |           |          |
 data_format       | text                        |           |          | 'ber'::text
Indexes:
    "statsdata_pkey" PRIMARY KEY, btree (environment_name, data_timestamp, host_name, process_name, random_uniq_ifier), tablespace "cdsx_md01"
    "statsdata_data_timestamp_idx" btree (data_timestamp DESC), tablespace "cdsx_md01"
    "statsdata_environment_name_data_timestamp_idx" btree (environment_name, data_timestamp DESC), tablespace "cdsx_md01"
Check constraints:
    "stats_data_format_check" CHECK (data_format = ANY (ARRAY['ber'::text, 'json'::text]))
Triggers:
    ts_insert_blocker BEFORE INSERT ON statsdata FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Number of child tables: 743 (Use \d+ to list them.)
Tablespace: "cdsd_md01"

tatzlwurm2 commented 2 years ago

Given this:

SELECT create_distributed_hypertable('statsdata', 'data_timestamp','environment_name',chunk_time_interval => INTERVAL '1 day');

The memory leak appears to be on these types of statements:

2022-02-28 01:11:32 EST [2094]: [1208-1] db=cdsdb,user=cds_u LOG:  00000: duration: 417.064 ms  bind S_2: delete from incidentsdata where environment_name=$1 and data_timestamp >= $2 and data_timestamp <= $3
2022-02-28 01:11:32 EST [2094]: [1209-1] db=cdsdb,user=cds_u DETAIL:  parameters: $1 = 'XXX_prod_audit-statistics-webapp', $2 = '1969-12-31 19:00:00', $3 = '2021-11-29 23:59:59.999'
2022-02-28 01:11:32 EST [2094]: [1210-1] db=cdsdb,user=cds_u LOCATION:  exec_bind_message, postgres.c:2025
2022-02-28 01:11:34 EST [2094]: [1211-1] db=cdsdb,user=cds_u LOG:  00000: duration: 900.280 ms  bind S_3: delete from statsdata where environment_name=$1 and data_timestamp >= $2 and data_timestamp <= $3
2022-02-28 01:11:34 EST [2094]: [1212-1] db=cdsdb,user=cds_u DETAIL:  parameters: $1 = 'XXX_prod_audit-statistics-webapp', $2 = '1969-12-31 19:00:00', $3 = '2021-11-29 23:59:59.999'
....
2022-02-28 04:19:06 EST [112706]: [240-1] db=,user= LOG:  00000: server process (PID 2094) was terminated by signal 9: Killed

From OS.

[Mon Feb 28 04:13:51 2022] Killed process 2094 (postgres), UID 57887, total-vm:19299836kB, anon-rss:14388556kB, file-rss:0kB, shmem-rss:17200kB

The response times look fine, but these statements appear to be the cause of the leak. It doesn't make sense that there is a big time gap between the last delete and the OOM killer terminating that process. Is there a better way to issue deletes based on the above?
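
To put the kernel message above in perspective, the sizes can be converted to GiB and compared with the configured `shared_buffers = 4GB`. The following is a minimal sketch; `parse_oom_line` is a hypothetical helper written for this illustration, and it assumes the kernel's `kB` means 1024 bytes.

```python
import re

# Matches the "Killed process ..." lines that dmesg prints for OOM kills,
# in the exact shape shown in this thread.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?"
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024


def parse_oom_line(line):
    """Return pid, process name and sizes in GiB, or None if no match."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_gib": int(m.group("total_vm")) / KIB_PER_GIB,
        "anon_rss_gib": int(m.group("anon_rss")) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    line = (
        "[Mon Feb 28 04:13:51 2022] Killed process 2094 (postgres), "
        "UID 57887, total-vm:19299836kB, anon-rss:14388556kB, "
        "file-rss:0kB, shmem-rss:17200kB"
    )
    info = parse_oom_line(line)
    print(f"pid {info['pid']}: anon-rss {info['anon_rss_gib']:.2f} GiB")
```

Since `anon-rss` counts private anonymous memory and shared memory is reported separately as `shmem-rss`, a single backend holding this much private memory is far beyond what `work_mem = 5242kB` would suggest for a simple delete.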

svenklemm commented 2 years ago

So versions before 2.5.0 did not show this behaviour? Which version did you use before 2.5.0? There is indeed a better way to delete data in hypertables with drop_chunks but it has some constraints so you might have to fall back on normal DELETE depending on your situation. https://docs.timescale.com/api/latest/hypertable/drop_chunks/

tatzlwurm2 commented 2 years ago

2.5.0 is the version we started with, as we are new to TimescaleDB. I did look into dropping chunks earlier today, but we need to delete based on one of the dimensions. That is, the table's definition is like this:

SELECT create_distributed_hypertable('statsdata', 'data_timestamp','environment_name',chunk_time_interval => INTERVAL '1 day');

So the delete that causes the issue is this:

2022-03-01 04:19:17 EST [112706]: [258-1] db=,user= DETAIL: Failed process was running: delete from statsdata where environment_name=$1 and data_timestamp >= $2 and data_timestamp <= $3

So if we could set the policy on environment_name and timestamp, then the memory issue most likely wouldn't be there.

tatzlwurm2 commented 2 years ago

If it helps, the leak appears to be related to the size of the distributed hypertable. Last Friday I had to purge some data in the environment where the leak was occurring, due to corruption, and the leak hasn't occurred in that environment since. I also haven't observed this behavior in any of our smaller environments. It only occurs with large data sets.

tatzlwurm2 commented 2 years ago

Any updates? Still happening:

[Thu Feb 10 05:07:07 2022] Killed process 19759 (postgres), UID 57887, total-vm:19661448kB, anon-rss:14380492kB, file-rss:0kB, shmem-rss:17456kB
[Fri Feb 11 04:16:54 2022] Killed process 92706 (postgres), UID 57887, total-vm:19726820kB, anon-rss:14376796kB, file-rss:0kB, shmem-rss:14560kB
[Sat Feb 12 04:16:39 2022] Killed process 62885 (postgres), UID 57887, total-vm:19850512kB, anon-rss:14514244kB, file-rss:0kB, shmem-rss:14740kB
[Sun Feb 13 04:17:01 2022] Killed process 21477 (postgres), UID 57887, total-vm:19838228kB, anon-rss:14529932kB, file-rss:0kB, shmem-rss:14380kB
[Mon Feb 14 04:26:28 2022] Killed process 130575 (postgres), UID 57887, total-vm:20147584kB, anon-rss:14831064kB, file-rss:0kB, shmem-rss:15000kB
[Tue Feb 15 04:16:27 2022] Killed process 95862 (postgres), UID 57887, total-vm:19567520kB, anon-rss:14220152kB, file-rss:0kB, shmem-rss:18860kB
[Wed Feb 16 04:26:08 2022] Killed process 58079 (postgres), UID 57887, total-vm:20150216kB, anon-rss:14829152kB, file-rss:0kB, shmem-rss:25160kB
[Thu Feb 17 04:20:20 2022] Killed process 20071 (postgres), UID 57887, total-vm:19717384kB, anon-rss:14373256kB, file-rss:0kB, shmem-rss:24752kB
[Fri Feb 18 04:16:22 2022] Killed process 106579 (postgres), UID 57887, total-vm:19388164kB, anon-rss:14055676kB, file-rss:0kB, shmem-rss:18712kB
[Fri Feb 18 19:16:12 2022] Killed process 69438 (postgres), UID 57887, total-vm:19529264kB, anon-rss:14243780kB, file-rss:24kB, shmem-rss:14332kB
[Sat Feb 19 04:16:06 2022] Killed process 75229 (postgres), UID 57887, total-vm:19361956kB, anon-rss:13983328kB, file-rss:0kB, shmem-rss:21944kB
[Sun Feb 20 04:16:07 2022] Killed process 29523 (postgres), UID 57887, total-vm:19531512kB, anon-rss:14259448kB, file-rss:0kB, shmem-rss:20396kB
[Mon Feb 21 04:25:46 2022] Killed process 837 (postgres), UID 57887, total-vm:20051852kB, anon-rss:14740816kB, file-rss:0kB, shmem-rss:25692kB
[Tue Feb 22 04:15:49 2022] Killed process 86831 (postgres), UID 57887, total-vm:19338472kB, anon-rss:14277928kB, file-rss:0kB, shmem-rss:21348kB
[Wed Feb 23 04:15:41 2022] Killed process 57688 (postgres), UID 57887, total-vm:19421312kB, anon-rss:14094148kB, file-rss:24kB, shmem-rss:19220kB
[Thu Feb 24 04:19:04 2022] Killed process 19800 (postgres), UID 57887, total-vm:19506544kB, anon-rss:14420172kB, file-rss:0kB, shmem-rss:5172kB
[Fri Feb 25 04:12:16 2022] Killed process 110640 (postgres), UID 57887, total-vm:19338936kB, anon-rss:14268400kB, file-rss:0kB, shmem-rss:10268kB
[Sat Feb 26 04:15:14 2022] Killed process 78107 (postgres), UID 57887, total-vm:19540912kB, anon-rss:14439564kB, file-rss:0kB, shmem-rss:15644kB
[Sun Feb 27 04:11:12 2022] Killed process 39510 (postgres), UID 57887, total-vm:19151292kB, anon-rss:14152452kB, file-rss:0kB, shmem-rss:5084kB
[Mon Feb 28 04:13:51 2022] Killed process 2094 (postgres), UID 57887, total-vm:19299836kB, anon-rss:14388556kB, file-rss:0kB, shmem-rss:17200kB
[Tue Mar 1 04:13:56 2022] Killed process 92115 (postgres), UID 57887, total-vm:19144768kB, anon-rss:14331376kB, file-rss:0kB, shmem-rss:14368kB
[Wed Mar 2 04:12:36 2022] Killed process 58847 (postgres), UID 57887, total-vm:19529328kB, anon-rss:14842608kB, file-rss:0kB, shmem-rss:13052kB
[Thu Mar 3 04:16:31 2022] Killed process 27628 (postgres), UID 57887, total-vm:19712380kB, anon-rss:14869248kB, file-rss:0kB, shmem-rss:9924kB
[Fri Mar 4 04:16:25 2022] Killed process 127062 (postgres), UID 57887, total-vm:19813068kB, anon-rss:14882476kB, file-rss:0kB, shmem-rss:7396kB

erimatnor commented 2 years ago

Potentially related to: https://github.com/timescale/timescaledb/issues/3862

akuzm commented 2 years ago

Note for maintainers: The distributed tables do leak some memory into ExecutorState context, so it's going to grow until the query ends. See fdw_exec_foreign_update_or_delete -> async_request_set_wait_any_result, async_request_send_prepared_stmt_with_params. Arguably they could allocate this memory on the per-tuple context, or free it. I guess it's a bug.
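
Because the leaked memory is only released when the query ends, one possible application-side mitigation is to split a single wide DELETE into many narrow ones, for example one per day to match the reporter's 1-day chunk interval. The following is a minimal sketch, not something proposed by the maintainers in this thread. `daily_windows` is a hypothetical helper, the `%s` placeholders assume a DB-API driver such as psycopg2, and whether this actually avoids the OOM kill is untested.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

# Statement shape taken from the reporter's log, but with a half-open
# time range so that adjacent windows never overlap.
DELETE_SQL = (
    "delete from statsdata "
    "where environment_name = %s "
    "and data_timestamp >= %s and data_timestamp < %s"
)


def daily_windows(
    start: datetime,
    end: datetime,
    step: timedelta = timedelta(days=1),
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield half-open [lo, hi) windows that together cover [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


if __name__ == "__main__":
    for lo, hi in daily_windows(datetime(2021, 11, 27), datetime(2021, 11, 30)):
        # With a real connection this would be roughly:
        #   cur.execute(DELETE_SQL, (env_name, lo, hi))
        #   conn.commit()
        print(lo.isoformat(), "->", hi.isoformat())
```

The logged statements use a lower bound of 1969-12-31 19:00:00, so in practice the start of the range would first be clamped to the oldest timestamp actually present, otherwise the loop would issue thousands of empty deletes.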

fabriziomello commented 9 months ago

In 2.13.0 we announced the deprecation of multi-node, and it will be completely removed in the upcoming 2.14.0.