matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

Synapse makes postgres eat all RAM until OOM #9182

Closed olmari closed 3 years ago

olmari commented 3 years ago

Description

Synapse makes the related postgres processes slowly eat all RAM (16 GB) and swap until the OOM killer hits.

Steps to reproduce

I don't have exact reproduction steps to give; I will describe the things I have tried further down.

Version information

When postgres and synapse are started, all synapse-related postgres processes have a RES of almost exactly 250 MB; after 12 h or so their RES approaches 1200 MB each (median ~1100 MB). Obviously I have tried all the usual postgres tunables, but nothing makes a perceivable difference to this behaviour. Restarting either synapse or the postgres process brings memory back to a normal RES, only for it to climb towards OOM again.

I have tried to dig into what could possibly cause this behaviour, but at this point I am out of ideas. A comparable but maybe twice-as-big HS, kapsi.fi, which does have redis, workers and whatnot enabled, does not experience this behaviour; its postgres processes are quite happy with pretty much the memory they have at startup.

Here is postgres=# select datname, usename, pid, backend_start, xact_start, query_start, state_change, wait_event_type, wait_event, state, query from pg_stat_activity;

   datname   |   usename    |  pid   |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type |     wait_event      | state  |                                                                                       query                                                                                       
-------------+--------------+--------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
             | postgres     | 172769 | 2021-01-20 10:02:07.119053+02 |                               |                               |                               | Activity        | LogicalLauncherMain |        | 
 synapse     | synapse_user | 172776 | 2021-01-20 10:02:08.283794+02 |                               | 2021-01-20 19:31:02.698945+02 | 2021-01-20 19:31:02.698961+02 | Client          | ClientRead          | idle   | COMMIT
 synapse     | synapse_user | 172774 | 2021-01-20 10:02:08.277446+02 |                               | 2021-01-20 19:31:02.693968+02 | 2021-01-20 19:31:02.693981+02 | Client          | ClientRead          | idle   | COMMIT
 synapse     | synapse_user | 172775 | 2021-01-20 10:02:08.277738+02 |                               | 2021-01-20 19:31:02.703326+02 | 2021-01-20 19:31:02.70335+02  | Client          | ClientRead          | idle   | COMMIT
 synapse     | synapse_user | 172778 | 2021-01-20 10:02:08.28457+02  |                               | 2021-01-20 19:31:02.695861+02 | 2021-01-20 19:31:02.695879+02 | Client          | ClientRead          | idle   | COMMIT
 synapse     | synapse_user | 172777 | 2021-01-20 10:02:08.284047+02 |                               | 2021-01-20 19:31:02.697951+02 | 2021-01-20 19:31:02.697974+02 | Client          | ClientRead          | idle   | COMMIT
 synapse     | synapse_user | 172779 | 2021-01-20 10:02:08.303174+02 |                               | 2021-01-20 19:31:02.691738+02 | 2021-01-20 19:31:02.691757+02 | Client          | ClientRead          | idle   | COMMIT
 synapse     | synapse_user | 172780 | 2021-01-20 10:02:08.313032+02 |                               | 2021-01-20 19:31:02.692357+02 | 2021-01-20 19:31:02.692368+02 | Client          | ClientRead          | idle   | COMMIT
 synapse     | synapse_user | 172781 | 2021-01-20 10:02:08.313392+02 |                               | 2021-01-20 19:31:02.691576+02 | 2021-01-20 19:31:02.691586+02 | Client          | ClientRead          | idle   | COMMIT
 synapse     | synapse_user | 172782 | 2021-01-20 10:02:08.320273+02 |                               | 2021-01-20 19:31:02.690884+02 | 2021-01-20 19:31:02.690911+02 | Client          | ClientRead          | idle   | COMMIT
 synapse     | synapse_user | 172783 | 2021-01-20 10:02:08.321661+02 |                               | 2021-01-20 19:31:02.693378+02 | 2021-01-20 19:31:02.693389+02 | Client          | ClientRead          | idle   | COMMIT
 telegrambot | telegrambot  | 182339 | 2021-01-20 19:05:41.200507+02 |                               | 2021-01-20 19:30:59.350267+02 | 2021-01-20 19:30:59.350449+02 | Client          | ClientRead          | idle   | SELECT pg_advisory_unlock_all();                                                                                                                                                 +
             |              |        |                               |                               |                               |                               |                 |                     |        | CLOSE ALL;                                                                                                                                                                       +
             |              |        |                               |                               |                               |                               |                 |                     |        |                                                                                                                                                                                  +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 DO $$                                                                                                                                                            +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 BEGIN                                                                                                                                                            +
             |              |        |                               |                               |                               |                               |                 |                     |        |                     PERFORM * FROM pg_listening_channels() LIMIT 1;                                                                                                              +
             |              |        |                               |                               |                               |                               |                 |                     |        |                     IF FOUND THEN                                                                                                                                                +
             |              |        |                               |                               |                               |                               |                 |                     |        |                         UNLISTEN *;                                                                                                                                              +
             |              |        |                               |                               |                               |                               |                 |                     |        |                     END IF;                                                                                                                                                      +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 END;                                                                                                                                                             +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 $$;                                                                                                                                                              +
             |              |        |                               |                               |                               |                               |                 |                     |        |                                                                                                                                                                                  +
             |              |        |                               |                               |                               |                               |                 |                     |        | RESET ALL;
 postgres    | postgres     | 182778 | 2021-01-20 19:30:59.687796+02 | 2021-01-20 19:31:02.683108+02 | 2021-01-20 19:31:02.683108+02 | 2021-01-20 19:31:02.683109+02 |                 |                     | active | select datname, usename, pid, backend_start, xact_start, query_start, state_change, wait_event_type, wait_event, state, query from pg_stat_activity;
 facebookbot | facebookbot  | 172786 | 2021-01-20 10:02:08.41789+02  |                               | 2021-01-20 19:30:32.971077+02 | 2021-01-20 19:30:32.971258+02 | Client          | ClientRead          | idle   | SELECT pg_advisory_unlock_all();                                                                                                                                                 +
             |              |        |                               |                               |                               |                               |                 |                     |        | CLOSE ALL;                                                                                                                                                                       +
             |              |        |                               |                               |                               |                               |                 |                     |        |                                                                                                                                                                                  +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 DO $$                                                                                                                                                            +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 BEGIN                                                                                                                                                            +
             |              |        |                               |                               |                               |                               |                 |                     |        |                     PERFORM * FROM pg_listening_channels() LIMIT 1;                                                                                                              +
             |              |        |                               |                               |                               |                               |                 |                     |        |                     IF FOUND THEN                                                                                                                                                +
             |              |        |                               |                               |                               |                               |                 |                     |        |                         UNLISTEN *;                                                                                                                                              +
             |              |        |                               |                               |                               |                               |                 |                     |        |                     END IF;                                                                                                                                                      +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 END;                                                                                                                                                             +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 $$;                                                                                                                                                              +
             |              |        |                               |                               |                               |                               |                 |                     |        |                                                                                                                                                                                  +
             |              |        |                               |                               |                               |                               |                 |                     |        | RESET ALL;
 telegrambot | telegrambot  | 172794 | 2021-01-20 10:02:09.385851+02 |                               | 2021-01-20 19:28:32.072588+02 | 2021-01-20 19:28:32.074156+02 | Client          | ClientRead          | idle   | COMMIT
 grafana     | grafana      | 181467 | 2021-01-20 18:18:29.98819+02  |                               | 2021-01-20 19:31:00.981616+02 | 2021-01-20 19:31:00.981806+02 | Client          | ClientRead          | idle   | select * from alert
 telegrambot | telegrambot  | 172802 | 2021-01-20 10:02:20.980309+02 |                               | 2021-01-20 19:28:06.589256+02 | 2021-01-20 19:28:06.589273+02 | Client          | ClientRead          | idle   | COMMIT
 telegrambot | telegrambot  | 172803 | 2021-01-20 10:02:20.997652+02 |                               | 2021-01-20 19:28:32.168638+02 | 2021-01-20 19:28:32.170706+02 | Client          | ClientRead          | idle   | COMMIT
 telegrambot | telegrambot  | 172804 | 2021-01-20 10:02:21.01352+02  |                               | 2021-01-20 19:28:32.171649+02 | 2021-01-20 19:28:32.171689+02 | Client          | ClientRead          | idle   | COMMIT
 telegrambot | telegrambot  | 172805 | 2021-01-20 10:02:21.023916+02 |                               | 2021-01-20 19:28:32.076235+02 | 2021-01-20 19:28:32.076275+02 | Client          | ClientRead          | idle   | ROLLBACK
 signalbot   | signalbot    | 172813 | 2021-01-20 10:02:32.943974+02 |                               | 2021-01-20 19:30:45.81808+02  | 2021-01-20 19:30:45.81825+02  | Client          | ClientRead          | idle   | SELECT pg_advisory_unlock_all();                                                                                                                                                 +
             |              |        |                               |                               |                               |                               |                 |                     |        | CLOSE ALL;                                                                                                                                                                       +
             |              |        |                               |                               |                               |                               |                 |                     |        |                                                                                                                                                                                  +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 DO $$                                                                                                                                                            +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 BEGIN                                                                                                                                                            +
             |              |        |                               |                               |                               |                               |                 |                     |        |                     PERFORM * FROM pg_listening_channels() LIMIT 1;                                                                                                              +
             |              |        |                               |                               |                               |                               |                 |                     |        |                     IF FOUND THEN                                                                                                                                                +
             |              |        |                               |                               |                               |                               |                 |                     |        |                         UNLISTEN *;                                                                                                                                              +
             |              |        |                               |                               |                               |                               |                 |                     |        |                     END IF;                                                                                                                                                      +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 END;                                                                                                                                                             +
             |              |        |                               |                               |                               |                               |                 |                     |        |                 $$;                                                                                                                                                              +
             |              |        |                               |                               |                               |                               |                 |                     |        |                                                                                                                                                                                  +
             |              |        |                               |                               |                               |                               |                 |                     |        | RESET ALL;
 facebookbot | facebookbot  | 172866 | 2021-01-20 10:05:53.116227+02 |                               | 2021-01-20 19:08:07.541813+02 | 2021-01-20 19:08:07.541822+02 | Client          | ClientRead          | idle   | ROLLBACK
 grafana     | grafana      | 181810 | 2021-01-20 18:38:29.988601+02 |                               | 2021-01-20 19:30:50.981635+02 | 2021-01-20 19:30:50.981968+02 | Client          | ClientRead          | idle   | select * from alert
 whatsappbot | whatsappbot  | 182449 | 2021-01-20 19:13:13.567375+02 |                               | 2021-01-20 19:30:52.996283+02 | 2021-01-20 19:30:52.997808+02 | Client          | ClientRead          | idle   | UPDATE puppet SET displayname=$1, name_quality=$2, avatar=$3, avatar_url=$4, custom_mxid=$5, access_token=$6, next_batch=$7, enable_presence=$8, enable_receipts=$9 WHERE jid=$10
 whatsappbot | whatsappbot  | 182441 | 2021-01-20 19:13:13.551931+02 |                               | 2021-01-20 19:30:57.959069+02 | 2021-01-20 19:30:57.960742+02 | Client          | ClientRead          | idle   | UPDATE puppet SET displayname=$1, name_quality=$2, avatar=$3, avatar_url=$4, custom_mxid=$5, access_token=$6, next_batch=$7, enable_presence=$8, enable_receipts=$9 WHERE jid=$10
             |              | 172766 | 2021-01-20 10:02:07.118342+02 |                               |                               |                               | Activity        | BgWriterMain        |        | 
             |              | 172765 | 2021-01-20 10:02:07.118096+02 |                               |                               |                               | Activity        | CheckpointerMain    |        | 
             |              | 172767 | 2021-01-20 10:02:07.118604+02 |                               |                               |                               | Activity        | WalWriterMain       |        | 
(28 rows)

I've also run (gdb) p MemoryContextStats(TopMemoryContext) against one such process; here are the results from startup and then from when RAM is almost exhausted:
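
For reference, such a dump can be taken roughly as follows (a sketch; the PID is one of the synapse backends from the pg_stat_activity output above, and the output goes to the backend's stderr / the server log):

```
# Attach gdb to one synapse-related postgres backend (PID taken from pg_stat_activity)
sudo gdb -p 172776

# Inside gdb: dump the backend's memory context tree, then detach without killing it
(gdb) p MemoryContextStats(TopMemoryContext)
(gdb) detach
(gdb) quit
```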

TopMemoryContext: 154592 total in 8 blocks; 47496 free (98 chunks); 107096 used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1448 free (0 chunks); 6744 used
  HandleParallelMessages: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  RI compare cache: 16384 total in 2 blocks; 6656 free (3 chunks); 9728 used
  RI query cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  RI constraint cache: 40888 total in 2 blocks; 2616 free (0 chunks); 38272 used
  Sequence values: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  Btree proof lookup cache: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  CFuncHash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  Tsearch dictionary cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  Tsearch parser cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  Tsearch configuration cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  TableSpace cache: 8192 total in 1 blocks; 2088 free (0 chunks); 6104 used
  Type information cache: 24616 total in 2 blocks; 2616 free (0 chunks); 22000 used
  Operator lookup cache: 24576 total in 2 blocks; 10752 free (3 chunks); 13824 used
  RowDescriptionContext: 8192 total in 1 blocks; 6888 free (0 chunks); 1304 used
  MessageContext: 8192 total in 1 blocks; 6888 free (1 chunks); 1304 used
  Operator class cache: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  smgr relation table: 131072 total in 5 blocks; 73928 free (19 chunks); 57144 used
  TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
  Portal hash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  TopPortalContext: 8192 total in 1 blocks; 7928 free (1 chunks); 264 used
  Relcache by OID: 32768 total in 3 blocks; 10488 free (6 chunks); 22280 used
  CacheMemoryContext: 4689520 total in 32 blocks; 1746360 free (0 chunks); 2943160 used
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: room_retention_max_lifetime_idx
    index info: 2048 total in 2 blocks; 608 free (1 chunks); 1440 used: room_retention_pkey
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: pg_toast_17486_index
    index info: 3072 total in 2 blocks; 1080 free (1 chunks); 1992 used: room_tag_uniqueness
    CachedPlan: 8192 total in 4 blocks; 3464 free (0 chunks); 4728 used: SELECT 1 FROM ONLY "public"."access_tokens" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF ...
    CachedPlan: 8192 total in 4 blocks; 2864 free (0 chunks); 5328 used: SELECT 1 FROM ONLY "public"."events" x WHERE "event_id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
    index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: event_push_summary_stream_ordering_lock_key
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: ex_outlier_stream_pkey
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: group_attestations_renewals_v_idx
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: group_attestations_renewals_u_idx
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: group_attestations_renewals_g_idx
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: monthly_active_users_users
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: monthly_active_users_time_stamp
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_signature_stream_idx
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: ui_auth_sessions_session_id_key
    index info: 3072 total in 2 blocks; 664 free (1 chunks); 2408 used: e2e_one_time_keys_json_uniqueness
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: e2e_room_keys_versions_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: access_tokens_device_id
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: access_tokens_token_key
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: access_tokens_pkey
    CachedPlanSource: 4096 total in 3 blocks; 1416 free (0 chunks); 2680 used: SELECT 1 FROM ONLY "public"."access_tokens" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF ...
      CachedPlanQuery: 4096 total in 3 blocks; 1192 free (1 chunks); 2904 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlanSource: 4096 total in 3 blocks; 1416 free (0 chunks); 2680 used: SELECT 1 FROM ONLY "public"."events" x WHERE "event_id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
      CachedPlanQuery: 4096 total in 3 blocks; 592 free (0 chunks); 3504 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: pg_toast_17607_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: state_events_event_id_key
    index info: 2048 total in 2 blocks; 840 free (0 chunks); 1208 used: evauth_edges_id
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: local_current_membership_room_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: local_current_membership_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: pg_toast_17742_index
    TS dictionary: 1024 total in 1 blocks; 688 free (0 chunks); 336 used: simple
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: room_stats_state_room
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: device_lists_remote_cache_unique_id
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: user_daily_visits_uts_idx
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: user_daily_visits_ts_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: state_groups_room_id_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: state_groups_pkey
    index info: 3072 total in 2 blocks; 968 free (1 chunks); 2104 used: users_who_share_private_rooms_u_idx
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: users_who_share_private_rooms_r_idx
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: users_who_share_private_rooms_o_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: users_in_public_rooms_u_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: users_in_public_rooms_r_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_directory_search_user_idx
    index info: 4096 total in 3 blocks; 2256 free (2 chunks); 1840 used: user_directory_search_fts_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_directory_user_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_directory_room_idx
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: erased_users_user
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: group_attestations_remote_v_idx
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: group_attestations_remote_u_idx
    index info: 2048 total in 2 blocks; 608 free (1 chunks); 1440 used: group_attestations_remote_g_idx
    index info: 2048 total in 2 blocks; 656 free (1 chunks); 1392 used: group_roles_group_id_role_id_key
    index info: 3072 total in 2 blocks; 1048 free (1 chunks); 2024 used: group_summary_roles_group_id_role_id_role_order_key
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: group_summary_users_g_idx
    index info: 2048 total in 2 blocks; 656 free (1 chunks); 1392 used: group_room_categories_group_id_category_id_key
    index info: 3072 total in 2 blocks; 1048 free (1 chunks); 2024 used: group_summary_room_categories_group_id_category_id_cat_orde_key
    index info: 3072 total in 2 blocks; 1080 free (1 chunks); 1992 used: group_summary_rooms_g_idx
    index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used: group_summary_rooms_group_id_category_id_room_id_room_order_key
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: group_rooms_r_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: group_rooms_g_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_stats_historical_end_ts
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: user_stats_historical_pkey
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_stats_current_pkey
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: current_state_delta_stream_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: device_federation_outbox_id
    index info: 2048 total in 2 blocks; 496 free (1 chunks); 1552 used: device_federation_outbox_destination_id
    index info: 2048 total in 2 blocks; 496 free (1 chunks); 1552 used: device_lists_outbound_pokes_user
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: device_lists_outbound_pokes_stream
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: device_lists_outbound_pokes_id
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: cache_invalidation_stream_by_instance_id
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: e2e_cross_signing_keys_stream_idx
    index info: 3072 total in 2 blocks; 840 free (1 chunks); 2232 used: e2e_cross_signing_keys_idx
    index info: 2048 total in 2 blocks; 920 free (0 chunks); 1128 used: groups_idx
    index info: 3072 total in 2 blocks; 840 free (1 chunks); 2232 used: event_relations_relates
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: event_relations_id
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: local_group_membership_u_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: local_group_membership_g_idx
    index info: 3072 total in 2 blocks; 840 free (1 chunks); 2232 used: server_keys_json_uniqueness
    index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: private_user_data_max_stream_id_lock_key
    index info: 3072 total in 2 blocks; 696 free (1 chunks); 2376 used: event_txn_id_txn_id
    index info: 2048 total in 2 blocks; 840 free (0 chunks); 1208 used: event_txn_id_ts
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: event_txn_id_event_id
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: stream_positions_idx
    index info: 3072 total in 2 blocks; 696 free (1 chunks); 2376 used: e2e_room_keys_with_version_idx
    index info: 3072 total in 2 blocks; 840 free (1 chunks); 2232 used: device_inbox_user_stream_id
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: device_inbox_stream_id_user_id
    index info: 2048 total in 2 blocks; 608 free (1 chunks); 1440 used: room_tag_revisions_uniqueness
    index info: 4096 total in 3 blocks; 2184 free (2 chunks); 1912 used: room_memberships_user_room_forgotten
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: room_memberships_user_id
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: room_memberships_room_id
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: room_memberships_event_id_key
    index info: 3544 total in 3 blocks; 432 free (0 chunks); 3112 used: remote_media_repository_thumbn_media_origin_id_width_height_met
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: public_room_index
    index info: 2048 total in 2 blocks; 840 free (0 chunks); 1208 used: rooms_pkey
    index info: 2048 total in 2 blocks; 840 free (0 chunks); 1208 used: pushers2_pkey
    index info: 3072 total in 2 blocks; 808 free (1 chunks); 2264 used: pushers2_app_id_pushkey_user_name_key
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: application_services_txns_id
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: application_services_txns_as_id_txn_id_key
    index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: user_directory_stream_pos_lock_key
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: room_account_data_stream_id
    index info: 3072 total in 2 blocks; 840 free (1 chunks); 2232 used: room_account_data_uniqueness
    187 more child contexts containing 402664 total in 360 blocks; 126120 free (131 chunks); 276544 used
  WAL record construction: 49768 total in 2 blocks; 6360 free (0 chunks); 43408 used
  PrivateRefCount: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
  MdSmgr: 16384 total in 2 blocks; 3944 free (3 chunks); 12440 used
  LOCALLOCK hash: 32768 total in 3 blocks; 16824 free (8 chunks); 15944 used
  Timezones: 104120 total in 2 blocks; 2616 free (0 chunks); 101504 used
  ErrorContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
Grand total: 6137360 bytes in 653 blocks; 2226664 free (318 chunks); 3910696 used

TopMemoryContext: 154592 total in 8 blocks; 47496 free (98 chunks); 107096 used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1448 free (0 chunks); 6744 used
  HandleParallelMessages: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  RI compare cache: 16384 total in 2 blocks; 6656 free (3 chunks); 9728 used
  RI query cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  RI constraint cache: 40888 total in 2 blocks; 2616 free (0 chunks); 38272 used
  Sequence values: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  Btree proof lookup cache: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  CFuncHash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  Tsearch dictionary cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  Tsearch parser cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  Tsearch configuration cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  TableSpace cache: 8192 total in 1 blocks; 2088 free (0 chunks); 6104 used
  Type information cache: 24616 total in 2 blocks; 2616 free (0 chunks); 22000 used
  Operator lookup cache: 24576 total in 2 blocks; 10752 free (3 chunks); 13824 used
  RowDescriptionContext: 8192 total in 1 blocks; 6888 free (0 chunks); 1304 used
  MessageContext: 8192 total in 1 blocks; 6888 free (1 chunks); 1304 used
  Operator class cache: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  smgr relation table: 131072 total in 5 blocks; 73928 free (19 chunks); 57144 used
  TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
  Portal hash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  TopPortalContext: 8192 total in 1 blocks; 7928 free (1 chunks); 264 used
  Relcache by OID: 32768 total in 3 blocks; 9448 free (6 chunks); 23320 used
  CacheMemoryContext: 4722032 total in 34 blocks; 1659648 free (1 chunks); 3062384 used
    index info: 3072 total in 2 blocks; 1048 free (1 chunks); 2024 used: e2e_fallback_keys_json_uniqueness
    index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: threepid_validation_token_session_id
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: threepid_validation_token_pkey
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: device_federation_inbox_sender_id
    index info: 2048 total in 2 blocks; 920 free (0 chunks); 1128 used: room_aliases_id
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: room_aliases_room_alias_key
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: blocked_rooms_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: pg_toast_17662_index
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: room_retention_max_lifetime_idx
    index info: 2048 total in 2 blocks; 608 free (1 chunks); 1440 used: room_retention_pkey
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: pg_toast_17486_index
    index info: 3072 total in 2 blocks; 1080 free (1 chunks); 1992 used: room_tag_uniqueness
    CachedPlan: 8192 total in 4 blocks; 3464 free (0 chunks); 4728 used: SELECT 1 FROM ONLY "public"."access_tokens" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF ...
    CachedPlan: 8192 total in 4 blocks; 2864 free (0 chunks); 5328 used: SELECT 1 FROM ONLY "public"."events" x WHERE "event_id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
    index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: event_push_summary_stream_ordering_lock_key
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: ex_outlier_stream_pkey
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: group_attestations_renewals_v_idx
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: group_attestations_renewals_u_idx
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: group_attestations_renewals_g_idx
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: monthly_active_users_users
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: monthly_active_users_time_stamp
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_signature_stream_idx
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: ui_auth_sessions_session_id_key
    index info: 3072 total in 2 blocks; 664 free (1 chunks); 2408 used: e2e_one_time_keys_json_uniqueness
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: e2e_room_keys_versions_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: access_tokens_device_id
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: access_tokens_token_key
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: access_tokens_pkey
    CachedPlanSource: 4096 total in 3 blocks; 1416 free (0 chunks); 2680 used: SELECT 1 FROM ONLY "public"."access_tokens" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF ...
      CachedPlanQuery: 4096 total in 3 blocks; 1192 free (1 chunks); 2904 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlanSource: 4096 total in 3 blocks; 1416 free (0 chunks); 2680 used: SELECT 1 FROM ONLY "public"."events" x WHERE "event_id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
      CachedPlanQuery: 4096 total in 3 blocks; 592 free (0 chunks); 3504 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: pg_toast_17607_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: state_events_event_id_key
    index info: 2048 total in 2 blocks; 840 free (0 chunks); 1208 used: evauth_edges_id
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: local_current_membership_room_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: local_current_membership_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: pg_toast_17742_index
    TS dictionary: 1024 total in 1 blocks; 688 free (0 chunks); 336 used: simple
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: room_stats_state_room
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: device_lists_remote_cache_unique_id
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: user_daily_visits_uts_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_daily_visits_ts_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: state_groups_room_id_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: state_groups_pkey
    index info: 3072 total in 2 blocks; 968 free (1 chunks); 2104 used: users_who_share_private_rooms_u_idx
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: users_who_share_private_rooms_r_idx
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: users_who_share_private_rooms_o_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: users_in_public_rooms_u_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: users_in_public_rooms_r_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_directory_search_user_idx
    index info: 4096 total in 3 blocks; 2256 free (2 chunks); 1840 used: user_directory_search_fts_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_directory_user_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_directory_room_idx
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: erased_users_user
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: group_attestations_remote_v_idx
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: group_attestations_remote_u_idx
    index info: 2048 total in 2 blocks; 608 free (1 chunks); 1440 used: group_attestations_remote_g_idx
    index info: 2048 total in 2 blocks; 656 free (1 chunks); 1392 used: group_roles_group_id_role_id_key
    index info: 3072 total in 2 blocks; 1048 free (1 chunks); 2024 used: group_summary_roles_group_id_role_id_role_order_key
    index info: 2048 total in 2 blocks; 904 free (0 chunks); 1144 used: group_summary_users_g_idx
    index info: 2048 total in 2 blocks; 656 free (1 chunks); 1392 used: group_room_categories_group_id_category_id_key
    index info: 3072 total in 2 blocks; 1048 free (1 chunks); 2024 used: group_summary_room_categories_group_id_category_id_cat_orde_key
    index info: 3072 total in 2 blocks; 1080 free (1 chunks); 1992 used: group_summary_rooms_g_idx
    index info: 3072 total in 2 blocks; 1064 free (1 chunks); 2008 used: group_summary_rooms_group_id_category_id_room_id_room_order_key
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: group_rooms_r_idx
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: group_rooms_g_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_stats_historical_end_ts
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: user_stats_historical_pkey
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: user_stats_current_pkey
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: current_state_delta_stream_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: device_federation_outbox_id
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: device_federation_outbox_destination_id
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: device_lists_outbound_pokes_user
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: device_lists_outbound_pokes_stream
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: device_lists_outbound_pokes_id
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: cache_invalidation_stream_by_instance_id
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: e2e_cross_signing_keys_stream_idx
    index info: 3072 total in 2 blocks; 840 free (1 chunks); 2232 used: e2e_cross_signing_keys_idx
    index info: 2048 total in 2 blocks; 920 free (0 chunks); 1128 used: groups_idx
    index info: 3072 total in 2 blocks; 840 free (1 chunks); 2232 used: event_relations_relates
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: event_relations_id
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: local_group_membership_u_idx
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: local_group_membership_g_idx
    index info: 3072 total in 2 blocks; 840 free (1 chunks); 2232 used: server_keys_json_uniqueness
    index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: private_user_data_max_stream_id_lock_key
    index info: 3072 total in 2 blocks; 696 free (1 chunks); 2376 used: event_txn_id_txn_id
    index info: 2048 total in 2 blocks; 840 free (0 chunks); 1208 used: event_txn_id_ts
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: event_txn_id_event_id
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: stream_positions_idx
    index info: 3072 total in 2 blocks; 696 free (1 chunks); 2376 used: e2e_room_keys_with_version_idx
    index info: 3072 total in 2 blocks; 840 free (1 chunks); 2232 used: device_inbox_user_stream_id
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: device_inbox_stream_id_user_id
    index info: 2048 total in 2 blocks; 608 free (1 chunks); 1440 used: room_tag_revisions_uniqueness
    index info: 4096 total in 3 blocks; 2184 free (2 chunks); 1912 used: room_memberships_user_room_forgotten
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: room_memberships_user_id
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: room_memberships_room_id
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: room_memberships_event_id_key
    index info: 3544 total in 3 blocks; 432 free (0 chunks); 3112 used: remote_media_repository_thumbn_media_origin_id_width_height_met
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: public_room_index
    195 more child contexts containing 421096 total in 376 blocks; 131848 free (135 chunks); 289248 used
  WAL record construction: 49768 total in 2 blocks; 6360 free (0 chunks); 43408 used
  PrivateRefCount: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
  MdSmgr: 16384 total in 2 blocks; 3032 free (6 chunks); 13352 used
  LOCALLOCK hash: 32768 total in 3 blocks; 16824 free (8 chunks); 15944 used
  Timezones: 104120 total in 2 blocks; 2616 free (0 chunks); 101504 used
  ErrorContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
Grand total: 6187280 bytes in 671 blocks; 2143936 free (325 chunks); 4043344 used

So, something in our instance of synapse makes it query something from postgres that never gets released, or that accumulates over time, but I can't figure out what. Debugging help is wanted, if there are no known things left to try.

ADD: While the root issue is specifically synapse's postgres processes growing, we also run the following bridges on the same server, in case they have any bearing on this:

mautrix-telegram, mautrix-whatsapp, mautrix-facebook, mautrix-signal, SmsMatrix bot, appservice-slack, appservice-discord

Bubu commented 3 years ago

I've seen similar things on my homeserver before switching my db over to much faster storage and I believe @jcgruenhage is also currently encountering this.

olmari commented 3 years ago

Storage is UpCloud MaxIOPS, so it shouldn't exactly be sluggish, especially at this level of Synapse usage; this is still a relatively small Synapse instance (a few tens of users and a bunch of bridges).

jcgruenhage commented 3 years ago

Yeah, no, it's definitely not storage related in my case either. My postgres DB is on an NVMe SSD, so IOPS shouldn't be a problem.

I've first observed this issue when migrating from Fedora (pg in docker) to Void (pg still in docker) on the host machine. Hardware didn't change, but the SSD switched from being mounted directly to having a LUKS encryption layer in between.

Worth noting: "all RAM" means 32 GB of RAM and 16 GB of swap in my case. All other services stay under 8 GB (total, not each).

voice06 commented 3 years ago

Throwing my voice into the ring: since upgrading to 1.25.0 I've had 5 instances of the out-of-memory killer going after Synapse. I'm not exactly sure how to go about debugging it on my end. It's a low-spec VPS, 2 vCPU and 4 GB RAM, but I didn't have a high frequency of OOM kills prior to 1.25.0, at least not enough to require quickly writing up a monitoring script to detect Synapse being down and restart it.

olmari commented 3 years ago

I'm not even sure when exactly this began; it has been going on for quite a long time now and it is not getting better on its own :7

olmari commented 3 years ago

Added all the bridges we use to the issue post, just to list everything possible, while the root issue is still synapse's postgres processes growing. :)

intelfx commented 3 years ago

I've recently seen this on my homeserver (intelfx.name); I have yet to reproduce it.

SleepingCrows commented 3 years ago

> Storage is UpCloud MaxIOPS, so it shouldn't exactly be sluggish, especially at this level of Synapse usage; this is still a relatively small Synapse instance (a few tens of users and a bunch of bridges).

UpCloud has some of the best storage performance in the business (even better than AWS EC2's offering), so it really shouldn't be sluggish.

And yeah, synapse 1.25 (and even prior versions) absolutely guzzles down working memory, all the more so with more active rooms (both local and remotely accessed) and users on the server.

olmari commented 3 years ago

@kaijuagenda This problem is not synapse eating RAM itself, but synapse making postgres eat all RAM.

erikjohnston commented 3 years ago

A couple of things here:

  1. Postgres should be strictly controlling how much memory each connection is using, so it's worth looking at settings such as work_mem and the like to check that they are set to something sane and roughly match the memory you see the connections using.
  2. If Synapse has been configured with cp_min and cp_max set to different values, then when Synapse opens more DB connections you'll see Postgres memory go up (see the homeserver.yaml sketch after this list).
  3. Postgres has a process per connection, so you should be able to see which connections are using memory (and then use pg_stat_activity or the logs to see what that PID was doing). Usually all PIDs should use about the same amount of memory.
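
For point 2, a rough sketch of where those knobs live in the database section of homeserver.yaml (the connection details here are placeholders, not taken from this deployment):

```
database:
  name: psycopg2
  args:
    user: synapse_user          # placeholder credentials
    password: secretpassword    # placeholder credentials
    database: synapse
    host: localhost
    cp_min: 5     # if cp_min != cp_max, the pool (and postgres memory use) can grow under load
    cp_max: 10
```
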
olmari commented 3 years ago
  1. work_mem has been tested at both 4 MB and 8 MB, at least.
  2. cp_min: 5, cp_max: 10, i.e. the textbook setup from Using Postgres; should there be something else, or thing(s) to match in the postgres config?
  3. All 10 synapse-related postgres PIDs grow at generally the same rate. I need to dig into the logs more, then, and learn pg_stat_activity better :)

I'd like to reiterate that, for whatever odd reason, it is specifically the synapse-related postgres processes that are growing uncontrollably; the other ones behave. And, moreover, only at hacklab.fi. I'll try to dig around more on point 3 to see if I can muster something meaningful out of anywhere.
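
For point 3, a minimal sketch of how per-backend memory can be correlated with pg_stat_activity (assuming a typical Linux host; the PID used is just an example taken from the table above):

```
# Resident memory of each postgres backend, largest first
ps -o pid,rss,cmd -C postgres --sort=-rss | head -n 15

# Then look up what a given backend (e.g. PID 172776 from the table above) is doing
sudo -u postgres psql -c "select pid, datname, usename, state, query from pg_stat_activity where pid = 172776;"
```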

intelfx commented 3 years ago

Now I, too, have started experiencing this regularly on my homeserver, intelfx.name.

It needs to be noted that it runs on an absolute monster of a machine, with 128Gi RAM — which does not really help. Over the course of one or two days, the machine eats all available RAM and OOMs.

This is the situation after 13 h of uptime:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi        84Gi       1,1Gi        12Mi        40Gi        40Gi
Swap:          15Gi        14Gi       1,8Gi
per-unit memory usage

```
● postgresql.service - PostgreSQL database server
     Loaded: loaded (/usr/lib/systemd/system/postgresql.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/postgresql.service.d
             └─override.conf
     Active: active (running) since Thu 2021-02-04 03:17:29 MSK; 13h ago
   Main PID: 183 (postgres)
         IP: 753.8M in, 753.8M out
         IO: 74.7G read, 64.9G written
      Tasks: 71 (limit: 115152)
     Memory: 27.8G
        CPU: 8h 10min 5.371s
     CGroup: /system.slice/postgresql.service

● system-synapse.slice
     Loaded: loaded (/etc/systemd/system/system-synapse.slice; static)
    Drop-In: /etc/systemd/system/system-synapse.slice.d
             └─override.conf
     Active: active since Thu 2021-02-04 03:17:26 MSK; 13h ago
         IP: 1.6G in, 8.9G out
         IO: 61.4M read, 98.3M written
      Tasks: 123
     Memory: 7.5G
        CPU: 10h 10.188s
     CGroup: /system.slice/system-synapse.slice
```

systemd does not show per-unit swap usage, so here it is:

per-unit swap usage

```
$ grep . /sys/fs/cgroup/system.slice/**/memory.swap.current | sed -r 's|:([^:]+)$| \1|' | sort -k2 -n | tail -n9 | column -t
/sys/fs/cgroup/system.slice/gitea.service/memory.swap.current            68325376
/sys/fs/cgroup/system.slice/grafana.service/memory.swap.current          113516544
/sys/fs/cgroup/system.slice/prometheus.service/memory.swap.current       118857728
/sys/fs/cgroup/system.slice/plexmediaserver.service/memory.swap.current  194744320
/sys/fs/cgroup/system.slice/electrs.service/memory.swap.current          240443392
/sys/fs/cgroup/system.slice/system-synapse.slice/memory.swap.current     502079488
/sys/fs/cgroup/system.slice/bitcoind.service/memory.swap.current         1143627776
/sys/fs/cgroup/system.slice/postgresql.service/memory.swap.current       11913154560
/sys/fs/cgroup/system.slice/memory.swap.current                          14536134656
```

Basically, postgres causes so much memory pressure that almost everything else spills into swap.

PostgreSQL instance itself has been configured using PGTune with 32 GiB of RAM as the usage target because I'm dumb and don't know anything better ¯\_(ツ)_/¯:

postgresql.conf excerpt

```
# DB Version: 12
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 32 GB
# CPUs num: 32
# Connections num: 100
# Data Storage: ssd
max_connections = 100
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 20971kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 32
max_parallel_workers_per_gather = 4
max_parallel_workers = 32
max_parallel_maintenance_workers = 4
```

Now, you can say "you told it to use 32 GiB so here it is, using 32 GiB". But, looking at memory.stat for the postgres cgroup, most of that memory is anonymous memory (i.e. in-process allocations) rather than page cache (as it should have been):

memory.stat

```
$ cat /sys/fs/cgroup/system.slice/postgresql.service/memory.stat
anon 29313388544
file 1188261888
kernel_stack 1622016
percpu 0
sock 0
shmem 1757184
file_mapped 54743040
file_dirty 540672
file_writeback 1081344
anon_thp 11695816704
inactive_anon 8883453952
active_anon 20450394112
inactive_file 749477888
active_file 437395456
unevictable 0
slab_reclaimable 90423936
slab_unreclaimable 8544984
slab 98968920
workingset_refault_anon 2349138
workingset_refault_file 9206241
workingset_activate_anon 543576
workingset_activate_file 396759
workingset_restore_anon 226215
workingset_restore_file 193017
workingset_nodereclaim 0
pgfault 25197876
pgmajfault 1901955
pgrefill 14387067
pgscan 81787674
pgsteal 18228268
pgactivate 9222312
pgdeactivate 14344133
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 9339
thp_collapse_alloc 3927
```

I have even less experience debugging PostgreSQL than @olmari, so... Yeah, help wanted.

intelfx commented 3 years ago

https://dba.stackexchange.com/a/222815:

The most likely cause for this is that PostgreSQL keeps a per-connection cache of metadata about all the database objects (tables, indexes, etc.) it has touched during the connection lifetime. There is no upper limit on how big this cache can get, and no mechanism for expiring it. So if you have hundreds of thousands or millions of these things, and a long-lived connection will eventually touch each one of them, then their memory usage will continuously grow. The usual solution to this (assuming you can't reduce the number of objects in your database) is to set up your connection pooler so that each connection has a maximum time to live before being closed rather than recycled.

At least indirectly, this matches what I'm seeing. Synapse uses connection pooling internally, and all connections Synapse makes are long-lived:

postgres=# select (now() - backend_start) as backend_time from pg_stat_activity where datname = 'synapse' order by backend_time asc limit 1;
  backend_time   
-----------------
 13:54:34.814374

@erikjohnston is there any way to make synapse's (twisted's?) connection pooler close database connections unconditionally (but cleanly) after a set amount of time?
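
To get a feel for whether the "huge number of objects" part of that explanation even applies here, a quick and purely illustrative look at the catalog sizes might help:

```
-- Rough proxy for how much per-connection relation/catalog cache a
-- long-lived backend could accumulate: the more objects, the bigger.
SELECT count(*) AS relations FROM pg_class;
SELECT count(*) AS attributes FROM pg_attribute;
```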

intelfx commented 3 years ago

So, as a dirty hack, I've just set up pgbouncer with transaction pooling mode and connection lifetime of 600s and routed synapse through that.

I have no idea if anything will break due to using a connection pooler on top of synapse's built-in pooler (or whether I'll lose performance due to mixing queries from different workers within a single connection), but I'll see how it goes.
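
For reference, a minimal pgbouncer.ini sketch of that kind of setup (hostnames, ports and the 600 s value are only illustrations of what is described above, not recommendations; note the transaction-isolation caveat in the next comment):

```
[databases]
; forward the "synapse" database to the real PostgreSQL instance
synapse = host=127.0.0.1 port=5432 dbname=synapse

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; reuse server connections per transaction rather than per client session
pool_mode = transaction
; close server connections that have been open longer than this many seconds
server_lifetime = 600
```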

richvdh commented 3 years ago

see https://github.com/matrix-org/synapse/issues/4473: ideally you'd find a way of changing the transaction isolation level on connections from the pgbouncer

intelfx commented 3 years ago

ideally you'd find a way of changing the transaction isolation level on connections from the pgbouncer

Oh well. It's a tough choice, then: have your server thrash and OOM every 20 hours, or live with a risk of internal inconsistency.

olmari commented 3 years ago

Major update: the issue seems to be completely gone after observing for ~24h following a big cleanup of our HS, and whole-machine RAM RES seems to settle just over 4G, which is perfect. I still can't say what the root cause of this behaviour is/was, but we can make educated guesses.

Hacklab.fi maintenance tasks done:

I'd suspect the missing media might have been the culprit, but I honestly still can't tell for sure... While at this point I do feel the issue was a symptom of something else being the root cause, that would mean Synapse doesn't know how to handle something in that situation...

So at this time this ticket could also be closed, though I feel there is now important info here about what might have triggered the issue, and in the long run maybe Synapse could handle things more automatically, say by "purging" nonexistent media? IDK, ideas though :)

MatMaul commented 3 years ago

@olmari I also spent quite some time investigating this in the last few weeks.

The problem is in _get_state_groups_from_groups_txn, so I think the action on your side which helped is your point 4. If you look at this function, you can see two versions of the algorithm: one using RECURSIVE in SQL, and the other doing the same in Python for sqlite compatibility. The recursive version is problematic: it's 10+ times slower than the other one, while it should be better, and it triggers a bug in Postgres leading to this huge memory leak.

Adding an index with create index state_group_idx on state_groups_state using btree(state_group); also completely fixes the problem for me. And while it makes sense to me that it is faster, the memory leak clearly seems to be a PG bug. My server's RAM usage has been flat since then.
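
For anyone wanting to try this on a busy production database, a sketch of the same index built without blocking writes (plain PostgreSQL DDL; the index name is just the one suggested above):

```
-- Same index as above, built without taking a long write lock on
-- state_groups_state. CONCURRENTLY cannot run inside a transaction block
-- and takes longer, but the table stays writable while it builds.
CREATE INDEX CONCURRENTLY state_group_idx
    ON state_groups_state USING btree (state_group);
```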

So we should have a proper fix soonish; I am still gathering some metrics to see if we can do better.

intelfx commented 3 years ago

So at this time this ticket could also be closed

Not really. Despite me regularly performing most of those optimizations, my homeserver still suffers from the issue (I'm assuming that there is a better solution to the memory leak than just shooting off old backends).

@MatMaul

The problem is in _get_state_groups_from_groups_txn, so I think the action on your side which helped is your point 4. If you look at this function, you can see two versions of the algorithm: one using RECURSIVE in SQL, and the other doing the same in Python for sqlite compatibility. The recursive version is problematic: it's 10+ times slower than the other one, while it should be better, and it triggers a bug in Postgres leading to this huge memory leak.

Adding an index with create index state_group_idx on state_groups_state using btree(state_group); also completely fixes the problem for me. And while it makes sense to me that it is faster, the memory leak clearly seems to be a PG bug. My server's RAM usage has been flat since then.

Interesting. So the postgresql query is slower than the sqlite one and is also the one that causes the leak, and both of these are due to a missing index?

intelfx commented 3 years ago

With that index created, relevant queries still take up to 100ms for me:

Feb 26 08:31:45 stratofortress.nexus.i.intelfx.name postgres[4845]: [26-1] 2021-02-26 08:31:45.202 MSK [4845] LOG: duration: 84.957 ms, statement: WITH RECURSIVE state(state_group) AS ( VALUES(5156862::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) ORDER BY type, state_key, state_group DESC

I should assume this is normal?

erikjohnston commented 3 years ago

Interesting that things seem to get better after various clean ups, which somewhat suggests that maybe part of the problem here is large fragmentation of the DB? I don't really understand how any of it is related to memory usage of connections though :/

FTR, the new index really shouldn't be changing much, since we already have an index that covers (state_group, type, state_key), so either a) the query planner wasn't using the old index (which seems unlikely, as that would involve scanning the entire table, and that would take a very long time indeed), or b) by effectively recreating the index the queries are pulling in fewer disk pages and so are much more efficient.
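
One way to tell a) and b) apart would be something like the following, using the query from the log above (illustrative only; substitute a real state group id):

```
-- Shows which index the planner actually picks and how many buffers
-- (disk pages) the query touches.
EXPLAIN (ANALYZE, BUFFERS)
WITH RECURSIVE state(state_group) AS (
    VALUES(5156862::bigint)
    UNION ALL
    SELECT prev_state_group FROM state_group_edges e, state s
    WHERE s.state_group = e.state_group
)
SELECT DISTINCT ON (type, state_key) type, state_key, event_id
FROM state_groups_state
WHERE state_group IN (SELECT state_group FROM state)
ORDER BY type, state_key, state_group DESC;
```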

Also, if you're seeing the SQLite style queries going faster than the postgres one, then something is going catastrophically wrong with your DB. The SQLite version is the same as the recursive query, except that the loop has been lifted up into Python.

relevant queries still take up to 100ms for me ... I should assume this is normal?

Yeah, ideally such large queries should be cached, but it's easy for the Synapse caches to be too small. Making our caches more intelligent is on our roadmap.

We're also wanting to implement "incremental state res" at some point, which works off deltas of state and should hopefully reduce the number of times we pull large amounts of state from the DB.
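
For anyone who wants to experiment with larger caches in the meantime, the relevant homeserver.yaml knobs look roughly like this (a sketch; the values are examples, not recommendations):

```
# Size of the event cache (defaults to 10K entries).
event_cache_size: 20K

caches:
  # Multiplier applied to all of Synapse's LRU caches.
  global_factor: 2.0
```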

erikjohnston commented 3 years ago

(Also FWIW I've been trying to figure out what we can do next to hunt this down, but I'm really drawing a bit of a blank :disappointed: )

olmari commented 3 years ago

I wasn't especially trying to tackle this problem with the cleanup, but indeed it ended up making this issue go away more or less fully in practice, which is of course awesome, although it doesn't tell us anything about the underlying root issue(s), be the failure in Synapse or in Postgresql.

Instead of having a timer to restart Postgres every 6h because it is eating all the RAM, the server RAM RES now grows maybe 0.3G per day or thereabouts... Way more manageable, even if something small might still be leaking in theory (and that growth covers every single program on the server, so who knows).

olmari commented 3 years ago

It seems our HS instance is starting to exhibit this behaviour again; memory usage slowly keeps climbing... We do have auto-retention enabled, if that has something to do with how such unreferenced indexes might get created in the first place(?) I have not wanted to apply the manual PSQL index "fix" to our running production system.

So, @MatMaul, any news on how things are going with the "official" fix for this? :)

MatMaul commented 3 years ago

@olmari Sorry, I haven't had the time to iterate on this. I think the safest option is probably to change the code to use the non-RECURSIVE version of _get_state_groups_from_groups_txn (the one used for sqlite).

However, my own homeserver has been running with the extra index for more than a month now with no drawback, apart from the space used by the index, which seems to be none? When looking at index sizes I found ~5G for the new index; however, the other index on this table (state_groups_state_type_idx) shows up as 0 in size, so I guess they are equivalent. It really seems like we are triggering a stupid bug in psql, because apparently indexes on multiple fields like state_groups_state_type_idx should be able to speed up accesses to independent fields, unlike what I thought at first.
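
The sizes (and whether each index is actually being used) can be checked with the standard statistics views, e.g.:

```
-- One row per index on state_groups_state: on-disk size and how many
-- times the planner has used it since statistics were last reset.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'state_groups_state';
```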

MatMaul commented 3 years ago

I just deleted and re-created the index to check: the size of the indexes on this table only grew by 10% (from 5GB to 5.5GB).

olmari commented 3 years ago

so, this? https://github.com/matrix-org/synapse/issues/9182#issuecomment-785319373 :) Will do tonight, I think.

MatMaul commented 3 years ago

Yes!

olmari commented 3 years ago

This situation has now more or less come back, be it the same or a different root cause; latest Synapse and generally everything up to date, ~24h on, and Matrix is practically unusable and htop shows:

(htop screenshot from 2021-06-29 09-42-51)

ara4n commented 3 years ago

This is probably related to abusive traffic patterns on the network which we're currently mitigating.

intelfx commented 3 years ago

abusive traffic patterns

Are there any details on the problem and the supposed mitigation?

Just like administrative problems are rarely solved by technical means, technical problems are also rarely solved by administrative means...

olmari commented 3 years ago

Indeed, it seems that the mentioned abusive traffic amplifies this bug very much, so instead of "just" the attack vector itself we have this on top of it, doubling the trouble.

ShadowJonathan commented 3 years ago

Are there any details on the problem and the supposed mitigation?

https://twitter.com/matrixdotorg/status/1409874327135211522

jaywink commented 3 years ago

Semi-related to dealing with some hosts post-spam: we had one host in EMS which ended up running some very, very expensive queries. The query is:

    WITH RECURSIVE state(state_group) AS (
        VALUES(?::bigint)
        UNION ALL
        SELECT prev_state_group FROM state_group_edges e, state s
        WHERE s.state_group = e.state_group
    )
    SELECT DISTINCT ON (type, state_key) type, state_key, event_id
    FROM state_groups_state
    WHERE state_group IN ( SELECT state_group FROM state )
      AND ((type = ? AND state_key = ?) OR (type = ? AND state_key = ?))
    ORDER BY type, state_key, state_group DESC

As soon as a sync request happened after restart, the host went into a pretty much locked state. We added a synchrotron to free the master to resolve admin APIs and such, but this only caused the DB load to happen via the synchrotron. All sync attempts were timing out. According to Synapse logs the query was running for up to 20 minutes. Instead of the PostgreSQL server being exhausted of RAM, it was being exhausted of CPU credits. At the time of these queries the postgres server still had plenty of credits left though, so it was not constrained on ops yet.

2021-06-30 11:10:10,293 - synapse.storage.txn - 633 - DEBUG - GET-0- [TXN END] {_get_state_groups_from_groups-4d} 1274.932126 sec

Before creating the index we tried upping resources by a bucketful and also increasing the caches by a lot.

Some of the rooms related to the long-running queries were not high-traffic rooms like Matrix HQ. One of the long-running queries was linked to a state group for a room with only 500 members.

After being pointed to this issue by @MatMaul I applied the index from https://github.com/matrix-org/synapse/issues/9182#issuecomment-785319373. Pretty much immediately after it was created, syncs started resolving. The index was created at approx 3:50pm (EEST, as a reference for the times in the graphs below and any logs).

Some grafana graphs:

(Grafana graph screenshots: Selection_999(368), Selection_999(369))

Happy to provide timestamps and server name on request to help debug this.

The issue with sync taking a massive amount of database time started during Monday the 28th, when we were going through hosts that had users in Matrix HQ and using the admin forward extremities (FE) APIs to try and help hosts with issues. At some point during that the host just stopped responding to all API calls (due to sync hogging the database), so the possibility of this being related to FE clearing via the admin APIs should not be ruled out (in this particular case).

MacLemon commented 3 years ago

I got pointed to this issue, and it reflects very much what we have been seeing on our server over the recent few months. Our synapse server is ramping up gigabytes and gigabytes of swap from unused postgres connection workers over a few days. Restarting synapse (including workers) immediately purges all of that back to 0.

Of course the OS is correct in swapping out unused processes, which is what happens here. The annoying part is that synapse keeps creating more and more new connections, and certainly more than what we have configured.

    cp_min: 5
    cp_max: 10

We don't see any load or performance issues, other than that we have to restart the service for every 30GB of new swap to keep it running. Yes, the service disruption is less than a minute, but users don't like it despite it being short.

Just as an additional data point. I cannot observe this on a smaller setup without workers. So maybe this is related to only a specific type of workers?
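
(For context, those settings live under the database args in homeserver.yaml; a sketch of the usual psycopg2 section is below, with placeholder credentials. Note that each Synapse process, including every worker, gets its own pool, so the total number of PostgreSQL connections can legitimately be several times cp_max.)

```
# Sketch of the usual database section; user/password/host are placeholders.
database:
  name: psycopg2
  args:
    user: synapse_user
    password: secret
    database: synapse
    host: localhost
    cp_min: 5    # minimum connections kept in this process's pool
    cp_max: 10   # maximum connections for this process's pool
```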

BBaoVanC commented 3 years ago

Just as an additional data point. I cannot observe this on a smaller setup without workers. So maybe this is related to only a specific type of workers?

How many and what kind of workers do you have? I have 4 generic and 4 federation senders, and it seems to level off at ~6 GiB of swap after 2 days or so. Total RAM is 8 GiB, and 16 GiB of swap.

MacLemon commented 3 years ago

Our Workers:

We also had

For us it went up until our 32GB of swap was filled and the OOM killer started doing its work. That's on a 64GB machine with 24GB for postgres and about 8GB for ZFS ARC. So there is plenty of RAM available for synapse and workers.

richvdh commented 3 years ago

@MacLemon wrote:

The annoying part is that synapse keeps creating more and more new connections, and certainly more than what we have configured.

How are you counting the number of connections?

Synapse's connection pooling is implemented by the Twisted library, but it should only create a new connection when there is more room in the pool, or when an existing connection fails. Do you see any errors in the server log?
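
(For whoever wants to count them on the PostgreSQL side, something like this shows one row per database/user with the number of live backends:)

```
SELECT datname, usename, count(*) AS connections
FROM pg_stat_activity
GROUP BY datname, usename
ORDER BY connections DESC;
```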

hifi commented 3 years ago

Bringing some cents.

I added a really dirty hack to synapse to force reconnecting the connections in the pool. This keeps the memory usage stable and fairly low for me on a small HS. I expect the downside to be more IO but with a sensible timer it would probably be irrelevant.

I think the correct approach to work around runaway psql memory would be to have a configurable time/transaction limit for connections in the pool before they get reconnected, and to default it to a high value (~1 hour?).

Let me know if this hack does anything good in a larger setup; please raise the reconnect interval up from 30 seconds if you're running a large HS. I don't know if this works with workers.

--- storage/background_updates.py.orig  2021-07-19 19:57:29.820301464 +0300
+++ storage/background_updates.py       2021-07-20 11:30:07.441263762 +0300
@@ -104,6 +104,25 @@

     def start_doing_background_updates(self) -> None:
         run_as_background_process("background_updates", self.run_background_updates)
+        run_as_background_process("background_updates", self.run_postgres_hack)
+
+    async def run_postgres_hack(self) -> None:
+        logger.info("Starting postgres hack")
+
+        def _kill_me(conn):
+            logger.info("Killing a random database connection for fun and profit")
+            try:
+                conn.reconnect()
+            except Exception:
+                logger.exception("Error reconnecting but it's okay")
+
+        while True:
+            try:
+                await self.db_pool._db_pool.runWithConnection(_kill_me)
+            except Exception:
+                logger.exception("Exception fell through to outer loop but it's okay as well")
+
+            await self._clock.sleep(30)

     async def run_background_updates(self, sleep: bool = True) -> None:
         logger.info("Starting background schema updates")
Scrin commented 3 years ago

I experience this constant memory growth until OOM on two HS instances as well, and after a while of observing things and trying out almost every trick in this thread, nothing seems to work; some tricks only lessen the issue (which I think is simply due to reducing the amount of data and thus the rate of growth).

However, based on my observations it seems that the actual root cause is indeed simply having long-lived connections to postgres (which is bad, as mentioned already). Simply forcing the connections to be closed and reopened makes the memory usage drop to the expected low values no matter how it's done: restarting synapse and workers, or killing the postgres backend processes responsible for the specific connections. And apparently the nice "hack" by hifi that does the same works as well, although I haven't tried it yet.

Btw, #5728 is loosely related to this; perhaps a fix for both would be logic that periodically goes through the connections and either closes them (if more are open than the pool minimum) or reconnects them.

olmari commented 3 years ago

I feel that there is something inherently wrong or buggy in Postgresql (or some process leading to it) if it accumulates all the RAM despite any limits set in configuration. Of course, if database-connection recycling cures the symptoms, I'd welcome it being used, especially if the underlying root cause can't be fixed.

Scrin commented 3 years ago

Based on my understanding it's a rather "unusual" (in the world of databases) but intentional design choice to minimize performance overhead: rather than doing garbage collection or other freeing operations in the middle of everything else, it does all of that when the connection is no longer in active use, to minimize the impact on running queries. Though this then easily leads to people not realizing this design and using long-lived connections rather than short-lived ones.

BBaoVanC commented 3 years ago

I added a really dirty hack to synapse to force reconnecting the connections in the pool. This keeps the memory usage stable and fairly low for me on a small HS. I expect the downside to be more IO but with a sensible timer it would probably be irrelevant.

I'll have to try this (it'll take a bit of extra work because I'll have to build my own docker image). I wonder if maybe this could be a config option or something, since there's probably a lot of people (myself included) who would gladly take a bit higher IO over high memory usage.

hifi commented 3 years ago

Opened https://github.com/matrix-org/synapse/pull/10440 with a more mergeable, deterministic and less hacky approach. I'm running my small HS with a 1,000 transaction limit, but a sensible default is likely in the ballpark of 10,000 to 100,000, as it's not really required to keep reconnecting them all the time.
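
Assuming the PR lands roughly as described, the knob would presumably end up in the database section of homeserver.yaml; a sketch of what that might look like (the txn_limit name and placement are my assumption, so check the PR and the released documentation for the actual option):

```
database:
  name: psycopg2
  # Assumed option: recycle each pooled connection after this many
  # transactions. Verify the final name and default in the docs.
  txn_limit: 10000
  args:
    user: synapse_user
    database: synapse
    host: localhost
    cp_min: 5
    cp_max: 10
```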

richvdh commented 3 years ago

We discussed this at some length in #synapse-dev yesterday. Relevant points were:

So, for whatever reason, matrix.org isn't suffering this memory bloat in quite the same way as is reported here.

My main concern with the approach of just closing Postgres connections is that we are fixing the symptoms rather than understanding the problem. Using a connection pool is an extremely common technique in SQL client apps, and if Postgres expects you to close the connection every so often, then I'd expect that to be well-documented.

I'd be much happier with this approach if:

intelfx commented 3 years ago

I might be talking well out of my ass here (my HS stopped experiencing this problem a while ago, of course it isn't a solution, but I can't debug a problem that doesn't happen...), but could it be heap fragmentation due to use of sbrk by the underlying allocator?

Try to strace one of the misbehaving postgres workers and see if it calls sbrk(). (strace -p PID -f -e trace=sbrk)

hifi commented 3 years ago

I've been running #10440 for two days now, and the only memory leak I've had is with synapse itself, which is slowly accumulating, but it's not as bad as the postgres workers were. PostgreSQL itself has been freeing memory as configured and I haven't seen any issues so far. The total increase in memory usage is somewhere around 200MB across the whole system in two days, which is pretty much all synapse.

olmari commented 3 years ago

Thank you all for the fix, especially @hifi!

Adding an index with create index state_group_idx on state_groups_state using btree(state_group); also completely fixes the problem for me. And while it makes sense to me that it is faster, the memory leak clearly seems to be a PG bug. So we should have a proper fix soonish; I am still gathering some metrics to see if we can do better.

@MatMaul Should this index stuff warrant its own issue? I mean, "obviously" it does do something, since things are worse without it, so I feel this now more-or-less "side-track" shouldn't be totally forgotten either :)

heftig commented 3 years ago

FTR, we've also been suffering from a PostgreSQL leak with synapse that was solved by turning off LLVM JIT compilation (jit = 0), see PostgreSQL BUG #16707.

I'm looking forward to trying the transaction limit as an alternative. Still, it would be nice to know if the people suffering from memory leaks here have similar success turning off the JIT.
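
For anyone who wants to test that on their own instance, JIT can be checked and turned off cluster-wide without hand-editing postgresql.conf (standard PostgreSQL commands; ALTER SYSTEM needs superuser):

```
-- Is JIT currently enabled for this session?
SHOW jit;
-- Persist jit = off and reload the configuration.
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();
```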

heftig commented 9 months ago

Looks like a proper fix was committed to PostgreSQL recently and should be part of the next set of minor releases (16.2 etc).