I've seen similar things on my homeserver before switching my db over to much faster storage and I believe @jcgruenhage is also currently encountering this.
Storage is Upcloud MaxIOPS, so it shouldn't be exactly sluggish, especially for a Synapse at this level; it's still a relatively small Synapse instance (a few tens of users and a bunch of bridges).
Yeah, no, it's definitely not storage related in my case either. My postgres DB is on an NVMe SSD, so IOPS shouldn't be a problem.
I first observed this issue when migrating from Fedora (pg in docker) to Void (pg still in docker) on the host machine. The hardware didn't change, but the SSD switched from being mounted directly to having a LUKS encryption layer in between.
Worth noting: all RAM means 32GB of RAM and 16GB of SWAP in my case. All other services stay under 8GB (total, not each)
Throwing my voice into the ring: since upgrading to 1.25.0 I've had 5 instances of the out-of-memory killer going after Synapse. Not exactly sure how to go about debugging it on my end. It's a low-spec VPS, 2 vCPUs and 4GB of RAM, but I didn't have a high frequency of OOM kills prior to 1.25.0, at least not enough to require quickly writing up a monitoring script to detect Synapse being down and restart it if it is.
I'm not even sure when exactly this began; it has been quite a long time now and it is not getting better on its own :7
Added all the bridges we use to the issue post, just to list everything possible, while the root issue is still Synapse's postgres processes growing. :)
I've recently seen this on my homeserver (intelfx.name), yet to reproduce.
Storage is Upcloud MaxIOPS, so it shouldn't be exactly sluggish, especially for a Synapse at this level; it's still a relatively small Synapse instance (a few tens of users and a bunch of bridges).
Upcloud has some of the best storage performance in the business (even better than AWS EC2's solution), it really shouldn't be sluggish.
And yeah, Synapse 1.25 (and even prior) absolutely guzzles down working memory, the more so with more active rooms (both local and remotely accessed) and users on the server.
@kaijuagenda This problem is not Synapse eating RAM itself, but Synapse making postgres eat all the RAM.
A couple of things here:

1. Check `work_mem` and the like to see that it's set to something sane, and roughly matches the memory you see the connections using.
2. Check `cp_min` and `cp_max`: if they're different, then when Synapse makes more DB connections you'll see Postgres memory go up.
3. Check whether one particular PID is using more memory than the rest (and then check `pg_stat_activity` or the logs to see what that PID was doing). Usually all PIDs should use the same amount of memory.
`work_mem` has been tested with at least both 4 and 8MB. `cp_min: 5`, `cp_max: 10`, aka the textbook setup from "Using Postgres"; should there be something else, or thing(s) to match in the postgres config? I still need to get to know `pg_stat_activity` better :)

I'd like to reiterate that for whatever odd reason it is specifically the Synapse-related postgres processes that are growing uncontrollably; the other ones behave. And more so, only at hacklab.fi. I'll try to dig around more on point 3 to see if I can muster something meaningful out from anywhere.
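For context, those pool settings live under the `database` section of `homeserver.yaml`; this is roughly the textbook shape from the Synapse postgres docs (credentials and host below are placeholders, not values from this thread):

```
database:
  name: psycopg2
  args:
    user: synapse_user
    password: secretpassword
    database: synapse
    host: localhost
    cp_min: 5
    cp_max: 10
```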
Now I, too, have started experiencing this regularly on my homeserver, intelfx.name.
It needs to be noted that it runs on an absolute monster of a machine, with 128Gi RAM — which does not really help. Over the course of one or two days, the machine eats all available RAM and OOMs.
This is the situation after 13h of uptime:
```
$ free -h
       total   used    free    shared  buff/cache  available
Mem:    125Gi   84Gi   1,1Gi    12Mi       40Gi       40Gi
Swap:    15Gi   14Gi   1,8Gi
```
```
● postgresql.service - PostgreSQL database server
     Loaded: loaded (/usr/lib/systemd/system/postgresql.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/postgresql.service.d
             └─override.conf
     Active: active (running) since Thu 2021-02-04 03:17:29 MSK; 13h ago
   Main PID: 183 (postgres)
         IP: 753.8M in, 753.8M out
         IO: 74.7G read, 64.9G written
      Tasks: 71 (limit: 115152)
     Memory: 27.8G
        CPU: 8h 10min 5.371s
     CGroup: /system.slice/postgresql.service

● system-synapse.slice
     Loaded: loaded (/etc/systemd/system/system-synapse.slice; static)
    Drop-In: /etc/systemd/system/system-synapse.slice.d
             └─override.conf
     Active: active since Thu 2021-02-04 03:17:26 MSK; 13h ago
         IP: 1.6G in, 8.9G out
         IO: 61.4M read, 98.3M written
      Tasks: 123
     Memory: 7.5G
        CPU: 10h 10.188s
     CGroup: /system.slice/system-synapse.slice
```
systemd does not show per-unit swap usage, so here it is:
```
$ grep . /sys/fs/cgroup/system.slice/**/memory.swap.current | sed -r 's|:([^:]+)$| \1|' | sort -k2 -n | tail -n9 | column -t
/sys/fs/cgroup/system.slice/gitea.service/memory.swap.current            68325376
/sys/fs/cgroup/system.slice/grafana.service/memory.swap.current          113516544
/sys/fs/cgroup/system.slice/prometheus.service/memory.swap.current       118857728
/sys/fs/cgroup/system.slice/plexmediaserver.service/memory.swap.current  194744320
/sys/fs/cgroup/system.slice/electrs.service/memory.swap.current          240443392
/sys/fs/cgroup/system.slice/system-synapse.slice/memory.swap.current     502079488
/sys/fs/cgroup/system.slice/bitcoind.service/memory.swap.current         1143627776
/sys/fs/cgroup/system.slice/postgresql.service/memory.swap.current       11913154560
/sys/fs/cgroup/system.slice/memory.swap.current                          14536134656
```
Basically, postgres causes so much memory pressure that almost everything else spills into swap.
PostgreSQL instance itself has been configured using PGTune with 32 GiB of RAM as the usage target because I'm dumb and don't know anything better ¯\_(ツ)_/¯:
```
# DB Version: 12
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 32 GB
# CPUs num: 32
# Connections num: 100
# Data Storage: ssd

max_connections = 100
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 20971kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 32
max_parallel_workers_per_gather = 4
max_parallel_workers = 32
max_parallel_maintenance_workers = 4
```
Now, you can say "you told it to use 32 GiB, so here it is, using 32 GiB". But looking at memory.stat for the postgres cgroup, most of that memory is anonymous memory (i.e. in-process allocations) rather than page cache (as it should have been):
```
$ cat /sys/fs/cgroup/system.slice/postgresql.service/memory.stat
anon 29313388544
file 1188261888
kernel_stack 1622016
percpu 0
sock 0
shmem 1757184
file_mapped 54743040
file_dirty 540672
file_writeback 1081344
anon_thp 11695816704
inactive_anon 8883453952
active_anon 20450394112
inactive_file 749477888
active_file 437395456
unevictable 0
slab_reclaimable 90423936
slab_unreclaimable 8544984
slab 98968920
workingset_refault_anon 2349138
workingset_refault_file 9206241
workingset_activate_anon 543576
workingset_activate_file 396759
workingset_restore_anon 226215
workingset_restore_file 193017
workingset_nodereclaim 0
pgfault 25197876
pgmajfault 1901955
pgrefill 14387067
pgscan 81787674
pgsteal 18228268
pgactivate 9222312
pgdeactivate 14344133
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 9339
thp_collapse_alloc 3927
```
I have even less experience debugging PostgreSQL than @olmari, so... Yeah, help wanted.
https://dba.stackexchange.com/a/222815:
The most likely cause for this is that PostgreSQL keeps a per-connection cache of metadata about all the database objects (tables, indexes, etc.) it has touched during the connection lifetime. There is no upper limit on how big this cache can get, and no mechanism for expiring it. So if you have hundreds of thousands or millions of these things, and a long-lived connection will eventually touch each one of them, then their memory usage will continuously grow. The usual solution to this (assuming you can't reduce the number of objects in your database) is to set up your connection pooler so that each connection has a maximum time to live before being closed rather than recycled.
At least indirectly, this matches what I'm seeing. Synapse uses connection pooling internally, and all connections Synapse makes are long-lived:
```
postgres=# select (now() - backend_start) as backend_time from pg_stat_activity where datname = 'synapse' order by backend_time asc limit 1;
  backend_time
-----------------
 13:54:34.814374
```
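If that per-connection catalog cache explanation applies here, then what matters is how many objects the database contains; a rough way to gauge that (a sketch, not from the thread):

```
-- counts of catalog entries that feed the per-connection relcache/syscache
SELECT
  (SELECT count(*) FROM pg_class)     AS relations,
  (SELECT count(*) FROM pg_attribute) AS attributes;
```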
@erikjohnston is there any way to make synapse's (twisted's?) connection pooler close database connections unconditionally (but cleanly) after a set amount of time?
So, as a dirty hack, I've just set up pgbouncer with transaction pooling mode and connection lifetime of 600s and routed synapse through that.
I have no idea if anything will break due to using a connection pooler on top of synapse's built-in pooler (or whether I'll lose performance due to mixing queries from different workers within a single connection), but I'll see how it goes.
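For illustration, the relevant pgbouncer settings for that kind of setup look roughly like this (a sketch; the database name, addresses and ports are assumptions, not taken from the thread):

```
; pgbouncer.ini
[databases]
synapse = host=127.0.0.1 port=5432 dbname=synapse

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
; close server connections after ~10 minutes so postgres backends never grow old
server_lifetime = 600
```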
see https://github.com/matrix-org/synapse/issues/4473: ideally you'd find a way of changing the transaction isolation level on connections from the pgbouncer
ideally you'd find a way of changing the transaction isolation level on connections from the pgbouncer
Oh well. It's a tough choice, then: have your server thrash and OOM every 20 hours, or live with a risk of internal inconsistency.
Major update: the issue seems to be completely gone after observing for ~24h following a big cleanup of our HS, and the whole machine's RAM RES seems to settle at just over 4G, which is perfect. I still can't say what is/was the root cause of this behaviour, but we can make educated guesses.
Hacklab.fi maintenance tasks done:
I'd suspect the missing media might have been the culprit, but I still honestly can't tell for sure... While at this point I do feel the issue was an implication of something else being the root cause, that would mean that Synapse doesn't know how to handle something in that situation...
So at this point this ticket could also be closed, though I feel there is now important info here about what might have triggered this issue, and in the long run maybe Synapse could handle things more automatically, say by "purging" nonexistent media perhaps? IDK, just ideas :)
@olmari I also spent quite some time investigating this over the last weeks.
The problem is in `_get_state_groups_from_groups_txn`, so I think the action on your side which helped is your point 4. If you look at this function, you can see 2 versions of the algorithm: one using `RECURSIVE` in SQL, and the other doing the same in Python instead, for SQLite compatibility.
The recursive version is problematic. It's 10+ times slower than the other one, while it should be better, and it triggers a bug in Postgres leading to this huge memory leak.
Adding an index with:
create index state_group_idx on state_groups_state using btree(state_group);
also completely fixes the problem for me. And while it makes sense to me that it is faster, the memory leak clearly seems to be a PG bug.
My server's RAM usage has been flat since then.
So we should have a proper fix soonish; I am still gathering some metrics to see if we can do better.
So at this time this ticket could also be closed
Not really. Despite me regularly performing most of those optimizations, my homeserver still suffers from the issue (I'm assuming that there is a better solution to the memory leak than just shooting off old backends).
@MatMaul
The problem is in _get_state_groups_from_groups_txn, so I think the action on your side which helped is your point 4. If you go see this fct, you can see 2 versions of the algo: one using RECURSIVE in SQL, and the other doing that in Python instead for sqlite compatibility. The recursive version is problematic. It's 10+ time slower than the other one, while it should be better. And it triggers a bug in Postgres leading to this huge mem leak.
Adding an index with: create index state_group_idx on state_groups_state using btree(state_group); also completely fix the problem for me. And while it makes sense to me that it is faster, the mem leak clearly seems to be a PG bug. My server RAM is flat since then.
Interesting. So the PostgreSQL query is slower than the SQLite one and is also the one that causes the leak, and both of these are due to a missing index?
With that index created, relevant queries still take up to 100ms for me:
Feb 26 08:31:45 stratofortress.nexus.i.intelfx.name postgres[4845]: [26-1] 2021-02-26 08:31:45.202 MSK [4845] LOG: duration: 84.957 ms, statement: WITH RECURSIVE state(state_group) AS ( VALUES(5156862::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) ORDER BY type, state_key, state_group DESC
I should assume this is normal?
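One way to see whether the planner actually uses the indexes for that query is to run it under EXPLAIN; a sketch using the literal value from the log line above:

```
EXPLAIN (ANALYZE, BUFFERS)
WITH RECURSIVE state(state_group) AS (
    VALUES (5156862::bigint)
    UNION ALL
    SELECT prev_state_group FROM state_group_edges e, state s
    WHERE s.state_group = e.state_group
)
SELECT DISTINCT ON (type, state_key) type, state_key, event_id
FROM state_groups_state
WHERE state_group IN (SELECT state_group FROM state)
ORDER BY type, state_key, state_group DESC;
```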
Interesting that things seem to get better after various clean-ups, which somewhat suggests that maybe part of the problem here is heavy fragmentation of the DB? I don't really understand how any of that relates to the memory usage of connections though :/
FTR, the new index really shouldn't be changing much, since we already have an index that covers `(state_group, type, state_key)`, so either a) the query planner wasn't using the old index (which seems unlikely, as that would involve scanning the entire table, and that would take a very long time indeed), or b) by effectively recreating the index the queries are pulling in fewer disk pages and so are much more efficient.
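If explanation (b) is right, rebuilding the existing index rather than adding a new one should have the same effect; a sketch (`CONCURRENTLY` needs PostgreSQL 12+ and avoids blocking writes):

```
-- rebuild the existing composite index in place
REINDEX INDEX CONCURRENTLY state_groups_state_type_idx;
```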
Also, if you're seeing the SQLite style queries going faster than the postgres one, then something is going catastrophically wrong with your DB. The SQLite version is the same as the recursive query, except that the loop has been lifted up into Python.
relevant queries still take up to 100ms for me ... I should assume this is normal?
Yeah, ideally such large queries should be cached, but it's easy for the Synapse caches to be too small. Making our caches more intelligent is on our roadmap.
We're also wanting to implement "incremental state res" at some point, which works off deltas of state, which should also hopefully reduce the number of times we pull large amounts of state from the DB.
(Also FWIW I've been trying to figure out what we can do next to hunt this down, but I'm really drawing a bit of a blank :disappointed: )
I wasn't especially trying to tackle this problem with the cleanup, but it did end up making this issue go away more or less fully in practice, which is of course awesome, while it doesn't tell us anything about the underlying root issue(s), be the failure in Synapse or in PostgreSQL.
Instead of having a timer to restart Postgres every 6h because it is eating all the RAM, the server's RAM RES now grows maybe 0.3G per day or thereabouts... Way more manageable, even if something small might still be leaking in theory (and that growth covers every single program on the server, so who knows).
It seems our HS instance is starting to exhibit this behaviour again; the rate slowly keeps climbing... We do have auto-retention enabled, if that has anything to do with how such unreferenced indexes might come to be created in the first place(?) I have not wanted to apply the manual PSQL index "fix" to our running production system.
So, @MatMaul, any news on how things are going with the "official" fix for this? :)
@olmari Sorry, I haven't got the time to iterate on this. I think the safest is probably to change the code to use the non-`RECURSIVE` version of `_get_state_groups_from_groups_txn` (the one used for SQLite).
However, my own homeserver has been running with the extra index for more than a month now with no drawback, outside of the space used by the index, which seems to be none? When looking at index sizes I found ~5G for the new index, however the other index on this table (`state_groups_state_type_idx`) is 0 in size, so I guess they are equivalent. It really seems like we are triggering a stupid bug in psql, because apparently indexes on multiple fields like `state_groups_state_type_idx` should be able to speed up accesses to independent fields, unlike what I thought at first.
I just deleted and re-created the index to check; the size of the indexes on this table only grew by 10% (from 5GB to 5.5GB).
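For reference, one way to compare the on-disk sizes of the indexes on that table (a sketch):

```
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'state_groups_state';
```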
so, this? https://github.com/matrix-org/synapse/issues/9182#issuecomment-785319373 :) Will do tonight, I think.
Yes !
This situation has now more or less come back, be it the same or a different root cause; latest Synapse and generally everything up to date, ~24h on and Matrix is practically unusable and htop shows:
This is probably related to abusive traffic patterns on the network which we're currently mitigating.
abusive traffic patterns
Are there any details on the problem and the supposed mitigation?
just like administrative problems are rarely solved by technical means, technical problems are also rarely solved by administrative means...
Indeed it seems that the mentioned abusive traffic amplifies this bug very much, so instead of "just" the attack vector itself we have this on top of it, doubling the trouble.
Are there any details on the problem and the supposed mitigation?
Semi-related to dealing with some hosts post-spam, we had one host in EMS which ended up running some very, very expensive queries. The query is:
WITH RECURSIVE state(state_group) AS ( VALUES(?::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) AND ((type = ? AND state_key = ?) OR (type = ? AND state_key = ?)) ORDER BY type, state_key, state_group DESC
As soon as a sync request happened after restart, the host went into a pretty much locked state. We added a synchrotron to free the master to resolve admin APIs and such, but this only caused the DB load to happen via the synchrotron. All sync attempts were timing out. According to the Synapse logs the query was running for up to 20 minutes. Instead of the PostgreSQL server being exhausted of RAM, it was being exhausted of CPU credits. At the time of these queries the postgres server still had plenty of credits left though, so it was not constrained on ops yet.
2021-06-30 11:10:10,293 - synapse.storage.txn - 633 - DEBUG - GET-0- [TXN END] {_get_state_groups_from_groups-4d} 1274.932126 sec
Before creating the index we tried upping resources by a bucketful and also increasing the caches by a lot.
Some of the rooms related to the long-running queries were not high-traffic rooms like Matrix HQ. One of the long-running queries was linked to a state group for a room with only 500 members.
After being pointed to this issue by @MatMaul, I applied the index from https://github.com/matrix-org/synapse/issues/9182#issuecomment-785319373. Pretty much immediately after it got created, syncs started resolving. The index was created at approx 3.50pm (EEST, for the times in the below graph and possible logs reference).
Some grafana graphs:
Happy to provide timestamps and server name on request to help debug this.
The issue with sync taking a massive amount of database time started during Monday the 28th, when we were going through hosts that had users in Matrix HQ and using the admin FE APIs to try and help hosts with issues. At some point during that, the host just stopped responding to all API calls (due to sync hogging the database), so the possibility of this being related to FE clearing via the admin APIs should not be ruled out (in this particular case).
I got hinted to this issue and it reflects very much the issue we're seeing on our server in the recent (few months) past.
Our `synapse` server is ramping up gigabytes and gigabytes of swap from unused `postgres` connection workers over a few days. Restarting `synapse` (including workers) immediately purges all of that back to 0.

Of course the OS is correct in swapping out unused processes, which is what happens here. The annoying part is that `synapse` keeps creating more and more new connections, and certainly more than what we have configured.
cp_min: 5
cp_max: 10
We don't see any load or performance issues, other than that we have to restart the service every 30GB of new swap to keep it running. Yes, the service disruption is less than a minute, but users don't like it despite it being short.
Just as an additional data point. I cannot observe this on a smaller setup without workers. So maybe this is related to only a specific type of workers?
Just as an additional data point. I cannot observe this on a smaller setup without workers. So maybe this is related to only a specific type of workers?
How many and what kind of workers do you have? I have 4 generic and 4 federation senders, and it seems to level off at ~6 GiB of swap after 2 days or so. Total RAM is 8 GiB, and 16 GiB of swap.
Our Workers:
We also had
For us it went up until our 32GB of swap were filled and OOMkiller started doing its work.
That's on a 64GB machine with 24GB for `postgres` and about 8GB for ZFS ARC. So plenty of RAM available for `synapse` and workers.
@MacLemon wrote:
The annoying part is that synapse keeps creating more and more new connections, and certainly more than what we have configured.
How are you counting the number of connections?
Synapse's connection pooling is implemented by the Twisted library, but it should only create a new connection when there is more room in the pool, or when an existing connection fails. Do you see any errors in the server log?
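For what it's worth, one way to count Synapse's connections (and see how long they have lived) from the Postgres side is something like this sketch:

```
SELECT count(*) AS connections,
       max(now() - backend_start) AS oldest
FROM pg_stat_activity
WHERE datname = 'synapse';
```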
Bringing my two cents.
I added a really dirty hack to synapse to force reconnecting the connections in the pool. This keeps the memory usage stable and fairly low for me on a small HS. I expect the downside to be more IO but with a sensible timer it would probably be irrelevant.
I think the correct approach to work around runaway psql memory would be to have a configurable time/transaction limit for connections in the pool before they get reconnected, and default it to a high value (~1 hour?).
Let me know if this hack does anything good in a larger setup, please configure the random reconnect time from 30 seconds up if you're running a large HS. I don't know if this works with workers.
```
--- storage/background_updates.py.orig	2021-07-19 19:57:29.820301464 +0300
+++ storage/background_updates.py	2021-07-20 11:30:07.441263762 +0300
@@ -104,6 +104,25 @@
     def start_doing_background_updates(self) -> None:
         run_as_background_process("background_updates", self.run_background_updates)
+        run_as_background_process("background_updates", self.run_postgres_hack)
+
+    async def run_postgres_hack(self) -> None:
+        logger.info("Starting postgres hack")
+
+        def _kill_me(conn):
+            logger.info("Killing a random database connection for fun and profit")
+            try:
+                conn.reconnect()
+            except Exception:
+                logger.exception("Error reconnecting but it's okay")
+
+        while True:
+            try:
+                await self.db_pool._db_pool.runWithConnection(_kill_me)
+            except Exception:
+                logger.exception("Exception fell through to outer loop but it's okay as well")
+
+            await self._clock.sleep(30)
 
     async def run_background_updates(self, sleep: bool = True) -> None:
         logger.info("Starting background schema updates")
```
I experience this constant memory growth until OOM on two HS instances as well, and after a while of observing things and trying out almost every trick in this thread, nothing seems to work; some only lessen the issue (which I think is simply due to reducing the amount of data, thereby reducing the rate of growth).
However, based on my observations it seems that the actual root cause is indeed simply having long-lived connections to postgres (which is bad, as mentioned already). Simply by forcing the connections to be closed and reopened, the memory usage drops to the expected low values no matter how it's done: restarting synapse and workers, or killing the postgres backend processes responsible for the specific connections. And apparently the nice "hack" by hifi doing the same works as well, although I haven't tried it yet.
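As an illustration of that "kill the backends" approach, something along these lines works against `pg_stat_activity` (a sketch; the one-hour cutoff is an arbitrary example, and expect transient errors in the Synapse logs when it fires):

```
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'synapse'
  AND now() - backend_start > interval '1 hour';
```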
Btw, #5728 is loosely related to this; perhaps a fix for both would be logic that periodically goes through the connections and either closes them (if more are open than the pool minimum) or reconnects them.
I feel that there is something inherently wrong or buggy in PostgreSQL (or some process leading to it) if it accumulates all the RAM despite any limits set in the configuration. Of course, if database-connection recycling cures the symptoms, I'd welcome it being utilized, especially if the underlying root cause can't be fixed.
Based on my understanding it's a rather "unusual" (in the world of databases) but intentional design choice to minimize the performance overhead by not having to do garbage collection or other freeing operations in the middle of everything else, and rather do it all when the connection is no longer in active use, to minimize the performance impact on running things. Though this then easily leads to people not realizing this design and using long-lived connections rather than short-lived ones.
I added a really dirty hack to synapse to force reconnecting the connections in the pool.
I'll have to try this (it'll take a bit of extra work because I'll have to build my own docker image). I wonder if maybe this could be a config option or something, since there's probably a lot of people (myself included) who would gladly take a bit higher IO over high memory usage.
Opened https://github.com/matrix-org/synapse/pull/10440 with a more mergeable, deterministic and less hacky approach. I'm running my small HS with a 1000 transaction limit, but a sensible default is likely in the ballpark of 10,000 to 100,000, as it's not really required to keep reconnecting them all the time.
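Assuming the option lands as the transaction limit described in that PR (`txn_limit`), it would sit in the `database` block of `homeserver.yaml`; roughly like this (values and credentials are examples only):

```
database:
  name: psycopg2
  # reconnect each pooled connection after this many transactions
  txn_limit: 10000
  args:
    user: synapse_user
    database: synapse
    host: localhost
    cp_min: 5
    cp_max: 10
```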
We discussed this at some length in #synapse-dev yesterday. Relevant points were:

- matrix.org's postgres runs with a `shared_buffers` of 128GB
- the total anonymous memory of the postgres processes (as measured by `Active(anon)` and `Inactive(anon)` in `/proc/meminfo`) is approximately 150GB - so about 22GB on top of `shared_buffers`
So, for whatever reason, matrix.org isn't suffering this memory bloat in quite the same way as is reported here.
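For anyone wanting to compare numbers across deployments, a rough way to measure per-backend anonymous memory without double-counting `shared_buffers` is something like this sketch (`Pss_Anon` in `smaps_rollup` needs a reasonably recent kernel; run as root or the postgres user):

```
# sum of proportional anonymous memory per postgres process, sorted ascending
for pid in $(pgrep -x postgres); do
    awk -v pid="$pid" '/^Pss_Anon/ { printf "%s %s kB\n", pid, $2 }' "/proc/$pid/smaps_rollup"
done | sort -k2 -n
```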
My main concern with the approach of just closing Postgres connections is that we are fixing the symptoms rather than understanding the problem. Using a connection pool is an extremely common technique in sql client apps, and if Postgres expects you to close the connection every so often, then I'd expect that to be well-documented.
I'd be much happier with this approach if:
I might be talking well out of my ass here (my HS stopped experiencing this problem a while ago; of course that isn't a solution, but I can't debug a problem that doesn't happen...), but could it be heap fragmentation due to the use of sbrk by the underlying allocator?
Try to strace one of the misbehaving postgres workers and see if it calls `sbrk()` (`strace -p PID -f -e trace=sbrk`).
I've been running #10440 for two days now and the only memory leak I've had is with synapse itself, which is slowly accumulating, but it's not as bad as the postgres workers were. PostgreSQL itself has been freeing memory as configured and I haven't seen any issues so far. The total increase in memory usage is somewhere around 200MB on the whole system over two days, which is pretty much all synapse.
Thank you all for the fix! Especially @hifi.
Adding an index with:
create index state_group_idx on state_groups_state using btree(state_group);
also completely fix the problem for me. And while it makes sense to me that it is faster, the mem leak clearly seems to be a PG bug. So we should have a proper fix soonish, I am still gathering some metrics and see if we can do better.
@MatMaul Should this index stuff warrant its own issue? I mean it "obviously" does do something, given that things are worse without it, so I feel this now more-or-less side-track shouldn't be totally forgotten either :)
FTR, we've also been suffering from a PostgreSQL leak with synapse that was solved by turning off LLVM JIT compilation (`jit = 0`), see PostgreSQL BUG #16707.
I'm looking forward to trying the transaction limit as an alternative. Still, it would be nice to know if the people suffering from memory leaks here have similar success turning off the JIT.
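For anyone who wants to run that comparison, disabling the JIT is a plain configuration change; for example, server-wide (it can equally go in postgresql.conf or be set per-database):

```
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();
```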
Looks like a proper fix was committed to PostgreSQL recently and should be part of the next set of minor releases (16.2 etc.).
Description
Synapse makes its related postgres processes slowly eat all RAM (16G) and swap until the OOM killer hits.
Steps to reproduce
For this I don't have anything to really tell; I will provide the things I've tried later on.
Version information
When postgres and synapse are started, all synapse-related postgres processes have an RES of quite exactly 250MB; after 12h or so their RES is approaching 1200MB each, with a median of 1100. Obviously I have tried all the normal postgres tunables, but nothing really makes a perceivable difference to this behaviour. Restarting either the synapse or the postgres process will make memory go back to the normal RES, only for it to climb away towards OOM again.
I have tried to dig into what could possibly cause this behaviour, but at this point I am out of ideas. A comparable but maybe twice as big HS, kapsi.fi, which does have redis, workers and whatnot enabled, does not experience this behaviour; its postgres process is perfectly happy with pretty much the memory it has at startup.
Here is
postgres=# select datname, usename, pid, backend_start, xact_start, query_start, state_change, wait_event_type, wait_event, state, query from pg_stat_activity;
I've also tried to run `(gdb) p MemoryContextStats(TopMemoryContext)` on one such process; here are the results from startup and then when RAM is almost eaten:

So, something in our instance of synapse makes it query something from postgres that never gets released or accumulates over time, but I can't figure out what. Debugging help wanted, if there are no known things to do.
ADD: While the root issue is exactly the synapse-related postgres processes growing, we also deploy the following bridges on the same server, should they bear anything in this equation:
- mautrix-telegram
- mautrix-whatsapp
- mautrix-facebook
- mautrix-signal
- SmsMatrix bot
- appservice-slack
- appservice-discord