Open · bhalonen opened this issue 2 years ago
@toluaina thanks for the response!
We are definitely out of the initial sync phase; the resource usage creeps up over hours.
Restarting the daemon definitely drops memory. Our hold-over solution is to run pgsync in run-once mode in a loop from bash.
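Roughly, the loop is equivalent to this (sketched in Python here rather than bash; the schema path is a placeholder for our actual config):

```python
# Hold-over workaround: run pgsync in run-once mode (no -d/--daemon flag),
# let the process exit so the OS reclaims its memory, then start it again.
import subprocess
import time

while True:
    # Without --daemon, pgsync performs a single sync pass and exits.
    subprocess.run(["pgsync", "--config", "schema.json"], check=False)
    time.sleep(10)  # brief pause before the next pass
```

Because each pass exits, whatever memory the process accumulated is released between iterations.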
Thanks for the SQLAlchemy tip; I suspected it was something with the queries.
Interesting. For some context on timing (Brent and I are working on the same issue): after a restart we go from negligible memory usage to 44% of our machine in under 2 hours (approx. 28 GB of memory used).
My guess is that the initial sync is still not complete, even though it may appear so.
@toluaina what would you look for to indicate that the initial sync is complete? We have one very simple plugin:
```python
from pgsync import plugin


class IdInDomainPlugin(plugin.Plugin):
    name = 'IdInDomain'

    def transform(self, doc, **kwargs):
        if 'id_in_domain' in doc:
            id_in_domain = doc['id_in_domain']
            doc['id_in_domain'] = str(id_in_domain)
        return doc
```
Right now we see a 10-30 s sync time and a ~400x decrease in memory usage by looping in run-once mode.
```
Sync starcraft Xlog: [0] => Db: [0] => Redis: [total = 0 pending = 0] => Elastic: [0] ...
Sync starcraft Xlog: [0] => Db: [0] => Redis: [total = 0 pending = 0] => Elastic: [0] ...
Sync starcraft Xlog: [0] => Db: [0] => Redis: [total = 0 pending = 0] => Elastic: [0] ...
Sync starcraft Xlog: [0] => Db: [0] => Redis: [total = 0 pending = 0] => Elastic: [0] ...
```
From what I can see, the latest logs look pretty healthy and everything seems to work well besides the memory increase. Here is a frame where data was being synced, and then it is back to 0 in the next set of logs.
I have a similar issue and may have some more data to share to help shine a light on this. I run `top` on the pgsync container, which is running on a GCP VM instance (Google Cloud OS) with 4 GB RAM. After the initial sync, I start the `-d` process and it logs like this:

```
Sync foo Xlog: [0] => Db: [0] => Redis: [total = 0 pending = 0] => Elastic: [0] ...
```

Looking at the RSS in `top`, pgsync is using ~95 MB of RAM.
I have a Postgres table with 600,000 rows that I sync to ES with a simple mapping and no child nodes. If I update the whole table at once (e.g., `update mytable set x = x + 1`), the pgsync RSS spikes from 95 MB to 3.0 GB within seconds, and within a minute the OOM killer shuts pgsync down:
```
[84675.994166] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=docker.service,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-8ed4d7f0dbab30e509c77e6a1ceaa1ea1c6e8262ad691c10a72ac0218e51d876.scope,task=pgsync,pid=2339,uid=1000
[84676.016132] Out of memory: Killed process 2339 (pgsync) total-vm:4378844kB, anon-rss:3435444kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:7036kB oom_score_adj:0
```
If I repeat the whole process but update only 300K of the 600K rows, RSS shoots to 1.6 GB but everything keeps running. The Redis queue gets worked through in a few minutes and the CPU stabilizes, but the RSS stays at 1.6 GB indefinitely. I understand there are several non-pgsync reasons why RSS may not decrease; what causes problems is the almost immediate ballooning of RAM when the daemon receives a lot of updates.
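One way to keep that spike bounded would be to apply bulk changes in batches small enough that the daemon never sees all 600K changed rows at once. A rough sketch of what I mean (the `id` primary-key column, connection string, batch size, and sleep interval are illustrative assumptions, not my actual setup):

```python
# Batch the bulk update so pgsync only has to process a slice of the table
# per sync cycle instead of all rows in one burst.
import time

import psycopg2

BATCH_SIZE = 10_000

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection details

with conn, conn.cursor() as cur:
    cur.execute("SELECT min(id), max(id) FROM mytable")
    lo, hi = cur.fetchone()

for start in range(lo, hi + 1, BATCH_SIZE):
    with conn, conn.cursor() as cur:  # one transaction per batch
        cur.execute(
            "UPDATE mytable SET x = x + 1 WHERE id >= %s AND id < %s",
            (start, start + BATCH_SIZE),
        )
    time.sleep(5)  # give pgsync time to drain the Redis queue before the next batch

conn.close()
```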
In the comment above, I see "Once the initial sync is complete, memory usage should not continue to grow" but this is not at all the case for me, and apparently not for the OP either.
- PGSync version: 2.3.1
- Postgres version: 13.7
- Elasticsearch version: 8.3.2
- Redis version: redis:7-alpine
- Python version: 3.10.5

Problem Description: We are experiencing growing memory usage in production. How can we limit pgsync's memory usage? Does limiting query complexity help with that? Or the ngram count (although I would expect that to be an Elasticsearch concern)?
Our DB size is 750 MB, yet pgsync is using 40 GB of memory.