toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License

Resource usage climbing #315

Open bhalonen opened 2 years ago

bhalonen commented 2 years ago

PGSync version: 2.3.1
Postgres version: 13.7
Elasticsearch version: 8.3.2
Redis version: redis:7-alpine
Python version: 3.10.5

Problem Description: We are seeing memory usage grow steadily in production. How can we limit pgsync's memory usage? Does limiting query complexity help with that? What about the ngram count (though I would expect that to be an Elasticsearch-side setting)?

Our db size is 750 MB and pgsync is using 40 GB of memory.

toluaina commented 2 years ago
bhalonen commented 2 years ago

@toluaina thanks for the response!

We are definitely past the initial sync phase; the resource usage creeps up over hours.

Restarting the daemon definitely drops memory; our holdover solution is to run pgsync in run-once mode in a loop from bash, as sketched below.
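Roughly, the wrapper looks like this (a sketch, not our exact script; I'm assuming the schema file is called schema.json, that connection settings come from environment variables, and that the flags match our pgsync version):

while true; do
    # One full sync pass without --daemon; the process exits afterwards,
    # which releases whatever memory it accumulated.
    pgsync --config schema.json
    # Short pause before the next pass.
    sleep 10
done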

Thanks for the SQLAlchemy tip; I suspected it was something to do with the queries.

zacherkkila commented 2 years ago

Interesting. For some context on timing (Brent and I are working on the same issue): after a restart we go from negligible memory usage to 44% of our machine in under 2 hours (approximately 28 GB of memory used).

[screenshot: memory usage]
toluaina commented 2 years ago

My guess is that the initial sync is still not complete, even though it may appear so.

bhalonen commented 2 years ago

@toluaina what would you look for to indicate that initial sync is complete? We have one very simple plugin:

from pgsync import plugin

class IdInDomainPlugin(plugin.Plugin):
    name = 'IdInDomain'

    def transform(self, doc, **kwargs):
        # Cast id_in_domain to a string before the document is indexed.
        if 'id_in_domain' in doc:
            doc['id_in_domain'] = str(doc['id_in_domain'])
        return doc
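For completeness, the plugin is referenced from the schema by name, roughly like this (a simplified sketch; the database, index, table, and column names here are placeholders, not our real schema):

[
    {
        "database": "mydb",
        "index": "myindex",
        "nodes": {
            "table": "mytable",
            "columns": ["id", "id_in_domain"],
            "plugins": ["IdInDomain"]
        }
    }
]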

Right now we see a 10-30 s sync time and roughly a 400x decrease in memory usage by looping in run-once mode.

toluaina commented 2 years ago
zacherkkila commented 2 years ago

From what I can see, the latest logs look pretty healthy and everything seems to work well besides the memory increase.

[screenshot: pgsync logs]
zacherkkila commented 2 years ago

Here is a frame where data was being synced

[screenshot: logs during a sync]

and then back to 0 in the next set of logs

[screenshot: subsequent logs with counts back at 0]
loren commented 2 years ago

I have a similar issue and may have some more data to share to help shine a light on this. I run top on the pgsync container, which is running on a GCP VM instance (Google Cloud OS) with 4 GB of RAM. After the initial sync, I start the -d process and it logs like this:

Sync foo Xlog: [0] => Db: [0] => Redis: [total = 0 pending = 0] => Elastic: [0] ...

Looking at the RSS in top, pgsync is using ~95MB RAM.
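(In case it helps with comparing numbers: the same RSS figure can be sampled outside of top with plain ps/pgrep, e.g. by logging it once a minute as below; the pgrep pattern is just a guess at how pgsync shows up in your process list.)

while true; do
    # Resident set size of the oldest matching pgsync process, in kilobytes.
    ps -o rss= -p "$(pgrep -of pgsync)"
    sleep 60
done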

I have a Postgres table with 600,000 rows that I sync to ES with a simple mapping and no child nodes. If I update the table all at once (e.g., update mytable set x=x+1), the pgsync RSS spikes from 95 MB to 3.0 GB within seconds, and within a minute the OOM killer shuts down pgsync.

[84675.994166] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=docker.service,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-8ed4d7f0dbab30e509c77e6a1ceaa1ea1c6e8262ad691c10a72ac0218e51d876.scope,task=pgsync,pid=2339,uid=1000
[84676.016132] Out of memory: Killed process 2339 (pgsync) total-vm:4378844kB, anon-rss:3435444kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:7036kB oom_score_adj:0

If I repeat that whole process but only update 300K of the 600K rows, RSS shoots to 1.6 GB but everything keeps running. The Redis queue gets worked through in a few minutes and the CPU stabilizes, but the RSS stays at 1.6 GB indefinitely. I understand there are several non-pgsync reasons why RSS may not decrease. What's causing problems is the almost immediate ballooning of RAM when the daemon receives a lot of updates.

In the comment above, I see "Once the initial sync is complete, memory usage should not continue to grow", but this is not at all the case for me, and apparently not for the OP either.