toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License

memory leak on long sync #473

Open accelq opened 11 months ago

accelq commented 11 months ago

PGSync version: 2.5.0

Postgres version: 14

Elasticsearch version: 8.8.2

Redis version: 7

Python version: 3.11.4

Problem Description: Hi @toluaina, I have a table with lakhs (hundreds of thousands) of entries. While profiling the sync process, I see memory usage keep growing until it consumes the entire 32 GB of RAM, which brings down the whole sync.
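To check whether memory grows steadily with each batch (a leak) rather than spiking once and levelling off, one option is Python's stdlib tracemalloc. This is a minimal sketch under assumptions: profile_batches and process_batch are hypothetical names, not part of PGSync's API, and process_batch stands in for whatever code indexes one chunk of rows. Note that tracemalloc only sees allocations made through Python's allocators, so memory held by C extensions or thread stacks will not appear here.

```python
import tracemalloc
from typing import Callable, Iterable, List, Tuple


def profile_batches(
    batches: Iterable[list],
    process_batch: Callable[[list], None],
) -> List[Tuple[int, int]]:
    """Run process_batch on each batch and sample traced memory after each one.

    Returns (current_bytes, peak_bytes) per batch. A current value that keeps
    climbing across batches, instead of levelling off, suggests that something
    retains references between batches.
    """
    samples: List[Tuple[int, int]] = []
    tracemalloc.start()
    try:
        for batch in batches:
            process_batch(batch)
            samples.append(tracemalloc.get_traced_memory())
            tracemalloc.reset_peak()  # per-batch peak; requires Python 3.9+
    finally:
        tracemalloc.stop()
    return samples


if __name__ == "__main__":
    retained = []  # hypothetical leak: references kept across batches

    def leaky(batch):
        retained.extend(str(x) * 10 for x in batch)

    def clean(batch):
        _ = [str(x) * 10 for x in batch]  # freed when the call returns

    batches = [list(range(10_000)) for _ in range(5)]
    for name, fn in [("leaky", leaky), ("clean", clean)]:
        currents = [cur for cur, _ in profile_batches(batches, fn)]
        print(name, currents)
```

In the leaky case the current figure rises with every batch; in the clean case it stays roughly flat.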

Error Message (if any):

toluaina commented 11 months ago
accelq commented 11 months ago

Yes, this is the initial sync. The overall DB is 150 GB+.

accelq commented 11 months ago

A couple of observations:

  1. Setting thread_count to 1 seems to reduce the rate at which memory leaks.
  2. Even so, I see that Elasticsearch's parallel_bulk creates extra threads, and these seem to be leaking memory as well. As the attached image shows, the virtual memory footprint of each thread created by the Elasticsearch library keeps increasing.

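The observations above point at how many chunks are in flight at once. The sketch below illustrates the general idea of bounding that with a thread pool, loosely analogous to the thread_count and queue_size parameters of the elasticsearch client's parallel_bulk helper. It is a hypothetical stand-in, not how PGSync or the elasticsearch client actually implement bulk indexing: bounded_bulk, chunked, send_chunk and max_in_flight are illustrative names, and send_chunk would wrap a real bulk request.

```python
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from itertools import islice
from typing import Callable, Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, pulling lazily from `items`."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def bounded_bulk(
    actions: Iterable[T],
    send_chunk: Callable[[List[T]], int],
    thread_count: int = 1,
    chunk_size: int = 500,
    max_in_flight: int = 2,
) -> int:
    """Send chunks on a thread pool with a cap on submitted-but-unfinished chunks.

    Memory is roughly bounded by (max_in_flight + 1) * chunk_size actions:
    the submitted chunks plus the one currently being built. Results are
    summed, not stored, so nothing accumulates across the whole sync.
    """
    total = 0
    pending: Deque[Future] = deque()
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        for chunk in chunked(actions, chunk_size):
            if len(pending) >= max_in_flight:
                # Block on the oldest chunk before submitting another one.
                total += pending.popleft().result()
            pending.append(pool.submit(send_chunk, chunk))
        while pending:  # drain whatever is still running
            total += pending.popleft().result()
    return total


if __name__ == "__main__":
    def fake_send(chunk):  # hypothetical stand-in for one bulk request
        return len(chunk)

    rows = ({"_id": i} for i in range(1_234))  # lazy generator, never a full list
    print(bounded_bulk(rows, fake_send, thread_count=2, chunk_size=100))  # 1234
```

The design choice that matters is that actions stay a lazy iterator and results are folded into a counter rather than appended to a list; if either is materialized for a 150 GB table, memory grows with the table instead of with the chunk size.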
gustavorps commented 9 months ago

Any success with the initial sync, @accelq? I have a similar case with 1 TB of data.