toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License

Initial Load Memory Grows Exponentially? #503

Open bxcodec opened 8 months ago

bxcodec commented 8 months ago

PGSync version: master branch, commit 95116702c4b314d8b97696ef857cfe116241e236
Postgres version: 14.5
Elasticsearch version: latest version of OpenSearch
Redis version: latest
Python version: 3.8

Problem Description:

Hi, I tried to run this locally.

I have these environment variables set:

ELASTICSEARCH_PORT=9200
ELASTICSEARCH_SCHEME=http
ELASTICSEARCH_HOST=opensearch
ELASTICSEARCH=false
OPENSEARCH=true
ELASTICSEARCH_CHUNK_SIZE=1000
QUERY_CHUNK_SIZE=10000

Questions

Why does the memory grow exponentially on my initial load? Is there any way to make the memory consumption stable?

(Screenshot, 2023-11-06: memory usage during the initial load)

I have tried lowering the chunk sizes, e.g., to 1000 and even 500, but I still face the same issue on the initial load.

Error Message (if any): the application was killed because of OOM:

Killed
exited with code 137
accelq commented 8 months ago

The QUERY_CHUNK_SIZE is not applied on the Postgres cursor, so SQLAlchemy keeps pulling data for as long as it can. https://github.com/toluaina/pgsync/blob/95116702c4b314d8b97696ef857cfe116241e236/pgsync/base.py#L867

You can add a new env variable to control it: https://github.com/accelq/pgsync/blob/f1d7caa95cf8edb30da03e05172e90bf7775b666/pgsync/base.py#L869

This worked for me.
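For reference, here is a minimal sketch (not pgsync's actual code) of the general technique being described: requesting a server-side cursor from SQLAlchemy/psycopg2 and consuming rows in bounded batches, so only roughly QUERY_CHUNK_SIZE rows are held in memory at a time. The connection string and table name are placeholders.

```python
import os

import sqlalchemy as sa

# mirrors the QUERY_CHUNK_SIZE env variable used above
QUERY_CHUNK_SIZE = int(os.environ.get("QUERY_CHUNK_SIZE", 10000))

engine = sa.create_engine("postgresql://user:password@localhost/mydb")

with engine.connect() as conn:
    # stream_results=True requests a server-side cursor from the DBAPI,
    # so rows are fetched from Postgres in batches instead of all at once.
    result = conn.execution_options(stream_results=True).execute(
        sa.text("SELECT * FROM some_big_table")
    )
    total = 0
    for batch in result.partitions(QUERY_CHUNK_SIZE):
        total += len(batch)  # per-row handling would go here
    print(f"streamed {total} rows")
```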

bxcodec commented 8 months ago

Any plans to add this functionality as a core feature? cc @toluaina

sergiojgm commented 7 months ago

After debugging an initial load of 26M records, I found the following: pgsync uses the ES parallel bulk helper by default (the streaming helper exits on the first error). When failures or exceptions occur (e.g., records with differing structures in a JSON-typed column) and the options ELASTICSEARCH_RAISE_ON_ERROR and ELASTICSEARCH_RAISE_ON_EXCEPTION are enabled (both true by default), each failed record stays resident in memory because the error/exception is never handled; processing continues and the failures pile up until the process runs out of memory. Disable both options and memory stays stable. Also check that the data structure of every record matches what is being created in ES. Hope this helps ;) I was able to copy the 26M records (250 GB) to ES in 3h35m with a peak usage of 1.6 GB of RAM.
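To illustrate what those two flags do at the bulk-helper level, here is a minimal sketch (assuming the opensearch-py helpers mirror elasticsearch-py's parallel_bulk; the index name and document generator are made up). With both flags set to False, failures are reported per item by the generator instead of being collected and raised, so they can be logged and dropped rather than retained in memory:

```python
from opensearchpy import OpenSearch, helpers

client = OpenSearch(hosts=[{"host": "opensearch", "port": 9200}])

def actions():
    # placeholder document generator; pgsync builds these from Postgres rows
    for i in range(1_000_000):
        yield {"_index": "my_index", "_id": i, "_source": {"value": i}}

for ok, item in helpers.parallel_bulk(
    client,
    actions(),
    chunk_size=1000,           # corresponds to ELASTICSEARCH_CHUNK_SIZE
    raise_on_error=False,      # corresponds to ELASTICSEARCH_RAISE_ON_ERROR=False
    raise_on_exception=False,  # corresponds to ELASTICSEARCH_RAISE_ON_EXCEPTION=False
):
    if not ok:
        # log the failed document and move on instead of accumulating it
        print("indexing failed:", item)
```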