toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.1k stars 172 forks

Huge dataset fails, and on restart starts from scratch again. #469

Closed: accelq closed this issue 11 months ago

accelq commented 11 months ago

PGSync version: Latest

Postgres version: 14

Elasticsearch version: 8+

Redis version: 7+

Python version: 3.11.4

Problem Description: Hi @toluaina, I have a huge dataset of 6 lakh (600,000) entries. Because of the SQL joins, the sync takes a long time, and I've set QUERY_SIZE to 10,000. The sync fails after about 1 lakh (100,000) entries have been synced, and the whole system goes down.

htop shows the Python process using all 32 GB of memory. Is there a way to sync in chunks, e.g. the first 1 lakh entries, then the next 1 lakh, and so on?
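To make the question concrete, here is a minimal sketch of what chunked, resumable syncing could look like in general. It does not use PGSync's API: `fetch_chunk`, `sync_in_chunks`, the checkpoint file format, and the `(id, payload)` row shape are all hypothetical names chosen for illustration, and the in-memory list stands in for a keyset-paginated SQL query. The idea is that only one chunk is held in memory at a time, and the last synced id is persisted after each chunk so that a restart resumes instead of starting from scratch.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (primary key, payload)


def fetch_chunk(rows: List[Row], after_id: int, limit: int) -> List[Row]:
    """Stand-in for `SELECT ... WHERE id > after_id ORDER BY id LIMIT limit`.

    Assumes `rows` is already sorted by id.
    """
    return [row for row in rows if row[0] > after_id][:limit]


def load_checkpoint(path: str) -> int:
    """Return the last synced id, or 0 if no checkpoint has been written."""
    if not os.path.exists(path):
        return 0
    with open(path) as fh:
        return json.load(fh)["last_id"]


def save_checkpoint(path: str, last_id: int) -> None:
    """Persist the id of the last row that was indexed successfully."""
    with open(path, "w") as fh:
        json.dump({"last_id": last_id}, fh)


def sync_in_chunks(
    rows: List[Row],
    index_chunk: Callable[[List[Row]], None],
    checkpoint_path: str,
    chunk_size: int = 10_000,
) -> int:
    """Index rows chunk by chunk, resuming from the saved checkpoint.

    Returns the number of rows indexed during this run.
    """
    last_id = load_checkpoint(checkpoint_path)
    synced = 0
    while True:
        chunk = fetch_chunk(rows, last_id, chunk_size)
        if not chunk:
            return synced
        index_chunk(chunk)  # e.g. one bulk request to the search cluster
        last_id = chunk[-1][0]
        save_checkpoint(checkpoint_path, last_id)  # only after success
        synced += len(chunk)


if __name__ == "__main__":
    demo_rows = [(i, f"doc-{i}") for i in range(1, 26)]
    ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    indexed: List[Row] = []
    total = sync_in_chunks(demo_rows, indexed.extend, ckpt, chunk_size=10)
    print(total, load_checkpoint(ckpt))
```

Because the checkpoint is written only after a chunk has been indexed, a crash in the middle of a chunk means that chunk is retried on restart, so the indexing step needs to tolerate receiving the same documents twice.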

Error Message (if any):

Python process is killed.

accelq commented 11 months ago

Closing this. It seems Elasticsearch itself has issues; it is the one failing.