toluaina / pgsync

Postgres to Elasticsearch/OpenSearch sync
https://pgsync.com
MIT License
1.11k stars 174 forks source link

Use STREAM_RESULTS in call to execute #383

Closed loren closed 1 year ago

loren commented 1 year ago

I saw some evidence via Fil that memory usage spikes when truncating a slot with many txs in it.

I think under the hood, a call to execute without stream_results will cause psycopg2 to pull down results and instantiate them in memory even though they are unused.

With this change in place, I was able to process a bulk update that failed 100% of the time. Instead of RSS spiking to 3.0GB, it stayed constant at 110MB.

It's very difficult to replicate this, and I'm not completely confident this fixes the issue entirely. I almost wonder if the safer thing to do is have pg_logical_slot_get_changes loop through blocks of say LOGICAL_SLOT_CHUNK_SIZE so we're guaranteed not to ever return too large a payload.

toluaina commented 1 year ago

It's very difficult to replicate this, and I'm not completely confident this fixes the issue entirely. I almost wonder if the safer thing to do is have pg_logical_slot_get_changes loop through blocks of say LOGICAL_SLOT_CHUNK_SIZE so we're guaranteed not to ever return too large a payload.

I think this is a good idea too. I will do this