precog / quasar-destination-postgres

Quasar plugin providing the ability to push results into a Postgres database

Revert to a single COPY operation per table instead of per chunk #308

Open wemrysi opened 2 years ago

wemrysi commented 2 years ago

The per-chunk strategy appears to perform poorly as data size increases. We'd like to revert to a single COPY operation per table to try to reclaim some performance. Some of the previous reliability concerns can be ameliorated via source buffering.

If we still run into reliability issues due to timeouts or long-running transactions, a possible solution would be to define a maximum duration between writes to the COPY stream. If the threshold is reached, we commit the current operation and begin anew on the next chunk from upstream. This should avoid timeouts for slow sources while preserving performance where possible.
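
To make the idle-threshold idea concrete, here is a minimal offline sketch in Python (not the plugin's Scala code). It assumes each upstream chunk arrives tagged with a timestamp, and it only decides which chunks would share a COPY operation. The names `split_by_idle_gap` and `max_idle_seconds` are hypothetical, and a real implementation would commit when a timer fires rather than when the next chunk shows up.

```python
from typing import Iterable, Iterator, List, Tuple


def split_by_idle_gap(
    timed_chunks: Iterable[Tuple[float, bytes]],
    max_idle_seconds: float,
) -> Iterator[List[bytes]]:
    """Group chunks so that each group could be written by one COPY operation.

    timed_chunks yields (arrival_time_in_seconds, chunk) pairs in arrival order.
    A new group starts whenever the gap since the previous write exceeds
    max_idle_seconds (hypothetical parameter, not an existing config option).
    """
    current: List[bytes] = []
    last_write = 0.0
    for arrived_at, chunk in timed_chunks:
        # The source was idle for too long: "commit" the open group
        # and begin anew with this chunk.
        if current and arrived_at - last_write > max_idle_seconds:
            yield current
            current = []
        current.append(chunk)
        last_write = arrived_at
    # Commit whatever is left once upstream is exhausted.
    if current:
        yield current


arrivals = [(0.0, b"a"), (1.0, b"b"), (10.0, b"c"), (11.0, b"d")]
groups = list(split_by_idle_gap(arrivals, max_idle_seconds=5.0))
```

With a fast source every chunk lands in the same group, so the single-COPY performance is kept; only a slow source gets split into several commits.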

wemrysi commented 2 years ago

Still debugging some issues with restructuring the COPY using flow sinks. May need to revert to something like the pre-flow implementation if the problem persists.

jsantos17 commented 2 years ago

Perhaps rechunking the stream into larger chunks might help reduce the number of COPYs? Perhaps help enough to counteract the performance penalty of rechunking.

wemrysi commented 2 years ago

> Perhaps rechunking the stream into larger chunks might help reduce the number of COPYs? Perhaps help enough to counteract the performance penalty of rechunking.

Hm, yeah that might be enough, good idea. We're seeing 3MiB chunks on the problematic instance now, so maybe we try rechunking to 32MiB and see if that helps.
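
A rough sketch of what the rechunking could look like, in Python rather than the plugin's Scala, and assuming chunks are plain byte strings that end on row boundaries. The `rechunk` function and `target_bytes` parameter are hypothetical names for illustration. It concatenates whole upstream chunks instead of slicing them, so a row is never split across two COPYs.

```python
from typing import Iterable, Iterator, List


def rechunk(chunks: Iterable[bytes], target_bytes: int) -> Iterator[bytes]:
    """Merge consecutive chunks until each output holds at least target_bytes.

    Upstream chunks are never split, so outputs may overshoot the target by
    up to one upstream chunk. The final output may be smaller than the target.
    """
    pending: List[bytes] = []
    pending_size = 0
    for chunk in chunks:
        pending.append(chunk)
        pending_size += len(chunk)
        if pending_size >= target_bytes:
            # Enough data buffered: emit one large chunk for a single COPY.
            yield b"".join(pending)
            pending = []
            pending_size = 0
    # Flush the remainder when upstream ends.
    if pending:
        yield b"".join(pending)


merged = list(rechunk([b"abc", b"de", b"fghij", b"k"], target_bytes=4))
```

With 3MiB upstream chunks and a 32MiB target, each output would hold 11 upstream chunks (33MiB), so the number of COPYs drops by roughly a factor of 11 at the cost of buffering one output in memory.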