Open wemrysi opened 2 years ago
Still debugging some issues with restructuring the `COPY` using flow sinks. May need to revert to something like the pre-flow implementation if the problem persists.
Perhaps rechunking the stream into larger chunks might help reduce the number of `COPY` operations? Perhaps enough to counteract the performance penalty of rechunking.
> Perhaps rechunking the stream into larger chunks might help reduce the number of `COPY` operations? Perhaps enough to counteract the performance penalty of rechunking.
Hm, yeah that might be enough, good idea. We're seeing 3MiB chunks on the problematic instance now, so maybe we try rechunking to 32MiB and see if that helps.
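For concreteness, a minimal sketch of that rechunking with fs2 (assuming the data feeding the `COPY` is a `Stream[F, Byte]` and a reasonably recent fs2 where `Stream#chunkMin` is available; the helper name and the 32MiB figure are just illustrative, not existing code):

```scala
import fs2.Stream

object Rechunk {
  // Accumulate upstream chunks until at least `minBytes` have been
  // buffered, then emit them as one larger chunk. Fewer, larger chunks
  // downstream means fewer COPY round-trips.
  def toLargerChunks[F[_]](minBytes: Int)(bytes: Stream[F, Byte]): Stream[F, Byte] =
    bytes.chunkMin(minBytes).flatMap(Stream.chunk)
}

// e.g. coalesce the ~3MiB chunks we're currently seeing into ~32MiB ones:
// val rechunked = Rechunk.toLargerChunks[IO](32 * 1024 * 1024)(input)
```

Whether the rechunking overhead is actually offset by fewer `COPY` operations is something we'd want to measure on the problematic instance.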
The per-chunk strategy appears to result in rather poor performance once data size increases. We'd like to revert to using a single `COPY` operation per table to attempt to reclaim some performance. Some of the previous reliability concerns can be ameliorated via source buffering.

In the case that we still run into reliability issues due to timeouts/long-running transactions, a possible solution would be to define a maximum duration between writes to the `COPY` stream. If the threshold is reached, we commit the current operation and begin anew on the next chunk from upstream. This should avoid timeouts for slow sources while preserving performance where possible.
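If it helps discussion, here's a rough approximation of that threshold behaviour using fs2's `groupWithin` (assuming fs2 3 / cats-effect 3 and a byte stream; `copyChunk`, the size bound, and the 5-minute figure are placeholders, not existing code). It isn't exactly "one `COPY` kept open and split only on a quiet gap", since each emitted batch becomes its own `COPY` and is buffered before being written, but it captures the same trade-off with off-the-shelf combinators:

```scala
import scala.concurrent.duration._

import cats.effect.Temporal
import fs2.{Chunk, Stream}

object TimedCopy {
  // `copyChunk` stands in for whatever issues the COPY for one batch and
  // commits it (e.g. via the existing connector); it is assumed, not real.
  def sink[F[_]: Temporal](
      maxBytesPerCopy: Int,
      maxQuietPeriod: FiniteDuration)(
      copyChunk: Chunk[Byte] => F[Unit])(
      bytes: Stream[F, Byte])
      : Stream[F, Unit] =
    bytes
      // Emit a batch once `maxBytesPerCopy` bytes accumulate or
      // `maxQuietPeriod` elapses, whichever comes first. The COPY for a
      // batch only runs once the batch is complete, so a slow source
      // never holds a transaction open past the threshold, while a fast
      // source still yields few, large COPYs.
      .groupWithin(maxBytesPerCopy, maxQuietPeriod)
      .evalMap(copyChunk)
}

// e.g. flush at least every 5 minutes, with at most 64MiB per COPY:
// TimedCopy.sink[IO](64 * 1024 * 1024, 5.minutes)(writeAndCommit)(input)
```

The obvious cost is that each batch is buffered in memory before it's written, which is the same source-buffering trade-off mentioned above.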