transferwise / pipelinewise-target-redshift

Singer.io Target for Amazon Redshift - PipelineWise compatible
https://transferwise.github.io/pipelinewise/

Add 'slices' configuration option #33

Closed Limess closed 4 years ago

Limess commented 4 years ago

This adds a 'slices' option that lets the user configure how many chunks each file is split into before loading. Loading several chunks instead of one large file should improve parallel loading.
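As a rough illustration of the idea (not the code in this PR), splitting a batch of records into a configured number of chunks could look like the sketch below. The function name `split_into_slices` and the in-memory list of records are assumptions made for the example.

```python
from typing import List, Sequence


def split_into_slices(records: Sequence[str], slices: int) -> List[List[str]]:
    """Split records into up to `slices` contiguous chunks of near-equal length.

    Hypothetical helper for illustration only; not the target's actual code.
    """
    if slices < 1:
        raise ValueError("slices must be at least 1")

    # Never create more chunks than records, so no chunk is empty
    # (an empty input still yields a single empty chunk).
    n = max(1, min(slices, len(records)))

    # The first `extra` chunks each get one additional record.
    base, extra = divmod(len(records), n)

    chunks: List[List[str]] = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(list(records[start:start + size]))
        start += size
    return chunks
```

Each resulting chunk would then be written out as its own file, so that the load can read the files in parallel.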

It may be sensible to also add a minimum and a maximum chunk size. The recommended minimum/maximum sizes are 1MB/256MB compressed; however, I'm not sure how best to implement this automatically in a quick and sensible way, especially once compression is added as well.

Reference: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-use-multiple-files.html

koszti commented 4 years ago

This makes a lot of sense, and we'd like to add the same functionality as this PR and #32 to target-snowflake.

Adding min/max chunk sizes makes a lot of sense as well but, like you said, generating files dynamically is a bit tricky. I think defining static slices is still beneficial for now. Later we can think about how to combine it with potential new chunk size parameter(s) and how to detect the number of slices dynamically.
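One possible way to combine a static slice count with size bounds, sketched under the assumption that the total compressed size is known or can be estimated up front (which is exactly the tricky part). The name `effective_slices` and its parameters are hypothetical. The defaults use the 1MB/256MB figures quoted above.

```python
MB = 1024 ** 2


def effective_slices(
    total_bytes: int,
    requested_slices: int,
    min_chunk_bytes: int = 1 * MB,      # assumed lower bound per chunk
    max_chunk_bytes: int = 256 * MB,    # assumed upper bound per chunk
) -> int:
    """Adjust the requested slice count so equal-sized chunks fit the bounds.

    Hypothetical helper for illustration only; if the two bounds conflict,
    the maximum chunk size takes precedence.
    """
    if requested_slices < 1:
        raise ValueError("requested_slices must be at least 1")
    if total_bytes <= 0:
        return 1

    # Fewest chunks that keep every chunk at or under the maximum size
    # (integer ceiling division).
    lower = (total_bytes + max_chunk_bytes - 1) // max_chunk_bytes

    # Most chunks that keep every chunk at or above the minimum size.
    upper = max(1, total_bytes // min_chunk_bytes)

    return max(lower, min(requested_slices, upper))
```

For example, with these defaults a 2MB batch and 8 requested slices would be reduced to 2 chunks, while a 1024MB batch and 2 requested slices would be raised to 4.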

Please review the comments I added to the PR. I'm looking forward to merging it.