transferwise / pipelinewise-target-redshift

Singer.io Target for Amazon Redshift - PipelineWise compatible
https://transferwise.github.io/pipelinewise/

Support gzip and bz2 compression #32

Closed Limess closed 4 years ago

Limess commented 4 years ago

This change adds the .bz2 or .gzip file extension to the files written locally and uploaded to S3, and adds the matching GZIP or BZIP2 option to the COPY command.

The format is selected with the 'compression' parameter.

See https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-file-compression.html#copy-bzip2 for reference.
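To illustrate the idea, here is a minimal sketch, not the code in this PR. It assumes a 'compression' value of "gzip", "bz2", or empty; the function names and the mapping table are hypothetical. The only things taken from the description above and the Redshift docs are the file extensions and the GZIP / BZIP2 COPY options.

```python
import bz2
import gzip
import shutil

# Hypothetical mapping: 'compression' config value -> (file extension, COPY option).
# An empty value means the file is uploaded uncompressed.
COMPRESSION = {
    "": ("", ""),
    "gzip": (".gzip", "GZIP"),
    "bz2": (".bz2", "BZIP2"),
}

# Standard-library openers that write compressed output.
_OPENERS = {"gzip": gzip.open, "bz2": bz2.open}


def compress_file(path, compression=""):
    """Compress `path` if requested and return the path of the file to upload."""
    if compression not in COMPRESSION:
        raise ValueError(f"Unsupported compression: {compression!r}")
    if not compression:
        return path
    extension, _ = COMPRESSION[compression]
    target = path + extension
    # Stream the data so large batch files are not loaded into memory at once.
    with open(path, "rb") as src, _OPENERS[compression](target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def copy_option(compression=""):
    """Return the keyword to append to the COPY command ('' for no compression)."""
    if compression not in COMPRESSION:
        raise ValueError(f"Unsupported compression: {compression!r}")
    return COMPRESSION[compression][1]
```

With this shape, `compress_file("batch.csv", "bz2")` would write `batch.csv.bz2` next to the original file, and `copy_option("bz2")` would give the keyword to add to the COPY statement.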

Stitch benchmarked COPY performance with different compression formats a few years ago: https://www.stitchdata.com/blog/redshift-database-benchmarks-copy-performance-with-compressed-files/. The results suggested that gzip and bz2 are a net positive above a certain file size, but both perform worse than LZO for small-to-medium files, and both perform worse than no compression for small files.

koszti commented 4 years ago

@Limess thanks for this, it's really great. However, I found something that prevents me from running it. I left a comment at the relevant place, and the fix looks straightforward. Can you please have a look and update the PR?

I'm looking forward to merging it.

Update: The same problem seems to be present in #33

Limess commented 4 years ago

Ah, sorry about that. I'd extracted the combined changes from a fork into two separate PRs and forgot to update the config handling. I'll fix that tomorrow on both PRs if I have time.

koszti commented 4 years ago

Thank you. It's working now, but it has some conflicts since #33 got merged. Can you please help resolve them?

Limess commented 4 years ago

I've rebased this branch.