Closed: Limess closed this 4 years ago
@Limess thanks for this, this is really great. However I found something that's not allowing me to run it. I commented at the right place and the fix looks straightforward. Can you please have a look and update the PR?
I'm looking forward to merging it.
Update: The same problem seems to be present in #33.
Ah sorry about that, I'd extracted the merged changes from a fork into two separate changes and forgot about updating the config handling. I'll fix that tomorrow on both PRs if I have time.
Thank you. It's working now, but there are conflicts since #33 was merged. Can you please help resolve them?
I've rebased this branch.
This change adds the .bz2 or .gzip file extension to files in S3 and locally, and uses the GZIP or BZIP2 directive in the COPY command.
Configurable by the 'compression' parameter.
See https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-file-compression.html#copy-bzip2 for reference.
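To illustrate the idea, here is a minimal sketch of how a 'compression' value could map to a file extension and a COPY directive. This is not the code from this PR: the accepted values ("gzip", "bzip2"), the function names, and the table, S3 path and IAM role arguments are all assumptions made for the example.

```python
# Assumed 'compression' values mapped to (file extension, Redshift COPY directive).
# The value names are hypothetical; the PR only says the parameter is called 'compression'.
_COMPRESSION = {
    "gzip": (".gzip", "GZIP"),
    "bzip2": (".bz2", "BZIP2"),
}


def compression_settings(compression=None):
    """Return (file_extension, copy_directive) for a 'compression' config value."""
    if not compression:
        # No compression configured: no extra extension, no directive.
        return "", ""
    if compression not in _COMPRESSION:
        raise ValueError(f"Unsupported compression: {compression!r}")
    return _COMPRESSION[compression]


def build_copy_sql(table, s3_uri, iam_role, compression=None):
    """Build a COPY statement for a CSV file, adding the compression directive if any."""
    extension, directive = compression_settings(compression)
    sql = f"COPY {table} FROM '{s3_uri}{extension}' IAM_ROLE '{iam_role}' CSV {directive}"
    # Drop the trailing space left behind when no directive is used.
    return sql.strip()
```

For example, `build_copy_sql("my_table", "s3://my-bucket/batch.csv", "my-role", "bzip2")` would point COPY at `batch.csv.bz2` and append `BZIP2`, while omitting the last argument leaves both the file name and the statement unchanged.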
Stitch ran a benchmark of the performance of different compression formats a few years ago: https://www.stitchdata.com/blog/redshift-database-benchmarks-copy-performance-with-compressed-files/. It suggested that gzip and bz2 are a net positive above a certain file size, but that both are worse than LZO for small-to-medium files, and both are worse than no compression at all for small files.