hellonarrativ / spectrify

Export Redshift data and convert to Parquet for use with Redshift Spectrum or other data warehouses.
https://aws.amazon.com/blogs/big-data/narrativ-is-helping-producers-monetize-their-digital-content-with-amazon-redshift/
MIT License

Why 1gb files? #29

Closed by paoliniluis 6 years ago

paoliniluis commented 6 years ago

Hey! Great tool. Why are you using 1 GB files instead of smaller ones, like 100 MB? Smaller files should make queries faster, since more files allow more parallelism.

c-nichols commented 6 years ago

Yes, that is an artifact from early versions. 1 GB files worked well originally for our particular data size and partitioning strategy. AWS does recommend 256 MB files, but Spectrum will now maximize parallelism with data files of any size!

If you’d like to open a PR to move the default to 256MB, that would be much appreciated!
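
For a rough sense of the file sizes discussed in this thread, here is a minimal sketch in plain Python arithmetic. It is not part of spectrify's API: the `file_count` helper and the 10 GB dataset size are hypothetical, chosen only for illustration. It shows how many files the same data splits into at each target size.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def file_count(total_bytes: int, target_file_bytes: int) -> int:
    """Number of files needed if each file holds at most target_file_bytes.

    Hypothetical helper for illustration; uses integer ceiling division.
    """
    if total_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("sizes must be non-negative, target must be positive")
    return -(-total_bytes // target_file_bytes)


if __name__ == "__main__":
    dataset_bytes = 10 * GB  # assumed dataset size, not from the thread
    for label, size in [("1 GB", GB), ("256 MB", 256 * MB), ("100 MB", 100 * MB)]:
        print(f"{label:>6} files: {file_count(dataset_bytes, size)}")
```

File count is only one input to query parallelism. As noted above, Spectrum may parallelize work regardless of file size, so treat these numbers as a starting point for experimentation.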

paoliniluis commented 6 years ago

Done, check PR https://github.com/hellonarrativ/spectrify/pull/31

c-nichols commented 6 years ago

Thank you, @paoliniluis!