hellonarrativ / spectrify

Export Redshift data and convert to Parquet for use with Redshift Spectrum or other data warehouses.
https://aws.amazon.com/blogs/big-data/narrativ-is-helping-producers-monetize-their-digital-content-with-amazon-redshift/
MIT License
116 stars, 25 forks

New CSV transform feature #42

Open grantnicholas opened 5 years ago

grantnicholas commented 5 years ago

Are you open to PRs?

I added support for mapping multiple CSV files into a single Parquet file, which improves row group compression and reduces Redshift Spectrum query times.

Mapping one CSV file to one Parquet file is a fine default for unloading full tables, but when unloading partitions it tended to produce too many small Parquet files.
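To illustrate the idea, here is a minimal sketch of the grouping step only: packing several unloaded CSV files into batches so that each batch becomes one Parquet file. This is not spectrify's API. The function name `group_csv_files`, the `target_bytes` parameter, and the file names are all hypothetical, and the actual CSV-to-Parquet write is left out.

```python
from typing import Iterable, List, Tuple


def group_csv_files(
    files: Iterable[Tuple[str, int]], target_bytes: int
) -> List[List[str]]:
    """Greedily pack (path, size_in_bytes) pairs into ordered groups.

    Each returned group is intended to be converted into a single
    Parquet file. A group is closed as soon as adding the next file
    would push it past target_bytes; a file larger than target_bytes
    ends up in a group of its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Close the current group if this file would overflow it.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    # Flush the last, possibly partial, group.
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical unload output: (file name, size in bytes).
    unloaded = [
        ("part_0000.csv.gz", 40),
        ("part_0001.csv.gz", 50),
        ("part_0002.csv.gz", 30),
        ("part_0003.csv.gz", 200),
    ]
    for group in group_csv_files(unloaded, target_bytes=100):
        print(group)
```

With a one-to-one mapping the four files above would yield four Parquet files; with a 100-byte target they yield three, and a larger target would merge more of them.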

c-nichols commented 5 years ago

Definitely open to PRs! This is something I thought about adding but punted on because it wasn't critical for our use.

grantatspothero commented 5 years ago

Awesome, I just opened PR #43 to add the feature.