spark-redshift-community / spark-redshift

Performant Redshift data source for Apache Spark
Apache License 2.0

Feature request unload as parquet file #116

Open parisni opened 1 year ago

parisni commented 1 year ago

Parquet format is up to 2x faster to unload and consumes up to 6x less storage in Amazon S3, compared with text formats. See https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

Currently, CSV is used to unload, which is inefficient. I can't see a valid reason to keep using CSV and maintaining custom code to transform it into an RDD / DataFrame. See https://github.com/spark-redshift-community/spark-redshift/blob/master/src/main/scala/io/github/spark_redshift_community/spark/redshift/RedshiftRelation.scala#L198
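
To make the request concrete, here is a minimal sketch of the kind of UNLOAD statement a Parquet code path would need to issue. It is not the connector's code: `build_unload_statement` and its parameters are hypothetical names, the bucket and role ARN are placeholders, and authorization via `IAM_ROLE` is just one assumed option. Only the `UNLOAD ... TO ... IAM_ROLE ... FORMAT AS PARQUET` syntax comes from the Redshift documentation linked above.

```python
def build_unload_statement(query: str, s3_prefix: str, iam_role_arn: str,
                           fmt: str = "PARQUET") -> str:
    """Build a Redshift UNLOAD statement as a string.

    Hypothetical helper for illustration only; it is not part of spark-redshift.
    """
    if fmt not in ("PARQUET", "CSV"):
        raise ValueError(f"unsupported format: {fmt}")
    # UNLOAD takes the query as a quoted literal, so single quotes
    # inside the query are doubled.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS {fmt}"
    )


if __name__ == "__main__":
    # Placeholder values; substitute a real bucket and role.
    print(build_unload_statement(
        "SELECT * FROM events WHERE kind = 'click'",
        "s3://example-bucket/tmp/unload_",
        "arn:aws:iam::123456789012:role/ExampleRole",
    ))
```

With the files written as Parquet, the temporary S3 prefix could presumably be read back with Spark's built-in `spark.read.parquet(...)`, so the custom text-to-row conversion would no longer be needed. Type mapping between Redshift and Parquet would still have to be checked.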