spark-redshift-community / spark-redshift

Performant Redshift data source for Apache Spark
Apache License 2.0

Use a dated s3temp folder name instead of using uuid #145

Closed: dongfeng3692 closed this issue 1 year ago

dongfeng3692 commented 1 year ago

Sometimes I want to use historical S3 files to restore Redshift's historical data, but the uuid makes it difficult to quickly find the version I want.

smoy commented 1 year ago

The uuid is there to avoid collisions (multiple runs in the cloud). You should already be able to set an S3 prefix: change it to yyyy-mm-dd/ and the temp files will end up under yyyy-mm-dd/ followed by the uuid.

bsharifi commented 1 year ago

Thank you @smoy for the suggestion. Yes, a dated prefix can be added to the tempdir parameter like so, and the uuid will be appended after it:

.option("tempdir", "s3n://mybucket/yyyy-mm-dd")
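For anyone who wants the date filled in automatically on each run, here is a minimal PySpark-side sketch. The helper name `dated_tempdir` and the bucket path are hypothetical, made up for illustration; only the `tempdir` option comes from this thread. It builds the dated prefix as a plain string, and the connector would then append its uuid after it as described above.

```python
from datetime import date


def dated_tempdir(base: str, run_date: date) -> str:
    """Append a yyyy-mm-dd folder to an S3 base path (hypothetical helper)."""
    # isoformat() on a date yields yyyy-mm-dd, e.g. 2024-01-31
    return f"{base.rstrip('/')}/{run_date.isoformat()}/"


# Example: prefix for today's run; "s3n://mybucket" is a placeholder bucket.
tempdir = dated_tempdir("s3n://mybucket", date.today())

# The resulting string would then be passed to the connector, e.g.:
#   .option("tempdir", tempdir)
```

Because the date sorts lexicographically, listing the bucket shows runs in chronological order, while the appended uuid still keeps concurrent runs on the same day from colliding.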