Closed by tarzanek 1 week ago
FWIW, we should be able to use the above in other S3 cases too, and eventually for DynamoDB as well.
PR #150 only fixed the issue for DynamoDB, not for Parquet files.
Re-posting my comment here:
I removed the changes related to Parquet since they require Hadoop 3.x. I think we can merge the PR as it improves the way we authenticate to AWS when migrating from DynamoDB, and we can re-apply the changes related to Parquet after we upgrade to Hadoop 3.x.
Originally posted by @julienrf in https://github.com/scylladb/scylla-migrator/issues/150#issuecomment-2163063798
Thanks @julienrf, please proceed with upgrading the Hadoop and Spark versions.
We should add an option to assume a role for S3 access, which is the de facto standard these days.
It should be as easy as https://medium.com/@leythg/access-s3-using-pyspark-by-assuming-an-aws-role-9558dbef0b9e (of course rewritten in Scala, with the proper input properties exposed in the config file).
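A minimal sketch of what that could look like in Scala, using the assumed-role credentials provider that ships with the Hadoop 3.x S3A connector (hence the dependency on the Hadoop upgrade discussed above). The role ARN, session name, and bucket path below are placeholders, and the exact keys the migrator's config file would expose are an assumption, not a design decision:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: configure the S3A connector (Hadoop 3.x) to assume an IAM role.
// In the migrator these values would come from the config file rather than
// being hard-coded; the ARN below is a placeholder.
val spark = SparkSession.builder()
  .appName("scylla-migrator")
  // Use the assumed-role credentials provider shipped with Hadoop 3.x
  .config("spark.hadoop.fs.s3a.aws.credentials.provider",
          "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")
  // The IAM role to assume when accessing S3
  .config("spark.hadoop.fs.s3a.assumed.role.arn",
          "arn:aws:iam::123456789012:role/migrator-s3-read")
  .config("spark.hadoop.fs.s3a.assumed.role.session.name", "scylla-migrator")
  // Base credentials used to call STS and assume the role
  .config("spark.hadoop.fs.s3a.assumed.role.credentials.provider",
          "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
  .getOrCreate()

// With the role assumed, S3 reads go through the temporary STS credentials:
val df = spark.read.parquet("s3a://my-bucket/path/to/data")
```

The `fs.s3a.assumed.role.*` properties are only honored by the S3A connector from Hadoop 3.x onward, which is another reason the Parquet-related changes had to wait for the upgrade.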