pdbossman opened this issue 3 months ago
FYI - my credentials are set in ~/.aws/credentials. I also tried exporting them as environment variables:

```
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_KEY=...
```
I also tried specifying them in config.yaml; when I do, it gives a different error:

```
Exception in thread "main" com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The security token included in the request is invalid. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: UnrecognizedClientException; Request ID: G9FMU2QPUE87VAJ0BEA230KPKRVV4KQNSO5AEMVJF66Q9ASUAAJG)
```
This seems to require an instance profile rather than using the config file. I don't have permissions to change the instance profile. I'm not sure how to change the code to get it to use the default credentials chain, but I think that would do the trick, since it would then walk through the different common methods of providing credentials.
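For reference, a minimal sketch of what "using the default chain" could look like with the AWS SDK for Java v1 (which the com.amazonaws imports in the error above suggest the migrator uses); the region value is a placeholder assumption:

```scala
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder

// The default chain tries, among others and in order: environment
// variables, Java system properties, the ~/.aws/credentials profile
// file, and finally the EC2/ECS instance profile, instead of
// requiring any single one of them.
val dynamoDb = AmazonDynamoDBClientBuilder
  .standard()
  .withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
  .withRegion("us-east-1") // placeholder region
  .build()
```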
@pdbossman Thank you for the detailed report. I confirm that I reproduced the issue when the credentials are provided via ~/.aws/credentials on the Spark worker node. It seems they are picked up by the master node, but not by the worker node… I will investigate further.

However, I could not reproduce your problem when the credentials are provided in the config.yaml file. In such a case, the migration runs fine for me. Would you mind sharing your whole config.yaml (without the credentials section)? Did you also try to remove the ~/.aws credentials from the Spark master and worker nodes when the credentials were provided by the config.yaml file?
Hi Julien, I may need to walk Lubos through this part and have him work with you. We're using Okta and gimme-aws-creds, and it produces the AWS credentials file, which has the following components:

```
[default]
aws_access_key_id = ...
aws_secret_access_key = ...
aws_session_token = ...
x_security_token_expires = ...
```
The generated credentials expire. I was providing the access key and secret access key from the generated security credentials, but it's clear to me now that they cannot be used on their own; they are tied to the session token.
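For context, temporary credentials like these are only valid as a complete triple. A minimal sketch with the AWS SDK for Java v1, assuming the three values are exported as the standard environment variables:

```scala
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicSessionCredentials}

// STS-issued credentials must include the session token: passing just
// the access key and secret key produces the UnrecognizedClientException
// quoted earlier in this thread.
val sessionCredentials = new BasicSessionCredentials(
  sys.env("AWS_ACCESS_KEY_ID"),
  sys.env("AWS_SECRET_ACCESS_KEY"),
  sys.env("AWS_SESSION_TOKEN")
)
val credentialsProvider = new AWSStaticCredentialsProvider(sessionCredentials)
```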
So when I normally run (and on the previous version of scylla-migrator), the source credentials are completely commented out. After running gimme-aws-creds, I would run aws configure; the access key and secret key were pre-filled from what gimme-aws-creds created, so I only really had to run it to set the region. Then I didn't need to provide anything in the yaml file for the source except type, table name, and scanSegments.
Basically, I need this to work without providing credentials in the yaml file at all. If we need to have a quick meeting Monday, let me know.
Actually, to be clear - I think if you just fix the ~/.aws/credentials file usage on the workers, you'll have solved the problem.
@pdbossman I was able to use the AWS profile credentials with the following change: https://github.com/julienrf/scylla-migrator/tree/aws-credentials Could you please let me know if that fixes your issue?
It does! Thank you!
So we need it configurable, so that people can also go with ~/.aws/credentials on the workers. There is also ONE more way: an IAM role attached to the VM running the workers (which I'd expect is the default in EMR).
FWIW, for our access, @pdbossman, we use an assumed role, so we will need support for something like this: https://stackoverflow.com/questions/44316061/does-spark-allow-to-use-amazon-assumed-role-and-sts-temporary-credentials-for-dy
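For the assume-role case, a rough sketch of the kind of provider this would need, using the SDK's STSAssumeRoleSessionCredentialsProvider (the role ARN and session name below are made up for illustration):

```scala
import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider

// Assumes the given role via STS and refreshes the temporary
// credentials automatically before they expire.
val assumedRoleProvider =
  new STSAssumeRoleSessionCredentialsProvider.Builder(
    "arn:aws:iam::123456789012:role/migrator-role", // hypothetical role ARN
    "scylla-migrator-session"                       // arbitrary session name
  ).build()
```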
But let's go step by step and fix the current access first. I will merge https://github.com/scylladb/scylla-migrator/pull/123. @julienrf, can you just amend it with a reference to this issue?
The next step would be to make it configurable, so that it basically goes down a hierarchy until it finds credentials. The options:

- instance metadata token
- credentials in ~/.aws/credentials of the user that runs the executors (and masters)
- assumed role (as per what Patrick is doing)
- the old (likely insecure) way of just an access key and secret
com.amazonaws.auth.InstanceProfileCredentialsProvider seems to have a chain already, so let's see how we can use it, optimize it, and configure it.
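To illustrate the idea (not the project's actual implementation), such a hierarchy could be expressed with the SDK's AWSCredentialsProviderChain; the provider order here just mirrors the list above:

```scala
import com.amazonaws.auth.{
  AWSCredentialsProviderChain,
  EC2ContainerCredentialsProviderWrapper,
  EnvironmentVariableCredentialsProvider
}
import com.amazonaws.auth.profile.ProfileCredentialsProvider

// Each provider is tried in order until one returns credentials:
// instance metadata first, then ~/.aws/credentials of the current
// user, then a plain access key / secret key from the environment.
val chain = new AWSCredentialsProviderChain(
  new EC2ContainerCredentialsProviderWrapper(),
  new ProfileCredentialsProvider(),
  new EnvironmentVariableCredentialsProvider()
)
```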
Attempted to migrate from DynamoDB
I ran aws configure, and from the master and workers I am able to list DynamoDB tables.

Source DynamoDB:

```
aws dynamodb list-tables
{
    "TableNames": [
        "monitoring",
        "redacted-table-name-here",
        "tfstate-locks"
    ]
}
```
Target Scylla (I have an /etc/hosts entry assigning the scylla hostname to the proper IP):

```
aws dynamodb list-tables --endpoint-url "http://scylla:8000"
{
    "TableNames": [
        "redacted-table-name-here"
    ]
}
```
When I run spark-submit, it hangs looking for security credentials.
@hopugop @tarzanek @erezvelan