thelastpickle / cassandra-medusa

Apache Cassandra Backup and Restore Tool
Apache License 2.0

AWS Access Key Id does not match when using AWS IAM user #753

Open dxu-sfx opened 2 months ago

dxu-sfx commented 2 months ago


Hello there,

I am having an issue using an IAM user & role properly, following https://github.com/thelastpickle/cassandra-medusa/blob/edb76efd6078715a6311e24e1a1fd08641e92810/docs/aws_s3_setup.md#create-an-aws-iam-role-or-aws-iam-user-for-backups

Here is the Medusa container config, where I set up the S3 settings and key file. The standalone Medusa has the same config as this container:

    [cassandra]
    use_sudo = false

    [storage]
    use_sudo_for_restore = false
    storage_provider = s3
    bucket_name = o11y-k8ssandra-medusa
    key_file = /etc/medusa-secrets/credentials
    prefix = demo
    max_backup_age = 0
    max_backup_count = 0
    region = us-west-2
    secure = False
    ssl_verify = False
    transfer_max_bandwidth = 50MB/s
    concurrent_transfers = 1
    multi_part_upload_threshold = 104857600

    [grpc]
    enabled = 1

    [logging]
    level = DEBUG

The key id and secret key were created and match what I have for the S3 user:

 cat /etc/medusa-secrets/credentials
[default]
aws_access_key_id = XXX
aws_secret_access_key = XXX

This is my medusa yaml file:

  medusa:
    storageProperties:
      # Can be either of google_storage, azure_blobs, s3, s3_compatible, s3_rgw or ibm_storage 
      storageProvider: s3
      storageSecretRef:  ## used a workaround here
        name: medusa-bucket-key
      bucketName: o11y-k8ssandra-medusa
      # # Prefix for this cluster in the storage bucket directory structure, used for multitenancy
      # prefix: test
      # Whether or not to use SSL to connect to the storage backend
      secure: false 
      region: us-west-2

It seems I can connect to S3, but as soon as it tries to upload a file, it throws this error:

[2024-04-24 21:41:10,383] DEBUG: [S3 Storage] Uploading object from stream -> s3://o11y-k8ssandra-medusa/demo/demo-dc1-rack1-sts-0/medusa-backup0424/meta/schema.cql
[2024-04-24 21:41:10,394] ERROR: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.

I can run my own script and it talks to the same S3 bucket properly, so the credentials work. What could be the issue when Medusa runs the same process itself and throws this error?

Can you reproduce this, or do you have any clue how I should debug this?
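
As a debugging aid, here is a rough boto3 snippet (not Medusa code; the bucket name, region and key file path are just the ones from my config above) that can be run inside the container to see which credentials actually get resolved:

    # Rough check of which credentials the boto3 default chain resolves inside the container.
    import os

    import boto3

    # Point the default credential chain at the same file Medusa is configured with.
    os.environ["AWS_SHARED_CREDENTIALS_FILE"] = "/etc/medusa-secrets/credentials"

    session = boto3.Session(region_name="us-west-2")
    creds = session.get_credentials()
    print("credential source:", creds.method)
    print("access key in use:", creds.get_frozen_credentials().access_key)

    # The identity AWS sees, and the same call Medusa fails on.
    print("caller identity:", session.client("sts").get_caller_identity()["Arn"])
    session.client("s3").put_object(
        Bucket="o11y-k8ssandra-medusa", Key="demo/debug-probe", Body=b"ok"
    )
    print("PutObject succeeded")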

dxu-sfx commented 2 months ago

The error is:

[2024-04-24 23:06:23,294] DEBUG: [Storage] Getting object demo/demo-dc1-rack1-sts-0/test/meta/schema.cql
[2024-04-24 23:06:23,295] DEBUG: Using selector: GeventSelector
--- Logging error ---
Traceback (most recent call last):
  File "/home/cassandra/medusa/storage/s3_base_storage.py", line 326, in _stat_blob
    resp = self.s3_client.head_object(Bucket=self.bucket_name, Key=object_key)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/cassandra/.venv/lib/python3.10/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

During handling of the above exception, another exception occurred:
...
Arguments: (ClientError('An error occurred (403) when calling the HeadObject operation: Forbidden'),)
[2024-04-24 23:06:23,369] ERROR: Error getting object from s3://o11y-k8ssandra-medusa/demo/demo-dc1-rack1-sts-0/test/meta/schema.cql
[2024-04-24 23:06:23,369] INFO: Starting backup using Stagger: None Mode: differential Name: test
[2024-04-24 23:06:23,369] DEBUG: Updated from existing status: -1 to new status: 0 for backup id: test
[2024-04-24 23:06:23,370] DEBUG: Process psutil.Process(pid=670, name='medusa', status='running', started='23:06:21') was set to use only idle IO and CPU resources
[2024-04-24 23:06:23,370] INFO: Saving tokenmap and schema
[2024-04-24 23:06:23,627] DEBUG: Checking placement using dc and rack...
[2024-04-24 23:06:23,627] INFO: Resolving ip address 10.124.38.28
[2024-04-24 23:06:23,628] INFO: ip address to resolve 10.124.38.28
[2024-04-24 23:06:23,630] DEBUG: Resolved 10.124.38.28 to demo-dc1-rack1-sts-0
[2024-04-24 23:06:23,630] DEBUG: Checking host 10.124.38.28 against 10.124.38.28/demo-dc1-rack1-sts-0
[2024-04-24 23:06:23,631] INFO: Resolving ip address 10.124.168.112
[2024-04-24 23:06:23,631] INFO: ip address to resolve 10.124.168.112
[2024-04-24 23:06:23,635] DEBUG: Resolved 10.124.168.112 to demo-dc1-rack3-sts-0
[2024-04-24 23:06:23,635] INFO: Resolving ip address 10.124.38.28
[2024-04-24 23:06:23,635] INFO: ip address to resolve 10.124.38.28
[2024-04-24 23:06:23,637] DEBUG: Resolved 10.124.38.28 to demo-dc1-rack1-sts-0
[2024-04-24 23:06:23,637] INFO: Resolving ip address 10.124.76.56
[2024-04-24 23:06:23,638] INFO: ip address to resolve 10.124.76.56
[2024-04-24 23:06:23,640] DEBUG: Resolved 10.124.76.56 to demo-dc1-rack2-sts-0
[2024-04-24 23:06:23,700] DEBUG: [S3 Storage] Uploading object from stream -> s3://o11y-k8ssandra-medusa/demo/demo-dc1-rack1-sts-0/test/meta/schema.cql
[2024-04-24 23:06:23,711] ERROR: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.

dxu-sfx commented 2 months ago

We've noticed that every time it makes the connection it uses a new access key, not the one in the config. The config is being read, but the key used in the actual connection is not the same.
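
A rough sketch of how we checked this (the paths are from our setup above); it just compares the key id in the mounted file against the one the default boto3 chain resolves:

    # Compare the access key id from the mounted credentials file with the one
    # the default boto3 credential chain actually resolves.
    import configparser

    import boto3

    cfg = configparser.ConfigParser()
    cfg.read("/etc/medusa-secrets/credentials")
    key_in_file = cfg["default"]["aws_access_key_id"]

    resolved = boto3.Session().get_credentials()
    key_in_use = resolved.get_frozen_credentials().access_key

    print("key in file :", key_in_file)
    print("key in use  :", key_in_use)
    print("resolved via:", resolved.method)  # e.g. 'shared-credentials-file' vs 'assume-role-with-web-identity'
    print("match       :", key_in_file == key_in_use)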

dxu-sfx commented 2 months ago

Finally, we were able to see the issue. First, if we use an IAM role, Medusa uses a temporary role key id for the connection every time and skips reading /etc/medusa-secrets/credentials. Second, with that setup we see botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden.

However, if we use an IAM user, we have to remove the service account (SA) setup so that Medusa falls back to the default /etc/medusa-secrets/credentials and does the backups without issues.

rzvoncek commented 1 month ago

Hello @dxu-sfx !

It has been some time since we had an issue with this, so I'm a bit rusty on this topic.

Just like the documentation says, you first need to create an IAM Policy to declare what permissions should be granted. Then you have two options - assign this policy to a role or to a user.

The user approach seems to be what you're already doing: you create the user, attach the policy to it, generate credentials for the user, place them on the node and reference them in the config file.

The idea behind the role is that you can skip a bunch of this. You configure the instance itself (or the container) to assume the role, which means the instance implicitly runs with the permissions of that role. Then, in Medusa, the boto library we use for interacting with S3 will first look for the credentials file. There are a few other authentication methods it tries after that, but if nothing works, it will query the AWS metadata API to work out the role (and temporary credentials). If it finds that a role is assumed, it will authenticate and proceed.

So, in conclusion, please check whether you have the assume-role setup in place, and try removing the credentials from the config (and the file system).
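
If it helps, here is a rough way to check from inside the pod whether a role is being resolved ahead of the key file (the environment variables are the ones an IRSA-annotated service account typically injects; adjust to your setup):

    # Quick check of whether boto resolves an assumed role (web identity / instance
    # metadata) before it ever reaches the shared credentials file.
    import os

    import boto3

    # Variables an IRSA-annotated Kubernetes service account typically injects.
    for var in ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE"):
        print(var, "=", os.environ.get(var))

    creds = boto3.Session().get_credentials()
    if creds is None:
        print("no credentials resolved")
    else:
        print("resolved via:", creds.method)
        print("caller arn:  ", boto3.client("sts").get_caller_identity()["Arn"])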

rzvoncek commented 1 month ago

Hello @dxu-sfx ! Did you manage to work this out? Is there something more we can help with?