kubernetes-sigs / aws-ebs-csi-driver

CSI driver for Amazon EBS https://aws.amazon.com/ebs/
Apache License 2.0
974 stars 786 forks

Could not create PVC from VolumeSnapshot due to lack of IAM permission for ebs-csi-controller #2151

Closed zoli-opslogic closed 1 week ago

zoli-opslogic commented 2 weeks ago

/kind bug

What happened? I am using this procedure to import and restore a previously created AWS EBS snapshot. (Not sure if relevant, but the snapshot was created with this procedure in another namespace on the same EKS cluster, where it works perfectly.) My StorageClass and VolumeSnapshotClass resources look like this:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: 'true'
provisioner: ebs.csi.aws.com
parameters:
  encrypted: 'true'
  fsType: ext4
  type: gp3
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-csi-aws
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver: ebs.csi.aws.com
deletionPolicy: Delete

When I create a PVC whose source is a VolumeSnapshot, I get the following error event and the PVC remains in the Pending state indefinitely:

E0915 10:31:11.191835 1 driver.go:108] "GRPC error" err="rpc error: code = Internal desc = Could not create volume \"pvc-321ad319-b39d-654a-9710-406543984532\": could not create volume in EC2: operation error EC2: CreateVolume, https response error StatusCode: 403, RequestID: b3b4128b-6caa-49df-900d-762d342d0901, api error UnauthorizedOperation: You are not authorized to perform this operation. User: arn:aws:sts::<AWS ACC NR>:assumed-role/AmazonEKS_EBS_CSI_DriverRoleTesting/1321496256754713177 is not authorized to perform: ec2:CreateVolume on resource: arn:aws:ec2:us-east-2::snapshot/snap-001sdfda734438e64 because no identity-based policy allows the ec2:CreateVolume action. Encoded authorization failure message: ...."

Comment on the above situation: the VolumeSnapshotContent points to a specific AWS EBS snapshot handle, and the VolumeSnapshot was created previously and is in the ReadyToUse state. The PVC error events can be seen in the logs of the ebs-csi-controller pod -> ebs-plugin container. Yes, the WaitForFirstConsumer binding mode is also satisfied, by scaling up the StatefulSet that spins up the pod that uses the PVC.
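For readers unfamiliar with the import step, here is a minimal sketch of what a pre-provisioned VolumeSnapshotContent and its bound VolumeSnapshot typically look like, written as Python dicts that can be dumped to JSON and passed to `kubectl apply -f -`. The function name, the `-content` naming suffix, the `Retain` deletion policy, and the example snapshot ID are illustrative assumptions, not the manifests actually used in this report.

```python
import json

SNAPSHOT_API = "snapshot.storage.k8s.io/v1"


def imported_snapshot_manifests(snapshot_handle, name, namespace,
                                driver="ebs.csi.aws.com"):
    """Build a pre-provisioned VolumeSnapshotContent and the VolumeSnapshot bound to it.

    All names are hypothetical; adapt them to your own cluster.
    """
    content_name = f"{name}-content"
    content = {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshotContent",  # cluster-scoped, so no namespace here
        "metadata": {"name": content_name},
        "spec": {
            # Assumption: keep the EBS snapshot if the Kubernetes object is deleted.
            "deletionPolicy": "Retain",
            "driver": driver,
            # The existing EBS snapshot ID is referenced as the snapshot handle.
            "source": {"snapshotHandle": snapshot_handle},
            # Points at the VolumeSnapshot this content is bound to.
            "volumeSnapshotRef": {"name": name, "namespace": namespace},
        },
    }
    snapshot = {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        # Referencing the content by name marks this snapshot as pre-provisioned.
        "spec": {"source": {"volumeSnapshotContentName": content_name}},
    }
    return content, snapshot


if __name__ == "__main__":
    # Placeholder snapshot ID, not a real one.
    for manifest in imported_snapshot_manifests(
        "snap-0123456789abcdef0", "imported-snapshot", "default"
    ):
        print(json.dumps(manifest, indent=2))
```

The two objects reference each other by name, which is what lets the snapshot controller bind them.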

The EBS CSI driver is installed via EKS addons (Amazon EBS CSI Driver v1.34.0-eksbuild.1). It uses IRSA with the role arn:aws:iam::<AWS ACC NR>:role/AmazonEKS_EBS_CSI_DriverRoleTesting, which has the AWS-managed AmazonEBSCSIDriverPolicy (Version 2) attached. There is no issue if I create a PVC without referencing a VolumeSnapshot. I double-checked that the service account is correctly configured using the procedure from https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html#csi-iam-role and tried several fixes: restarting ebs-csi-controller, recycling all EKS nodes, adding additional tagging via the StorageClass, etc.

The only thing that worked was explicitly adding a policy to arn:aws:iam::<AWS ACC NR>:role/AmazonEKS_EBS_CSI_DriverRoleTesting that allows ec2:CreateVolume without conditions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVolume"
            ],
            "Resource": "*"
        }
    ]
}

Although this workaround solves my issue, volume creation should work with the rules already present in the attached AmazonEBSCSIDriverPolicy (Version 2), as all the official documentation presents it. Below is an excerpt from the policy showing the statements related to ec2:CreateVolume:

..................
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVolume"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "aws:RequestTag/ebs.csi.aws.com/cluster": "true"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVolume"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "aws:RequestTag/CSIVolumeName": "*"
                }
            }
        },
.....................
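To make the effect of these two conditions concrete, here is a toy model of how a `StringLike` condition on `aws:RequestTag/...` keys behaves. It is a simplified sketch covering only the two excerpted statements, not IAM's actual evaluation logic: it ignores explicit denies, resource matching, and every other statement in the managed policy. The helper names are hypothetical.

```python
import re

# Condition blocks of the two excerpted ec2:CreateVolume statements.
CREATE_VOLUME_CONDITIONS = [
    {"aws:RequestTag/ebs.csi.aws.com/cluster": "true"},
    {"aws:RequestTag/CSIVolumeName": "*"},
]

PREFIX = "aws:RequestTag/"


def string_like(pattern: str, value: str) -> bool:
    """Match value against pattern, where '*' is any run of characters and '?' is one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def create_volume_allowed(request_tags: dict) -> bool:
    """Return True if any excerpted statement's conditions all match the request tags."""
    for conditions in CREATE_VOLUME_CONDITIONS:
        matched = True
        for key, pattern in conditions.items():
            tag_key = key[len(PREFIX):]
            # A tag that is absent from the request cannot satisfy StringLike.
            if tag_key not in request_tags or not string_like(pattern, request_tags[tag_key]):
                matched = False
                break
        if matched:
            return True
    return False


if __name__ == "__main__":
    print(create_volume_allowed({"CSIVolumeName": "pvc-123"}))  # tagged request
    print(create_volume_allowed({}))                            # untagged request
```

Under this model, a CreateVolume request is only allowed by these statements when it carries at least one of the expected tags, which is why an unconditional `ec2:CreateVolume` statement sidesteps the check entirely.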

What you expected to happen? To be able to create PVCs referencing VolumeSnapshots.
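For reference, a PVC that restores from a VolumeSnapshot generally has the shape sketched below, built as a Python dict that can be dumped to JSON for `kubectl apply -f -`. The function name, the default size, and the example object names are assumptions for illustration; only the `encrypted-gp3` StorageClass name comes from this report.

```python
import json


def pvc_from_snapshot(pvc_name, snapshot_name, namespace,
                      size="10Gi", storage_class="encrypted-gp3"):
    """Build a PVC manifest whose data source is an existing VolumeSnapshot.

    The requested size is a placeholder; it should be at least as large
    as the snapshot being restored.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            # The VolumeSnapshot must live in the same namespace as the PVC.
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(json.dumps(pvc_from_snapshot("restored-data", "imported-snapshot", "default"), indent=2))
```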

How to reproduce it (as minimally and precisely as possible)? Explained in the What happened? section.

Anything else we need to know?:

Environment AWS EKS

torredil commented 2 weeks ago

Can you please double-check that the AmazonEBSCSIDriverPolicy managed policy is actually attached to your role? This looks like a misconfiguration.
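One way to run that check programmatically is sketched below, assuming boto3 is installed and the caller has permission to call `iam:ListAttachedRolePolicies`. The helper names are hypothetical; the policy ARN is the managed policy ARN used in the steps that follow.

```python
MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"


def is_policy_attached(attached_policies, policy_arn=MANAGED_POLICY_ARN):
    """Return True if policy_arn appears in a list of attached-policy records."""
    return any(p.get("PolicyArn") == policy_arn for p in attached_policies)


def fetch_attached_policies(role_name):
    """Collect all managed policies attached to an IAM role (requires AWS credentials)."""
    import boto3  # imported lazily so the pure check above works without boto3

    iam = boto3.client("iam")
    paginator = iam.get_paginator("list_attached_role_policies")
    policies = []
    for page in paginator.paginate(RoleName=role_name):
        policies.extend(page["AttachedPolicies"])
    return policies


if __name__ == "__main__":
    # Role name taken from the error message in this issue; replace with your own.
    role = "AmazonEKS_EBS_CSI_DriverRoleTesting"
    print(is_policy_attached(fetch_attached_policies(role)))
```

This only confirms the attachment. It does not verify that the controller's service account actually assumes that role.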

I just went through the following steps:

  1. Create role and attach managed policy:

    eksctl create iamserviceaccount --cluster dev --namespace kube-system --name ebs-csi-controller-sa --approve --role-only --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy --region us-east-1 --role-name EBSCSIDriverRole
  2. Retrieve role ARN:

    ROLE_ARN=$(aws iam get-role --query 'Role.Arn' --output text --role-name EBSCSIDriverRole)
  3. Install snapshot controller:

    eksctl create addon --name snapshot-controller --cluster dev --region us-east-1
  4. Install driver:

    eksctl create addon --name aws-ebs-csi-driver --cluster dev --service-account-role-arn $ROLE_ARN --region us-east-1

and was able to successfully create a volume from a snapshot using the example manifests without having to modify the managed policy. I also enabled SDK logs to confirm that the driver correctly added the necessary tags in the CreateVolume request.

torredil commented 2 weeks ago

If you continue to run into auth issues, taking a look at the decoded version of the authorization failure will help with debugging.
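As a rough sketch of that decoding step, the snippet below uses the STS `DecodeAuthorizationMessage` API through boto3 and then pulls out a few fields. It assumes boto3 is installed and the caller is allowed `sts:DecodeAuthorizationMessage`. The field names read by `summarize` (`allowed`, `explicitDeny`, `context.action`, `context.resource`, `context.conditions`) reflect the commonly seen shape of the decoded document and should be treated as an assumption; the function names are hypothetical.

```python
import json


def decode_message(encoded_message):
    """Decode an 'Encoded authorization failure message' via STS (requires AWS credentials)."""
    import boto3  # imported lazily so summarize() can be used without boto3

    sts = boto3.client("sts")
    response = sts.decode_authorization_message(EncodedMessage=encoded_message)
    return response["DecodedMessage"]  # a JSON document serialized as a string


def summarize(decoded_json):
    """Extract the fields most useful for debugging a CreateVolume denial.

    Missing fields are tolerated, since the document shape is assumed here.
    """
    doc = json.loads(decoded_json)
    context = doc.get("context", {})
    condition_keys = [
        item.get("key")
        for item in context.get("conditions", {}).get("items", [])
        if item.get("key")
    ]
    return {
        "allowed": doc.get("allowed"),
        "explicit_deny": doc.get("explicitDeny"),
        "action": context.get("action"),
        "resource": context.get("resource"),
        # Request tags seen by IAM; the managed policy's conditions match on these.
        "request_tag_keys": sorted(
            k for k in condition_keys if k.startswith("aws:RequestTag/")
        ),
    }
```

If `request_tag_keys` comes back empty for a denied `ec2:CreateVolume` call, the request was evaluated without the tags that the managed policy's conditions expect, which is a useful starting point for further debugging.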

zoli-opslogic commented 2 weeks ago

> Can you please double check and make sure the AmazonEBSCSIDriverPolicy managed policy is actually attached to your role? this looks like a misconfiguration.
>
> I just went through the following steps:
>
> 1. Create role and attach managed policy:
>    eksctl create iamserviceaccount --cluster dev --namespace kube-system --name ebs-csi-controller-sa --approve --role-only --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy --region us-east-1 --role-name EBSCSIDriverRole
> 2. Retrieve role ARN:
>    ROLE_ARN=$(aws iam get-role --query 'Role.Arn' --output text --role-name EBSCSIDriverRole)
> 3. Install snapshot controller:
>    eksctl create addon --name snapshot-controller --cluster dev --region us-east-1
> 4. Install driver:
>    eksctl create addon --name aws-ebs-csi-driver --cluster dev --service-account-role-arn $ROLE_ARN --region us-east-1
>
> and was able to successfully create a volume from a snapshot using the example manifests without having to modify the managed policy. I also enabled SDK logs to confirm that the driver correctly added the necessary tags in the CreateVolume request.

Oh, there is a snapshot-controller EKS addon, nice. When I started implementing this solution, the official AWS documentation pointed to installing the external snapshotter. I tested uninstalling the CRDs together with the external snapshot controller and installing the snapshot-controller EKS addon instead, but it seems the addon does not install the CRDs, and I also had to create a VolumeSnapshotClass in order to use VolumeSnapshots and VolumeSnapshotContents. Is that accurate? Did you already have those in place when you ran your tests?

With the CRDs installed separately from here, plus the VolumeSnapshotClass, everything works and I no longer get the IAM error I was getting before (yay!). Can you confirm that this setup is OK? I mean, mostly, that the snapshot-controller EKS addon does not install the required CRDs or configure a default VolumeSnapshotClass (so these have to be done separately, as I did), or am I still missing something? Thank you!

torredil commented 2 weeks ago

> Having the CRDs installed separately from here and the VolumeSnapshotClass works and I do not get the IAM error I was getting before (yay!)

Glad to hear 👍

> Can you confirm that this setup is ok? I mean, mostly that the snapshot-controller EKS addon does not install the required CRDs and configure a default VolumeSnapshotClass (these have to be done separately like I did) or am I still missing something?

The snapshot-controller EKS addon installs everything needed for snapshots to work, including the necessary CRDs, but you do need to create the VolumeSnapshotClass manually because its configuration varies with user requirements.

zoli-opslogic commented 1 week ago

I re-tested this, and the EKS addon does indeed also install the required CRDs. I created the VolumeSnapshotClass manually and all seems fine. Thank you for your answers, I will close this.

zoli-opslogic commented 1 week ago

The issue does not occur with the snapshot-controller EKS addon.