What steps did you take and what happened:
It looks as though the latest version of the velero-plugin-for-aws plugin is not using IRSA correctly. It appears to be using the node's attached role rather than the role attached to the service account.
What did you expect to happen:
If an IRSA role is attached to the service account velero is using, I would expect it to use that role.
The following information will help us better understand what's going on:
Unable to provide a support bundle due to the sensitivity of this cluster. With that being said, hopefully this is enough information.
Errors:
time="2024-09-23T20:45:30Z" level=error msg="Current BackupStorageLocations available/unavailable/unknown: 0/1/0, BackupStorageLocation \"default\" is unavailable: rpc error: code = Unknown desc = operation error S3: ListObjectsV2, https response error StatusCode: 403, RequestID: TRUNCATED, HostID: TRUNCATED, api error AccessDenied: User: arn:aws:sts::TRUNCATED:assumed-role/TRUNCATED-worker/TRUNCATED is not authorized to perform: s3:ListBucket on resource: \"arn:aws:s3:::TRUNCATED-prod-velero\" because no identity-based policy allows the s3:ListBucket action)" controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:178"
time="2024-09-23T20:45:30Z" level=info msg="plugin process exited" backup-storage-location=velero/default cmd=/plugins/velero-plugin-for-aws controller=backup-storage-location id=200 logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/plugins/velero-plugin-for-aws
time="2024-09-23T20:46:20Z" level=error msg="Error listing backups in backup store" backupLocation=velero/default controller=backup-sync error="rpc error: code = Unknown desc = operation error S3: ListObjectsV2, https response error StatusCode: 403, RequestID: TRUNCATED, HostID: TRUNCATED, api error AccessDenied: User: arn:aws:sts::TRUNCATED:assumed-role/TRUNCATED-worker/TRUNCATED is not authorized to perform: s3:ListBucket on resource: \"arn:aws:s3:::TRUNCATED-prod-velero\" because no identity-based policy allows the s3:ListBucket action" error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:351" error.function="main.(*ObjectStore).ListCommonPrefixes" logSource="pkg/controller/backup_sync_controller.go:109"
time="2024-09-23T20:46:20Z" level=info msg="plugin process exited" backupLocation=velero/default cmd=/plugins/velero-plugin-for-aws controller=backup-sync id=213 logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/plugins/velero-plugin-for-aws
Given the above output, it looks like Velero is using the default role from IMDS (the EC2 worker node role), not the IRSA role. Worth noting that prior to this version we were on 1.10.x, and IRSA was working without issue. It looks like the switch to sdk-v2 has caused some issues.
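For anyone trying to reproduce this, a small check along the following lines may help narrow down which identity the plugin picked up. This is only a sketch and not part of Velero or the plugin: the helper names `assumed_role_name` and `irsa_env_status` are hypothetical, and it assumes the usual IRSA setup in which the EKS pod identity webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into the pod.

```python
import os
import re

# Environment variables that IRSA normally relies on inside the pod
# (assumption: they are injected by the EKS pod identity webhook).
IRSA_ENV_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")

# Matches the role name in an STS assumed-role ARN such as
# arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
ASSUMED_ROLE_RE = re.compile(r"arn:aws:sts::[^:\s]*:assumed-role/([^/\s]+)/")


def irsa_env_status(env=None):
    """Report, per IRSA variable, whether it is set to a non-empty value."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in IRSA_ENV_VARS}


def assumed_role_name(message):
    """Return the role name from the first assumed-role ARN in a log message, or None."""
    match = ASSUMED_ROLE_RE.search(message)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Run inside the Velero pod to see whether the IRSA variables are present.
    print(irsa_env_status())
```

If both variables are reported as set inside the Velero pod while the role name extracted from the AccessDenied message is still the worker role, that would suggest the credentials are being resolved from IMDS despite the IRSA configuration being present.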
Velero features (use velero client config get features): n/a
Kubernetes version (use kubectl version): 1.24 (though based on how IRSA works, I don't think the older version should be an issue).
Kubernetes installer & version: EKS
Cloud provider or hardware configuration: AWS
OS (e.g. from /etc/os-release): n/a
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.
:+1: for "I would like to see this bug fixed as soon as possible"
:-1: for "There are more important bugs to focus on right now"
Helm chart configuration:
service account yaml, directly from the cluster, showing the appropriate annotation:
May also be related to the following issues:
Environment:
Velero version (use velero version): 1.14.1