Closed idanme-tr closed 4 weeks ago
@idanme-tr, can you share the permissions you have applied? Also, does your BSL configuration set storageAccountURI?
I would also recommend checking whether you have any leftovers of pod identity in your cluster. That often leads to issues.
Hey, thanks for the reply. We did have pod identity installed. I deleted it but still got a few different errors, so I reconfigured everything on a new cluster that never had pod identity deployed to it.
The new identity has a few roles assigned to it at different levels.
Storage account level: Reader
Resource group level: Reader, Contributor, Storage Blob Data Contributor, and the Velero custom role based on the documentation.
Current configuration:
velero:
  backupsEnabled: true
  snapshotsEnabled: false
  configuration:
    backupStorageLocation:
      - name: default
        provider: velero.io/azure
        bucket: int-omrizi-upgrades-01-we
        config:
          storageAccount: intaksvelerobackups
          resourceGroup: int-aks-velero-backups-rg
          activeDirectoryAuthorityURI: https://login.microsoftonline.com/
          useAAD: "true"
  credentials:
    secretContents:
      cloud: |
        AZURE_SUBSCRIPTION_ID=c5e7a9f2-8220-4dbd-8a43-545c473a8fda
        AZURE_RESOURCE_GROUP=MC_int-omrizi-upgrades-01-we-rg_int-omrizi-upgrades-01-we_westeurope
        AZURE_CLOUD_NAME=AzurePublicCloud
  serviceAccount:
    server:
      create: true
      name: "int-omrizi-upgrades-01-we-velero-sa"
      annotations:
        azure.workload.identity/client-id: ************
  podLabels:
    azure.workload.identity/use: "true"
  schedules:
    daily:
      schedule: "0 2 * * *"
      template:
        ttl: 336h0m0s # Set TTL for backups (14 days)
        includedNamespaces: []
        excludedNamespaces:
          - kube-system
          - monitoring
          - twistlock
          - cloudhiro
          - keda
        storageLocation: default
  initContainers:
    - name: velero-plugin-for-microsoft-azure
      image: velero/velero-plugin-for-microsoft-azure:v1.10.1
      volumeMounts:
        - mountPath: /target
          name: plugins
  rbac:
    create: true
    clusterAdministrator: true
    clusterAdministratorName: cluster-admin
  resources:
    requests:
      cpu: 500m
      memory: 128Mi
    limits:
      cpu: 1000m
      memory: 512Mi
  upgradeJobResources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 100m
      memory: 256Mi
  deployNodeAgent: true
  nodeAgent:
    priorityClassName: "system-node-critical"
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1024Mi
Logs -
velero time="2024-10-21T15:19:08Z" level=info msg="failed to retrieve the storage account properties: ManagedIdentityCredential: ManagedIdentityCredential: Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=&resource=https%3A%2F%2Fmanagement.core.windows.net%2F\": context deadline exceeded, fallback to use the default URI \"https://intaksvelerobackups.blob.core.windows.net\"" backup-storage-location=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-storage-location logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.14.1/pkg/util/azure/storage.go:208" pluginName=velero-plugin-for-microsoft-azure
velero time="2024-10-21T15:19:08Z" level=info msg="auth with Azure AD" backup-storage-location=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-storage-location logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.14.1/pkg/util/azure/storage.go:114" pluginName=velero-plugin-for-microsoft-azure
velero time="2024-10-21T15:19:08Z" level=info msg="Validating BackupStorageLocation" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:141"
velero time="2024-10-21T15:19:15Z" level=info msg="failed to retrieve the storage account properties: ManagedIdentityCredential: ManagedIdentityCredential: Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=&resource=https%3A%2F%2Fmanagement.core.windows.net%2F\": context deadline exceeded, fallback to use the default URI \"https://intaksvelerobackups.blob.core.windows.net\"" backupLocation=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-sync logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.14.1/pkg/util/azure/storage.go:208" pluginName=velero-plugin-for-microsoft-azure
velero time="2024-10-21T15:19:15Z" level=info msg="auth with Azure AD" backupLocation=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-sync logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.14.1/pkg/util/azure/storage.go:114" pluginName=velero-plugin-for-microsoft-azure
Logs after adding the URI directly to the BSL:
velero time="2024-10-21T16:07:46Z" level=info msg="the storage account URI \"https://intaksvelerobackups.blob.core.windows.net\" is specified in the BSL, use it directly" backup-storage-location=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-storage-location logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.14.1/pkg/util/azure/storage.go:171" pluginName=velero-plugin-for-microsoft-azure
velero time="2024-10-21T16:07:46Z" level=info msg="auth with Azure AD" backup-storage-location=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-storage-location logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.14.1/pkg/util/azure/storage.go:114" pluginName=velero-plugin-for-microsoft-azure
velero time="2024-10-21T16:07:46Z" level=info msg="Validating BackupStorageLocation" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:141"
velero time="2024-10-21T16:08:00Z" level=error msg="Error listing backups in backup store" backupLocation=velero/default controller=backup-sync error="rpc error: code = Unknown desc = ManagedIdentityCredential: ManagedIdentityCredential: Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=&resource=https%3A%2F%2Fstorage.azure.com\": context deadline exceeded" logSource="pkg/controller/backup_sync_controller.go:109"
velero time="2024-10-21T16:08:00Z" level=info msg="plugin process exited" backupLocation=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-sync id=206 logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/plugins/velero-plugin-for-microsoft-azure
velero time="2024-10-21T16:08:00Z" level=info msg="the storage account URI \"https://intaksvelerobackups.blob.core.windows.net\" is specified in the BSL, use it directly" backupLocation=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-sync logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.14.1/pkg/util/azure/storage.go:171" pluginName=velero-plugin-for-microsoft-azure
velero time="2024-10-21T16:08:00Z" level=info msg="auth with Azure AD" backupLocation=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-sync logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.14.1/pkg/util/azure/storage.go:114" pluginName=velero-plugin-for-microsoft-azure
velero time="2024-10-21T16:15:36Z" level=error msg="fail to validate backup store" backup-storage-location=velero/default controller=backup-storage-location error="rpc error: code = Unknown desc = ManagedIdentityCredential: ManagedIdentityCredential: Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=&resource=https%3A%2F%2Fstorage.azure.com\": context deadline exceeded" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store.go:206" error.function="github.com/vmware-tanzu/velero/pkg/persistence.(*objectBackupStore).IsValid" logSource="pkg/controller/backup_storage_location_controller.go:144"
velero time="2024-10-21T16:15:36Z" level=info msg="BackupStorageLocation is invalid, marking as unavailable" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:120"
This is the BSL configuration:
apiVersion: v1
items:
- apiVersion: velero.io/v1
  kind: BackupStorageLocation
  metadata:
    annotations:
      meta.helm.sh/release-name: velero
      meta.helm.sh/release-namespace: velero
    creationTimestamp: "2024-10-21T15:28:26Z"
    generation: 8
    labels:
      app.kubernetes.io/instance: velero
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: velero
      helm.sh/chart: velero-7.2.1
    name: default
    namespace: velero
    resourceVersion: "55720075"
    uid: 926c3b55-abc7-4362-a977-21fec5791cc8
  spec:
    accessMode: ReadWrite
    config:
      activeDirectoryAuthorityURI: https://login.microsoftonline.com/
      resourceGroup: int-aks-velero-backups-rg
      storageAccount: intaksvelerobackups
      storageAccountURI: https://intaksvelerobackups.blob.core.windows.net
      useAAD: "true"
    default: true
    objectStorage:
      bucket: int-omrizi-upgrades-01-we
    provider: velero.io/azure
  status:
    lastValidationTime: "2024-10-21T16:15:36Z"
    message: 'BackupStorageLocation "default" is unavailable: rpc error: code = Unknown
      desc = ManagedIdentityCredential: ManagedIdentityCredential: Get "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=&resource=https%3A%2F%2Fstorage.azure.com":
      context deadline exceeded'
    phase: Unavailable
kind: List
metadata:
  resourceVersion: ""
I am attaching another debug file. bundle-2024-10-21-19-24-33.tar.gz
Thanks again
Have you by any chance restricted access to the IMDS endpoint? https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-cluster-security?tabs=azure-cli#restrict-access-to-instance-metadata-api
Can you try to curl 169.254.169.254 from any pod?
Hey, We restrict access to the API server but not to the instance metadata API you referred to. But, the cluster's subnet is whitelisted to reach the API server.
We have a few applications that work with Workload identities on a different cluster, so I don't believe it's blocked.
curl 169.254.169.254
<?xml version="1.0" encoding="utf-8"?>
<Error xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Code>MissingRequiredQueryParameter</Code>
<Message>A required query parameter was not specified for this request.</Message>
<Details>'comp' is a required query string variable.</Details>
curl http://169.254.169.254/metadata/identity/oauth2/token
{"error":"invalid_request","error_description":"Required metadata header not specified"}
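The second response ("Required metadata header not specified") suggests IMDS is reachable and only the request was incomplete. A minimal sketch of a complete token request follows, assuming curl is available in the pod. The helper names `imds_token_url` and `imds_get_token` are hypothetical; the endpoint and api-version are taken from the Velero logs earlier in this thread.

```shell
# Hypothetical helper: build the IMDS token URL for a URL-encoded resource.
# Endpoint and api-version are the ones that appear in the Velero logs.
imds_token_url() {
  printf 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=%s' "$1"
}

# Hypothetical helper: request a token from IMDS.
# IMDS rejects requests without the "Metadata: true" header,
# which is the "Required metadata header not specified" error.
imds_get_token() {
  curl -s --max-time 10 -H "Metadata: true" "$(imds_token_url "$1")"
}

# Usage, from a pod in the cluster:
#   imds_get_token 'https%3A%2F%2Fstorage.azure.com'
```

Note that this exercises the managed identity (IMDS) path that the logs show Velero falling back to, not the workload identity path, so a result here says little about whether workload identity itself is wired up.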
Okay, thanks for this info. I started digging in a different direction: https://github.com/vmware-tanzu/velero/blob/8afe3cea8b7058f7baaf447b9fb407312c40d2da/pkg/util/azure/credential.go#L49
So basically, from your logs I can see the code is going to ManagedIdentityCredential instead of kicking in NewWorkloadIdentityCredential.
Can you check whether the env has AZURE_FEDERATED_TOKEN_FILE injected (for the pod, I guess)?
My current hunch is that the workload identity for the Velero pod is not set up correctly: it is not projecting the token into the service account, and hence workload identity auth is not kicking in.
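One way to run that check is to look at the pod spec rather than exec into the container, since the workload identity webhook injects its variables when the pod is created. The sketch below is an assumption-laden example: `missing_wi_env` is a hypothetical helper, the pod name is a placeholder, and the four variable names are the ones the Azure workload identity webhook is expected to inject.

```shell
# Hypothetical helper: read env var names (one per line) on stdin and
# print each workload identity variable that is NOT present.
missing_wi_env() {
  names="$(cat)"
  for v in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_AUTHORITY_HOST; do
    printf '%s\n' "$names" | grep -qx "$v" || printf '%s\n' "$v"
  done
}

# Usage (replace <velero-pod> with the actual pod name):
#   kubectl -n velero get pod <velero-pod> \
#     -o jsonpath='{range .spec.containers[*].env[*]}{.name}{"\n"}{end}' \
#     | missing_wi_env
```

Empty output would mean all four variables are in the pod spec. If AZURE_FEDERATED_TOKEN_FILE is listed as missing, the webhook most likely did not mutate the pod, which would match the fallback to ManagedIdentityCredential seen in the logs.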
I found the issue. It was silly of me to copy-paste the "az aks update" from the documentation without noticing that it does not activate the workload identity add-on.
https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer#update-an-aks-cluster-with-oidc-issuer
I think the Azure plugin documentation needs to be refreshed a bit. Might be able to assist with that a bit later.
Thanks for the help!
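For anyone hitting the same problem, the fix described above can be sketched as follows. This is a hedged example, not the exact commands used in this thread: the function names are hypothetical, and the resource group and cluster name are passed as arguments. It assumes the Azure CLI flags `--enable-oidc-issuer` and `--enable-workload-identity` on `az aks update`.

```shell
# Hypothetical wrapper: enable both the OIDC issuer and the workload
# identity add-on. Enabling only the OIDC issuer is not enough.
# $1 = resource group of the cluster, $2 = cluster name
enable_workload_identity() {
  az aks update --resource-group "$1" --name "$2" \
    --enable-oidc-issuer --enable-workload-identity
}

# Hypothetical wrapper: show whether both features are active.
show_workload_identity_status() {
  az aks show --resource-group "$1" --name "$2" \
    --query "{oidcIssuer: oidcIssuerProfile.issuerUrl, workloadIdentity: securityProfile.workloadIdentity.enabled}" \
    --output json
}

# Usage (placeholders, not the values from this thread):
#   enable_workload_identity <resource-group> <cluster-name>
#   show_workload_identity_status <resource-group> <cluster-name>
# Pods created before the add-on was enabled need to be recreated so the
# webhook can mutate them, for example (deployment name is an assumption):
#   kubectl -n velero rollout restart deployment/velero
```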
Would it be possible for you to raise a PR for this small fix, or create an issue with the exact gaps you found? @idanme-tr
What steps did you take and what happened:
I am trying to deploy Velero Helm charts to AKS using Workload Identity. I've followed the Azure plugin guide with workload identity configurations.
For some reason, Velero cannot retrieve the storage account's properties. I've provided the managed identity with more permissions than needed to make sure I do not miss anything.
I understand that this issue might not be a bug but a misconfiguration, but I can't find what it is. When I use a storage account key instead of Workload Identity, it works fine.
What did you expect to happen: I expected Velero to be able to authenticate using the workload identity and to be able to backup and restore as it should.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use
velero debug --backup <backupname> --restore <restorename>
to generate the support bundle and attach it to this issue. For more options, refer to velero debug --help
bundle-2024-10-20-11-47-04.tar.gz
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add:
I am adding my Helm configurations. Lines that were commented out were different attempts but were also unsuccessful.
Environment:
Velero version (use velero version):
Velero features (use velero client config get features): features <NOT SET>
Kubernetes version (use kubectl version):
Kubernetes installer & version: AKS 1.30.3
Cloud provider or hardware configuration: Azure