kubernetes-sigs / blob-csi-driver

Azure Blob Storage CSI driver
Apache License 2.0

Use workload identity instead of storage key? Identity not found? #945

Closed TeamDman closed 2 weeks ago

TeamDman commented 1 year ago

Is your feature request related to a problem?/Why is this needed

I have a cluster where I have a k8s service account set up with Workload Identity.

https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview

The identity has Storage Blob Data Reader role, so it should be able to read the storage account?

However, this doesn't seem to be a supported use case; instead, I think we are required to create a k8s secret with a storage key?

https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md

Describe the solution you'd like in detail

I'd like this to work without needing to create a secret for the storage key, since I should be able to give the workload identity principal the necessary roles to access the storage.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: staticsite-dep
spec:
  revisionHistoryLimit: 3
  replicas: 1
  selector:
    matchLabels:
      app: staticsite-lbl
  template:
    metadata:
      labels:
        app: staticsite-lbl
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: staticsite-cont
          image: nginxinc/nginx-unprivileged:1.24-bullseye-perl
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 250m
              memory: 256Mi
          ports:
            - containerPort: 8080
          securityContext: # https://kubernetes.io/docs/concepts/security/pod-security-standards/
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1001
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
                - NET_RAW
            seLinuxOptions:
              type: container_t
            seccompProfile:
              type: RuntimeDefault
          volumeMounts:
            - mountPath: /usr/share/nginx/html
              name: persistent-storage
            - mountPath: /tmp
              name: temp-storage
            # - name: nginx-conf
            #   mountPath: /etc/nginx/nginx.conf
            #   subPath: nginx.conf
            #   readOnly: true
      volumes:
        # https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md
        - name: temp-storage
          emptyDir: {}
        - name: persistent-storage
          csi:
            driver: blob.csi.azure.com
            volumeAttributes:
              containerName: webcontent
              # https://learn.microsoft.com/en-us/azure/aks/azure-csi-blob-storage-provision?tabs=mount-nfs%2Csecret#static-provisioning-parameters
              mountOptions: "-o allow_other --file-cache-timeout-in-seconds=120"
              resourceGroup: my-RGP
              storageAccount: staticsitedemo
              AzureStorageAuthType: msi
              AzureStorageIdentityClientId: 00000-000000000-0000000000-00000 # managed identity matches service account
              # https://github.com/kubernetes-sigs/blob-csi-driver/issues/618
              # https://github.com/Azure/azure-storage-fuse#environment-variables
              # secretName: staticsite-storagekey
        # - name: nginx-conf
        #   configMap:
        #     name: nginx-conf
        #     items:
        #       - key: nginx.conf
        #         path: nginx.conf
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: workload-identity-staticsite
  labels:
    azure.workload.identity/use: "true"
  annotations:
    # from terraform output
    azure.workload.identity/client-id: 00000000-0000000-0000000-0000000
    azure.workload.identity/tenant-id: 111111111-111111111-11111111-1111111
automountServiceAccountToken: false

Describe alternatives you've considered

Additional context

The above example errors with the following

MountVolume.SetUp failed for volume "persistent-storage" : rpc error: code = Internal desc = Mount failed with error: exit status 255, output: OAUTH Token : Refresh token failed Failed to retrieve OAuth Token from IMDS endpoint (CURLCode: 0, HTTP code: 400): {"error":"invalid_request","error_description":"Identity not found"}Unable to retrieve OAuth token: Failed to retrieve OAuth Token from IMDS endpoint (CURLCode: 0, HTTP code: 400): {"error":"invalid_request","error_description":"Identity not found"} Unable to start blobfuse due to authentication or connectivity issues. Please check the readme for valid auth setups. no config filedone reading env varsURI token request URL printed out http://redacted/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=redacted&resource=https://storage.azure.com/

I tried adding the object id in case that helped, but it seems that I was right the first time and only the client ID should need to be specified

MountVolume.SetUp failed for volume "persistent-storage" : rpc error: code = Internal desc = Mount failed with error: exit status 255, output: OAUTH Token : Refresh token failed Failed to retrieve OAuth Token from IMDS endpoint (CURLCode: 0, HTTP code: 400): {"error":"invalid_request","error_description":"Only one of 'client_id', 'object_id', 'principal_id', or 'mi_res_id' may be provided"}Unable to retrieve OAuth token: Failed to retrieve OAuth Token from IMDS endpoint (CURLCode: 0, HTTP code: 400): {"error":"invalid_request","error_description":"Only one of 'client_id', 'object_id', 'principal_id', or 'mi_res_id' may be provided"} Unable to start blobfuse due to authentication or connectivity issues. Please check the readme for valid auth setups. no config filedone reading env varsURI token request URL printed out http://redac/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=redac&object_id=redac&resource=https://storage.azure.com/

andyzhangx commented 1 year ago

What blob CSI driver version are you using? Is it managed by AKS? We have not released a version with workload identity support yet.

TeamDman commented 1 year ago

I believe it's the AKS managed one, will follow up Tuesday when I'm back at the console

https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster#blob_driver_enabled

andyzhangx commented 1 year ago

@TeamDman the AKS-managed blob CSI driver does not support workload identity; it supports managed identity instead. Details are here: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/workload-identity.md

In your case, you could remove the AzureStorageAuthType and AzureStorageIdentityClientId parameters from volumeAttributes, and then grant the kubelet managed identity on the agent nodes read and write access to the storage account. That should work.

You could also set AzureStorageAuthType: msi together with a correct AzureStorageIdentityClientId value; the blobfuse mount would then authenticate directly with the managed identity you assigned.
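
A rough sketch of the two options described above, written as the `volumes` section of the Deployment from the original post. The values `webcontent`, `my-RGP`, and `staticsitedemo` are copied from that post; the volume names `option-a` and `option-b` are made up for illustration, and the client ID is a placeholder. Neither variant has been verified against a specific driver version, and both assume the identity involved has already been granted a role on the storage account.

```yaml
# Fragment of a pod spec; only the volumes section is shown.
volumes:
  # Option A: no auth parameters. Per the comment above, access then depends
  # on the kubelet managed identity of the agent node, which must have been
  # granted read and write access to the storage account.
  - name: option-a            # hypothetical volume name
    csi:
      driver: blob.csi.azure.com
      volumeAttributes:
        containerName: webcontent
        resourceGroup: my-RGP
        storageAccount: staticsitedemo
        mountOptions: "-o allow_other --file-cache-timeout-in-seconds=120"

  # Option B: blobfuse authenticates directly with a specific managed identity.
  - name: option-b            # hypothetical volume name
    csi:
      driver: blob.csi.azure.com
      volumeAttributes:
        containerName: webcontent
        resourceGroup: my-RGP
        storageAccount: staticsitedemo
        mountOptions: "-o allow_other --file-cache-timeout-in-seconds=120"
        AzureStorageAuthType: msi
        # Placeholder: client ID of a managed identity that the node's IMDS endpoint can resolve.
        AzureStorageIdentityClientId: "<managed-identity-client-id>"
```

The `Identity not found` response in the original error came from the node's IMDS endpoint, which suggests (the thread does not confirm it) that the client ID used there was not an identity assigned to the node itself.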

TeamDman commented 1 year ago

I hesitate to grant the kubelet identity the perms since that grants the whole cluster access. I'm setting up a shared environment where multiple teams will be using a cluster separated by namespaces; the goal is to be able to use managed identities so I'll check out the second part you linked!

Node pools Kubernetes version: 1.23.12

Node sizes: Standard_D2s_v3

Cluster configuration Kubernetes version: 1.24.9

TeamDman commented 1 year ago

Here's what I'm using

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

helmCharts:
- name: blob-csi-driver
  repo: https://raw.githubusercontent.com/kubernetes-sigs/blob-csi-driver/master/charts
  version: v1.18.0
  namespace: kube-system
  releaseName: blob-csi-driver
  valuesInline:
    controller:
      replicas: 1
    # valuesInline is a plain values map, so a dotted key is not expanded; nest it instead
    node:
      enableBlobfuseProxy: true
    cloud: AzureStackCloud

Will follow up once I get the time to try the msi auth type.

andyzhangx commented 1 year ago

@TeamDman msi means managed identity, and a managed identity can only be assigned at the node level. If you want per-namespace identity isolation, storing the account key as a k8s secret is the only way for now.
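
For reference, a minimal sketch of the account-key approach mentioned above. The Secret name `staticsite-storagekey` comes from the commented-out `secretName` line in the original Deployment. The key names `azurestorageaccountname` and `azurestorageaccountkey` are taken from the driver's `e2e_usage.md` example and should be checked against the driver version in use. The namespace `team-a` is hypothetical, and the key value is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: staticsite-storagekey
  namespace: team-a            # hypothetical; assumed to be the namespace the pod runs in
type: Opaque
stringData:
  azurestorageaccountname: staticsitedemo
  azurestorageaccountkey: "<storage-account-key>"   # placeholder; never commit a real key
---
# Fragment of the pod spec that references the Secret above.
volumes:
  - name: persistent-storage
    csi:
      driver: blob.csi.azure.com
      volumeAttributes:
        containerName: webcontent
        secretName: staticsite-storagekey
        mountOptions: "-o allow_other --file-cache-timeout-in-seconds=120"
```

Because a Secret is a namespaced object, each team's key stays in its own namespace. That gives the per-namespace separation asked for in the thread, at the cost of managing and rotating account keys.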

TeamDman commented 1 year ago

Thanks for the clarification, will wait for GA release of the full support then <3

andyzhangx commented 9 months ago

/assign @cvvz

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

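This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage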

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 months ago

/lifecycle stale

k8s-triage-robot commented 1 month ago

/lifecycle rotten

k8s-triage-robot commented 2 weeks ago

/close not-planned

k8s-ci-robot commented 2 weeks ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/blob-csi-driver/issues/945#issuecomment-2307173946):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.