grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0
22.75k stars 3.31k forks source link

Azure storage with federated workload identity #12614

Open leinad87 opened 2 months ago

leinad87 commented 2 months ago

Is your feature request related to a problem? Please describe. To configure Azure storage, there are some options like managed identities or service principal. I don't like managed identities becase they can be used by any pod in the cluster. If you think about service principal, SAS or blob storage token, they have sensitive data that should be managed carefully and secrets are not easily configured: https://github.com/grafana/loki/issues/9143

Describe the solution you'd like Azure has introduce Federated Identity Credentials. This will allow to sync a Service Account with a Azure application with required permisions without passwords.

Could be possible to implement Azure Federated Workload Identity to authenticate Loki against Azure Storage Account?

rowanmoul commented 2 months ago

Unless I am misinterpreting this PR, I think this is a documentation issue, not a feature support issue. It looks like Workload Identity is already implemented. I will be trying to get it working myself in the coming days. https://github.com/grafana/loki/pull/11802

Edit: Nevermind, I see that PR was specificly for the Loki Operator, not Loki itself. I'm not even sure where the relevant code is for this. I hope it's not this file, because that is using an old azure go library that has been effectively deprecated for more than a year now, so updating it will require more effort.

rowanmoul commented 1 month ago

Good news! Workload identity is actually supported already as of PR #7540 that I somehow missed before, it's just a poorly named option. The azure storage config value that you want is called useFederatedToken in the loki helm chart (use_federated_token if setting in loki's config.yaml) I have tested today and loki is able to successfully authenticate.

valimail-scott commented 1 month ago

I had to add this to the helm values to get it to work:

        loki:
          podLabels:
            azure.workload.identity/use: 'true'
rowanmoul commented 1 month ago

yes, I assumed that was a given when using workload identity. There are a number of other things that need to be setup, like a "federated credential" azure resource, and enabling some settings on your azure kubernetes cluster to allow it to issue OIDC tokens. You'll also need a label and annotation on the service account:

serviceAccount:
  annotations:
    azure.workload.identity/client-id: 'your-client-id-here'
  labels:
    azure.workload.identity/use: 'true'
valimail-scott commented 1 month ago

To be fair, not everything requires the podLabels. external-dns does, but external-secrets-operator works with only the serviceAccount annotation as an example.

Richard87 commented 1 month ago

Tried this tonight, but getting a panic from ADAL(!?)

The workload identity token is added to the pod with the usual AZURE_ env variables...

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1bb19b9]

goroutine 1 [running]:
github.com/Azure/go-autorest/autorest/adal.(*ServicePrincipalToken).SetCustomRefreshFunc(...)
    /src/loki/vendor/github.com/Azure/go-autorest/autorest/adal/token.go:418
github.com/grafana/loki/v3/pkg/storage/chunk/client/azure.(*BlobStorage).getServicePrincipalToken(0xc001706e00, {0x2b87c90?, 0x2b87c98?})
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:441 +0x279
github.com/grafana/loki/v3/pkg/storage/chunk/client/azure.(*BlobStorage).getOAuthToken(0xc001706e00)
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:393 +0xe6
github.com/grafana/loki/v3/pkg/storage/chunk/client/azure.(*BlobStorage).newPipeline(0xc001706e00, {0xee6b280, 0x3, 0x14}, 0x0)
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:377 +0x265
github.com/grafana/loki/v3/pkg/storage/chunk/client/azure.NewBlobStorage(0xc000676a10, {0xc000814040?, {0x325bfe8?, 0xc000bd6420?}}, {0xc000c75080?, 0xc001798698?, 0x4105e5?})
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:201 +0x111
github.com/grafana/loki/v3/pkg/ruler/base.NewLegacyRuleStore({{0xc00082c560, 0x5}, {{0x2a7e063, 0xb}, {0xc000c99c50, 0x11}, {{0x0, 0x0}}, {0x0, 0x0}, ...}, ...}, ...)
    /src/loki/pkg/ruler/base/storage.go:99 +0x325
github.com/grafana/loki/v3/pkg/loki.(*Loki).initRulerStorage(0xc001708000)
    /src/loki/pkg/loki/modules.go:1153 +0x2fb
github.com/grafana/dskit/modules.(*Manager).initModule(0xc000aa43a8, {0x7ffd9c975324, 0x3}, 0x1?, 0xc0017000c0?)
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:136 +0x1f7
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x0?, {0xc00066e0c0, 0x1, 0xc000cc8180?})
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108 +0xd8
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc001708000, {0x0?, {0x4?, 0x3?, 0x4912940?}})
    /src/loki/pkg/loki/loki.go:453 +0x9d
main.main()
    /src/loki/cmd/loki/main.go:122 +0x113b
Helm Values

```yaml monitoring: serviceMonitor: enable: true loki: podLabels: azure.workload.identity/use: 'true' commonConfig: path_prefix: /var/loki replication_factor: 3 compactor_address: '{{ include "loki.compactorAddress" . }}' schemaConfig: configs: - from: "2024-04-01" store: tsdb index: prefix: loki_index_ period: 24h object_store: azure schema: v13 storage: type: azure bucketNames: chunks: loki-chunks # Needs to pre-exist ruler: loki-rulers # Needs to pre-exist admin: loki-admin # Needs to pre-exist azure: accountName: redacted userAssignedId: "redacted" endpointSuffix: blob.core.windows.net useFederatedToken: true singleBinary: replicas: 0 extraArgs: - "-config.expand-env=true" deploymentMode: SingleBinary ```

CaspervdKerk commented 1 month ago

@Richard87 Indeed, I came to this issue board to raise an issue on this when I saw this thread already open. When using the useManagedIdentity: true flag an error appears from adal:

adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=<redacted-clientId>&resource=https%3A%2F%2F<redacted-storage-account-name>.blob.core.windows.net

This is quite concerning as the adal library has been deprecated for over a year now.

rowanmoul commented 1 month ago

This issue is about workload identity, not managed identity. While workload identity makes use of managed identities under the hood, it is a different authentication method.

I'm not a Loki maintainer, but I suspect it would be better to create a new issue either about managed identity authentication, or more generally about the deprecated ADAL library, which might get more attention. When Loki is updated to the Azure.Identity sub-package of the Azure SDK for Go it can make use of the DefaultAzureCredential which will auto-detect which authentication method you are using from a number of different options, including both Managed Identity and Workload Identity.

rowanmoul commented 1 month ago

These are the relevant helm values I am using successfully for Workload Identity on Loki Helm Chart 6.5.2

loki:
  podLabels:
    "azure.workload.identity/use": "true"
  storage:
    type: "azure"
    azure:
      accountName: "redacted"
      useFederatedToken: true
    bucketNames:
      chunks: "loki-chunk-store"
      ruler: "loki-ruler-store"
serviceAccount:
  annotations:
    "azure.workload.identity/client-id": "redacted"
  labels:
    "azure.workload.identity/use": "true"
Richard87 commented 1 month ago

Thanks, I added azure.workload.identity/client-id annotation manually to the service account, and it stopped panicking!

(And when creating the bucket names in the correct storage account, everything worked great! 🙏 )