grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

Support EKS Pod Identity feature #3899

Open Rohlik opened 1 month ago

Rohlik commented 1 month ago

Is your feature request related to a problem? Please describe.

A very common way to grant pods access to S3 buckets is IAM Roles for Service Accounts (IRSA). In 2023, AWS introduced EKS Pod Identity, which simplifies granting pods running in an EKS cluster access to AWS services. However, Tempo (and other Grafana components) appears to be incompatible with it, based on the docs and my tests:

err="failed to init module services: error initialising module: store: failed to create store: unexpected error from ListObjects on dev-tempo: Access Denied"

Describe the solution you'd like

Support EKS Pod Identity as a way of granting pods access to AWS services.

Describe alternatives you've considered

The IRSA approach mentioned above works fine. However, it can be unnecessarily complicated, especially in large deployments.

Additional context

The primary prerequisite is aws-sdk-go newer than v1.47.11, which Tempo fulfills. We use the tempo-distributed Helm chart. Below is the relevant part of the compactor pod's spec. It shows that the container has the proper environment variables and mounts set automatically, yet Tempo doesn't use them for some reason:

spec:
  containers:
  - args:
    - -target=compactor
    - -config.file=/conf/tempo.yaml
    - -mem-ballast-size-mbs=1024
    env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: eu-central-1
    - name: AWS_REGION
      value: eu-central-1
    - name: AWS_CONTAINER_CREDENTIALS_FULL_URI
      value: http://169.254.170.23/v1/credentials
    - name: AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE
      value: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token
    image: docker.io/grafana/tempo:2.5.0
    imagePullPolicy: IfNotPresent
    name: compactor
    ports:
    - containerPort: 3100
      name: http-metrics
      protocol: TCP
    - containerPort: 7946
      name: http-memberlist
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 2000Mi
      requests:
        cpu: 5m
        memory: 300Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsGroup: 1000
      runAsNonRoot: true
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /conf
      name: config
    - mountPath: /runtime-config
      name: runtime-config
    - mountPath: /var/tempo
      name: tempo-compactor-store
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9s9fj
      readOnly: true
    - mountPath: /var/run/secrets/pods.eks.amazonaws.com/serviceaccount
      name: eks-pod-identity-token
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: false
  nodeName: ip-10-2-6-59.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000
  serviceAccount: tempo-pi
  serviceAccountName: tempo-pi
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: eks-pod-identity-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: pods.eks.amazonaws.com
          expirationSeconds: 86400
          path: eks-pod-identity-token
  - configMap:
      defaultMode: 420
      items:
      - key: tempo.yaml
        path: tempo.yaml
      name: tempo-config
    name: config
  - configMap:
      defaultMode: 420
      items:
      - key: overrides.yaml
        path: overrides.yaml
      name: tempo-runtime
    name: runtime-config
  - emptyDir: {}
    name: tempo-compactor-store
  - name: kube-api-access-9s9fj
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
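
For reference, here is a minimal sketch of what a container credentials provider is expected to do with the two `AWS_CONTAINER_*` variables in the spec above: read the projected token file, send its contents as the `Authorization` header to the credentials URI, and parse the JSON reply. The function name `fetch_container_credentials` is hypothetical, and the response field names (`AccessKeyId`, `SecretAccessKey`, `Token`, `Expiration`) are assumed to follow the usual AWS container credentials format. This is a debugging aid under those assumptions, not what Tempo or its S3 client actually runs.

```python
import json
import os
import urllib.request


def fetch_container_credentials(environ=os.environ, opener=urllib.request.urlopen):
    """Fetch temporary credentials roughly the way a container credentials provider would.

    `environ` and `opener` are injectable so the function can be exercised
    without a real Pod Identity agent.
    """
    uri = environ.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
    token_file = environ.get("AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE")
    if not uri or not token_file:
        raise RuntimeError("Pod Identity environment variables are not set")

    # The projected service account token is sent as the Authorization header.
    with open(token_file, encoding="utf-8") as fh:
        token = fh.read().strip()

    request = urllib.request.Request(uri, headers={"Authorization": token})
    with opener(request, timeout=5) as response:
        payload = json.load(response)

    # Return only non-secret metadata so the result is safe to print or log.
    return {
        "access_key_id": payload["AccessKeyId"],
        "has_secret": bool(payload.get("SecretAccessKey")),
        "has_session_token": bool(payload.get("Token")),
        "expiration": payload.get("Expiration"),
    }
```

Calling `fetch_container_credentials()` from a debug container that shares the pod's service account would show whether the agent hands out credentials at all. If it does, the Access Denied error points at the S3 client not consulting this source rather than at the Pod Identity association.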

joe-elliott commented 1 month ago

We actually use the minio s3 client. Here is our Tempo s3 config:

https://github.com/grafana/tempo/blob/9951e7c4a9278f57410eb458694f1c52c0182d78/tempodb/backend/s3/config.go#L13-L39

And here is where we use it to build a minio client:

https://github.com/grafana/tempo/blob/9951e7c4a9278f57410eb458694f1c52c0182d78/tempodb/backend/s3/s3.go#L615

This appears relevant to our interests:

https://github.com/minio/minio-go/issues/1940

Looks like this was released here:

https://github.com/minio/minio-go/releases/tag/v7.0.70

We updated to this version here:

https://github.com/grafana/tempo/pull/3721

So with a little luck this will be supported in 2.6.0?

AnhQKatalon commented 1 month ago

We are having the same issue. Pod Identity was configured correctly, and the environment variables were injected into the containers as expected.

But somehow, Tempo's services do not pick up those credentials. The only approach that seems to work right now is IRSA.

level=error ts=2024-07-24T04:34:13.563927913Z caller=main.go:121 msg="error running Tempo" err="failed to init module services: error initialising module: store: failed to create store: unexpected error from ListObjects on s3-tempo: Access Denied"

Additional information: Grafana Loki and Mimir work normally with EKS Pod Identity.

Rohlik commented 1 month ago

@AnhQKatalon 🧐 I was not able to make it work even with Mimir. I'm getting an error similar to the one from Tempo:

err="blocks storage: unable to successfully send a request to object storage: Access Denied"

@joe-elliott Thank you for the clarification about the Go library 😇.

mogopz commented 1 month ago

@Rohlik I can confirm Mimir works with Pod Identity.

We're running most Grafana OSS services and Tempo + Pyroscope are the only two that don't work with Pod Identity at the moment.