grafana / tempo-operator

Grafana Tempo Kubernetes operator
https://grafana.com/docs/tempo/latest/setup/operator/
GNU Affero General Public License v3.0
61 stars 30 forks source link

Add support for AWS STS #978

Closed pavolloffay closed 3 months ago

pavolloffay commented 3 months ago

Resolves #553

https://issues.redhat.com/browse/TRACING-4227

Test instructions:

kubectl create namespace tempo-operator-system
IMG_PREFIX=docker.io/pavolloffay OPERATOR_VERSION=$(date +%s).0.0 BUNDLE_VARIANT=openshift make docker-build docker-push bundle bundle-build bundle-push olm-deploy

kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: aws-sts
stringData:
  bucket: tempoploffay
  region: us-east-2
  role_arn: arn:aws:iam::TBD:role/ploffay-simplest
type: Opaque
EOF

kubectl apply -f - <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: simplest
spec:
  storage:
    secret:
      name: aws-sts
      type: s3
  storageSize: 1Gi
  resources:
    total:
      limits:
        memory: 2Gi
        cpu: 2000m
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          type: route
EOF

kubectl apply -f - <<EOF
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoMonolithic
metadata:
  name: simplest
spec:
  storage:
    traces:
      backend: s3
      s3:
        secret: aws-sts
EOF
codecov-commenter commented 3 months ago

Codecov Report

Attention: Patch coverage is 73.39450% with 29 lines in your changes missing coverage. Please review.

Project coverage is 73.23%. Comparing base (75de22c) to head (3ac2873). Report is 1 commits behind head on main.

Files Patch % Lines
internal/manifests/monolithic/configmap.go 38.88% 7 Missing and 4 partials :warning:
internal/handlers/storage/secret.go 84.61% 5 Missing and 1 partial :warning:
cmd/generate/main.go 0.00% 4 Missing :warning:
internal/manifests/manifestutils/annotations.go 0.00% 4 Missing :warning:
internal/handlers/storage/storage.go 0.00% 2 Missing :warning:
internal/manifests/config/build.go 0.00% 1 Missing :warning:
internal/manifests/manifests.go 0.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #978 +/- ## ======================================= Coverage 73.22% 73.23% ======================================= Files 105 105 Lines 6503 6568 +65 ======================================= + Hits 4762 4810 +48 - Misses 1450 1465 +15 - Partials 291 293 +2 ``` | [Flag](https://app.codecov.io/gh/grafana/tempo-operator/pull/978/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=grafana) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/grafana/tempo-operator/pull/978/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=grafana) | `73.23% <73.39%> (+<0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=grafana#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

IshwarKanse commented 3 months ago

@pavolloffay With the route enabled in TempoStack query frontend, we create a SA for the query frontend along with the default SA for the stack. The query frontend SA doesn't have the required annotations for STS which causes the frontend pod to fail.

apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: tmstack
spec:
  storage:
    secret:
      name: aws-sts
      type: s3
  storageSize: 20Gi
  resources:
    total:
      limits:
        memory: 4Gi
        cpu: 2000m
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          type: route

oc get sa
NAME                           SECRETS   AGE
builder                        1         38m
default                        1         38m
deployer                       1         38m
tempo-tmstack                  1         29m
tempo-tmstack-query-frontend   1         29m

% oc get sa tempo-tmstack-query-frontend -o yaml
apiVersion: v1
imagePullSecrets:
- name: tempo-tmstack-query-frontend-dockercfg-7j4cm
kind: ServiceAccount
metadata:
  annotations:
    serviceaccounts.openshift.io/oauth-redirectreference.primary: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"tempo-tmstack-query-frontend"}}'
  creationTimestamp: "2024-07-12T04:39:44Z"
  labels:
    app.kubernetes.io/component: query-frontend
    app.kubernetes.io/instance: tmstack
    app.kubernetes.io/managed-by: tempo-operator
    app.kubernetes.io/name: tempo
  name: tempo-tmstack-query-frontend
  namespace: test-tempostack
  ownerReferences:
  - apiVersion: tempo.grafana.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: TempoStack
    name: tmstack
    uid: e98d32a4-b942-4e66-b624-db57574bc78f
  resourceVersion: "128365"
  uid: 95134fd8-0f67-41ac-baa0-88475240973c
secrets:
- name: tempo-tmstack-query-frontend-dockercfg-7j4cm

 % oc get pods
NAME                                           READY   STATUS             RESTARTS       AGE
tempo-tmstack-compactor-67776478c9-h7rs5       1/1     Running            0              32m
tempo-tmstack-distributor-5f8d5f9655-x8qph     1/1     Running            0              32m
tempo-tmstack-ingester-0                       1/1     Running            0              32m
tempo-tmstack-querier-859d965f78-d2p44         1/1     Running            0              32m
tempo-tmstack-query-frontend-599769fdd-xlpwz   2/3     CrashLoopBackOff   11 (69s ago)   32m
pavolloffay commented 3 months ago

The issue is that the query-frontend uses a different service account when the route for the Jaeger UI is enabled.

tempo-tmstack                  1         6m25s
tempo-tmstack-query-frontend   1         6m24s

@IshwarKanse could we require associating role with two SAs? Lokistack does it https://loki-operator.dev/docs/short_lived_tokens_authentication.md/#aws-secure-token-service

# Create a trust relationship file
cat > "$trust_rel_file" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${aws_account_id}:oidc-provider/${oidc_provider}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${oidc_provider}:sub": [
            "system:serviceaccount:${tempostack_ns}:tempo-${tempostack_name}",
           "system:serviceaccount:${tempostack_ns}:tempo-${tempostack_name}-query-frontend"
         ]
       }
     }
   }
 ]
}
EOF
IshwarKanse commented 3 months ago

@pavolloffay Tested with the fix commit. The stack is running now. I have also added the AWS IAM policy script to our git repo. https://github.com/openshift/distributed-tracing-qe/blob/main/scripts/aws-sts-s3-access.sh