grafana / helm-charts

[grafana] tolerations and affinity not working. #2460

Open · DaazKu opened this issue 1 year ago

DaazKu commented 1 year ago

I defined the following in my values.yml

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: eks.amazonaws.com/nodegroup
              operator: In
              values:
                - system-core

tolerations:
  - key: dedicated
    operator: Equal
    value: system-workload

I would expect to find those in the pod definition, but they're not present in the rendered Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: grafana
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2023-06-13T18:07:05Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: grafana
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: grafana
    app.kubernetes.io/version: 9.5.3
    helm.sh/chart: grafana-6.57.2
  name: grafana
  namespace: monitoring
  resourceVersion: "1786392"
  uid: 8f7f626d-5a86-4cee-b3f6-b48b4ae7b750
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: grafana
      app.kubernetes.io/name: grafana
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        checksum/config: 0e7ec4dbff37d0a260afd4799f38de7357da538d250f40962afc40a88770fe94
        checksum/dashboards-json-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
        checksum/sc-dashboard-provider-config: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
        checksum/secret: ea9a723c4b793f39cc1ed29a00ce2e193f6c61f9a2cf3ff04627c4a097c645e5
        kubectl.kubernetes.io/default-container: grafana
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: grafana
        app.kubernetes.io/name: grafana
    spec:
      automountServiceAccountToken: true
      containers:
      - env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: GF_SECURITY_ADMIN_USER
          valueFrom:
            secretKeyRef:
              key: admin-user
              name: grafana
        - name: GF_SECURITY_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              key: admin-password
              name: grafana
        - name: GF_PATHS_DATA
          value: /var/lib/grafana/
        - name: GF_PATHS_LOGS
          value: /var/log/grafana
        - name: GF_PATHS_PLUGINS
          value: /var/lib/grafana/plugins
        - name: GF_PATHS_PROVISIONING
          value: /etc/grafana/provisioning
        image: docker.io/grafana/grafana:9.5.3
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: grafana
        ports:
        - containerPort: 3000
          name: grafana
          protocol: TCP
        - containerPort: 9094
          name: gossip-tcp
          protocol: TCP
        - containerPort: 9094
          name: gossip-udp
          protocol: UDP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/grafana/grafana.ini
          name: config
          subPath: grafana.ini
        - mountPath: /var/lib/grafana
          name: storage
        - mountPath: /etc/grafana/provisioning/datasources/apiVersion
          name: config
          subPath: apiVersion
        - mountPath: /etc/grafana/provisioning/datasources/datasources
          name: config
          subPath: datasources
        - mountPath: /etc/grafana/provisioning/datasources/datasources.yaml
          name: config
          subPath: datasources.yaml
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      initContainers:
      - command:
        - chown
        - -R
        - 472:472
        - /var/lib/grafana
        image: docker.io/library/busybox:1.31.1
        imagePullPolicy: IfNotPresent
        name: init-chown-data
        resources: {}
        securityContext:
          capabilities:
            add:
            - CHOWN
          runAsNonRoot: false
          runAsUser: 0
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: storage
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 472
        runAsGroup: 472
        runAsNonRoot: true
        runAsUser: 472
      serviceAccount: grafana
      serviceAccountName: grafana
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: grafana
        name: config
      - name: storage
        persistentVolumeClaim:
          claimName: grafana
status:
  conditions:
  - lastTransitionTime: "2023-06-13T18:07:06Z"
    lastUpdateTime: "2023-06-13T18:07:06Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-06-13T18:07:06Z"
    lastUpdateTime: "2023-06-13T18:07:06Z"
    message: ReplicaSet "grafana-6bb7dbbb5c" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
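As a general sanity check (not something from the original report; the release name and template path below are just illustrative), templating the chart locally against the same values file shows whether the values are reaching the chart at all:

helm template grafana grafana/grafana -f values.yml --show-only templates/deployment.yaml | grep -B 2 -A 8 -E 'affinity|tolerations'

If the rendered Deployment is also missing the fields, the values aren't reaching the chart (for instance, when Grafana is installed as a subchart, these keys need to sit under the parent chart's grafana: key); if the rendered output does contain them, the problem is with the installed release rather than the chart templates.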

mprimeaux commented 4 months ago

I am also encountering this issue with the gateway deployment when using version 1.9.3 of the tempo-distributed Helm chart.

I've tried removing the gateway affinity tpl value, but that doesn't work. Here's that stanza:

gateway:
  enabled: true
  affinity: {}

The resulting warnings emitted are:

coalesce.go:289: warning: destination for tempo-distributed.gateway.affinity is a table. Ignoring non-table value (podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          {{- include "tempo.selectorLabels" (dict "ctx" . "component" "gateway") | nindent 10 }}
      topologyKey: kubernetes.io/hostname
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            {{- include "tempo.selectorLabels" (dict "ctx" . "component" "gateway") | nindent 12 }}
        topologyKey: topology.kubernetes.io/zone
)

I also modified the default affinity stanza to use preferredDuringSchedulingIgnoredDuringExecution rather than the default requiredDuringSchedulingIgnoredDuringExecution, but that didn't work either. Here's that stanza:

  affinity: |
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              {{- include "tempo.selectorLabels" (dict "ctx" . "component" "gateway") | nindent 10 }}
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                {{- include "tempo.selectorLabels" (dict "ctx" . "component" "gateway") | nindent 12 }}
            topologyKey: topology.kubernetes.io/zone
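As an aside, that stanza ends up with two preferredDuringSchedulingIgnoredDuringExecution keys, and items under that key are weighted terms, so a preferred-only variant would need a single key whose entries carry weight and podAffinityTerm. A minimal sketch of that shape (the plain selector labels are illustrative, not the chart's include helper):

  affinity: |
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: gateway
            topologyKey: kubernetes.io/hostname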

Here is the deployment error:

status:
  conditions:
  - lastTransitionTime: "2024-04-23T15:34:02Z"
    lastUpdateTime: "2024-04-23T15:34:02Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2024-04-23T15:34:02Z"
    lastUpdateTime: "2024-04-23T15:34:02Z"
    message: ReplicaSet "tempo-gateway-7789b77696" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing

Here is the pod-level error:

Name:             tempo-gateway-69cf4bd4dd-ntnf9
Namespace:        observability
Priority:         0
Service Account:  tempo
Node:             <none>
Labels:           app.kubernetes.io/component=gateway
                  app.kubernetes.io/instance=tempo
                  app.kubernetes.io/name=tempo
                  pod-template-hash=69cf4bd4dd
Annotations:      checksum/config: a465b5249192bb4606db6d4a260200bd3847b8b173f434841b597e87b4953b6b
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/tempo-gateway-69cf4bd4dd
Containers:
  nginx:
    Image:        docker.io/nginxinc/nginx-unprivileged:1.19-alpine
    Port:         8080/TCP
    Host Port:    0/TCP
    Readiness:    http-get http://:http-metrics/ delay=15s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /docker-entrypoint.d from docker-entrypoint-d-override (rw)
      /etc/nginx from config (rw)
      /tmp from tmp (rw)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tempo-gateway
    Optional:  false
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  docker-entrypoint-d-override:
    Type:                     EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:                   
    SizeLimit:                <unset>
QoS Class:                    BestEffort
Node-Selectors:               <none>
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/component=gateway,app.kubernetes.io/instance=tempo,app.kubernetes.io/name=tempo
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  4s (x3 over 25s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't satisfy existing pods anti-affinity rules. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.

I see that deployment-gateway.yaml does use the affinity value.
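Charts that accept affinity as a templated string typically render it through tpl, roughly like this (a simplified sketch, not the exact contents of deployment-gateway.yaml):

      {{- with .Values.gateway.affinity }}
      affinity:
        {{- tpl . $ | nindent 8 }}
      {{- end }}

That would explain why the value needs to stay a string: a map override such as {} can't be coalesced onto the string default, which is presumably what the warning above is about.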

As another data point, this error occurs on Kubernetes 1.29.4 and 1.30.0, the latest supported stable versions at the time of this comment. I'm using minikube 1.32.0.

Your help is greatly appreciated.

UPDATE: The "coalesce.go:289: warning: destination for tempo-distributed.gateway.affinity is a table. Ignoring non-table value" error occurs because the gateway deployment template assumes podAntiAffinity will always be defined. The workaround is to use the following stanza for it:

gateway:
  # -- Affinity for gateway pods.
  # @default -- Hard node anti-affinity
  affinity: |-
    podAntiAffinity: {}

mprimeaux commented 4 months ago

Turns out the issue for my deployment was the anti-affinity stanza of the Loki Gateway pod conflicting with the Tempo Gateway pod. I ended up changing the Loki gateway stanza to the following:

gateway:
  # -- Affinity for gateway pods.
  # @default -- Hard node anti-affinity
  affinity: |-
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/component: gateway
          topologyKey: kubernetes.io/hostname
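As with the earlier preferred-only attempt, items under preferredDuringSchedulingIgnoredDuringExecution are weighted terms in the PodSpec schema, so the API server expects weight and podAffinityTerm around the selector. The same idea in that shape would look like this (a sketch, not verified against the Loki chart):

gateway:
  # -- Affinity for gateway pods.
  # @default -- Hard node anti-affinity
  affinity: |-
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: gateway
            topologyKey: kubernetes.io/hostname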