grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Bug: Mimir failing to parse env variable #9993

Closed grantjulian22 closed 6 days ago

grantjulian22 commented 6 days ago

What is the bug?

Getting the following error on several Mimir pods after exposing status.podIP through the environment variable MY_POD_IP:

failure: service memberlist_kv failed: failed to create transport: could not parse bind addr "${MY_POD_IP}" as IP address

How to reproduce it?

  1. Start a GKE instance.
  2. Use the following to set memberlist.bind-addr in a pod's configuration (e.g. distributor, compactor, ruler):
       extraArgs:
         memberlist.bind-addr: ${MY_POD_IP}
       env:
         - name: MY_POD_IP
           valueFrom:
             fieldRef:
               fieldPath: status.podIP
  3. Start Mimir with a helm install on the GKE instance.
  4. The error should be visible in a pod's logs (e.g. the ruler pod).

What did you think would happen?

I expected the Mimir pods to expand the environment variable ${MY_POD_IP} into a usable IP address.

What was your environment?

  - Google Kubernetes Engine
  - Helm v3.16.0-rc.1
  - mimir-distributed-5.6.0

Any additional context to share?

Note that -config.expand-env=true is set.

Full values file

global:
  extraEnv:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP

mimir:
  structuredConfig:
    memberlist:
      abort_if_cluster_join_fails: false
      compression_enabled: false
      join_members:
        - dns+{{ include "mimir.fullname" . }}-gossip-ring.{{ .Release.Namespace }}.svc.{{ .Values.global.clusterDomain }}:{{ include "mimir.memberlistBindPort" . }}
      advertise_addr: "${MY_POD_IP}"

    common:
      storage:
        backend: gcs

    blocks_storage:
      backend: gcs
      gcs:
        bucket_name: mcp-23fv-blocks-internal-mimir-p

    alertmanager_storage:
      backend: gcs
      gcs:
        bucket_name: mcp-no43-alertmanager-internal-mimir-p

    ruler_storage:
      backend: gcs
      gcs:
        bucket_name: mcp-93oa-ruler-internal-mimir-p

alertmanager:
  persistentVolume:
    enabled: true
  replicas: 2
  resources:
    # limits:
    #   memory: 1.4Gi
    requests:
      cpu: 1
      # memory: 1Gi
  statefulSet:
    enabled: true
  extraArgs:
    memberlist.bind-addr: ${MY_POD_IP}
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

compactor:
  persistentVolume:
    size: 20Gi
  resources:
    # limits:
    #   memory: 2.1Gi
    requests:
      cpu: 1
      # memory: 1.5Gi
  extraArgs:
    memberlist.bind-addr: ${MY_POD_IP}
  env:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP

distributor:
  replicas: 2
  resources:
    # limits:
    #   memory: 5.7Gi
    requests:
      cpu: 2
      # memory: 4Gi
  extraArgs:
    memberlist.bind-addr: ${MY_POD_IP}
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

ingester:
  persistentVolume:
    size: 50Gi
  replicas: 3
  resources:
    # limits:
    #   memory: 12Gi
    requests:
      cpu: 3.5
      # memory: 8Gi
  topologySpreadConstraints: {}
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: target # support for enterprise.legacyLabels
                operator: In
                values:
                  - ingester
          topologyKey: 'kubernetes.io/hostname'

        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                  - ingester
          topologyKey: 'kubernetes.io/hostname'

  zoneAwareReplication:
    topologyKey: 'kubernetes.io/hostname'
  extraArgs:
    memberlist.bind-addr: ${MY_POD_IP}
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

admin-cache:
  enabled: true
  replicas: 2

chunks-cache:
  enabled: true
  replicas: 2

index-cache:
  enabled: true
  replicas: 3

metadata-cache:
  enabled: true

results-cache:
  enabled: true
  replicas: 2

minio:
  enabled: false

overrides_exporter:
  replicas: 1
  resources:
    # limits:
    #   memory: 128Mi
    requests:
      cpu: 100m
      # memory: 128Mi

querier:
  replicas: 1
  resources:
    # limits:
    #   memory: 5.6Gi
    requests:
      cpu: 2
      # memory: 4Gi
  extraArgs:
    memberlist.bind-addr: ${MY_POD_IP}
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

query_frontend:
  replicas: 1
  resources:
    # limits:
    #   memory: 2.8Gi
    requests:
      cpu: 2
      # memory: 2Gi

ruler:
  replicas: 1
  resources:
    #limits:
      #memory: 2.8Gi
    requests:
      cpu: 1
      #memory: 2Gi
  extraArgs:
    memberlist.bind-addr: ${MY_POD_IP}
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

store_gateway:
  persistentVolume:
    size: 10Gi
  replicas: 3
  resources:
    #limits:
      #memory: 2.1Gi
    requests:
      cpu: 1
      #memory: 1.5Gi
  topologySpreadConstraints: {}
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: target # support for enterprise.legacyLabels
                operator: In
                values:
                  - store-gateway
          topologyKey: 'kubernetes.io/hostname'

        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                  - store-gateway
          topologyKey: 'kubernetes.io/hostname'
  zoneAwareReplication:
    topologyKey: 'kubernetes.io/hostname'
  extraArgs:
    memberlist.bind-addr: ${MY_POD_IP}
  env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
nginx:
  replicas: 1
  resources:
    limits:
      memory: 731Mi
    requests:
      cpu: 1
      memory: 512Mi

# Grafana Enterprise Metrics feature related
admin_api:
  replicas: 1
  resources:
    limits:
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 64Mi

gateway:
  replicas: 1
  resources:
    limits:
      memory: 731Mi
    requests:
      cpu: 1
      memory: 512Mi

narqo commented 6 days ago

extraArgs:
  memberlist.bind-addr: ${MY_POD_IP}

When referencing environment variables in container command args, you have to use the Kubernetes syntax with parentheses, i.e. $(MY_POD_IP) (ref Kubernetes docs):

extraArgs:
-   memberlist.bind-addr: ${MY_POD_IP}
+   memberlist.bind-addr: $(MY_POD_IP)

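Mimir first parses the provided config file and expands the environment variables in it, and then amends the config with the values from the CLI flags. The latter values are used as is: Mimir does not expand ${...} in flag values, so any variable in a flag must be expanded by Kubernetes before the process starts.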
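
To make the two expansion paths concrete, here is a minimal sketch of the relevant values, trimmed from the values file in the report. It assumes that global.extraEnv is injected into every pod managed by the chart (in which case the per-component env blocks would be redundant) and that the chart passes each extraArgs entry to the container as a command-line flag. Only the ruler is shown; the same change would apply to every other component that sets memberlist.bind-addr.

```yaml
# Sketch only: all settings unrelated to MY_POD_IP are omitted.
global:
  extraEnv:
    # Expose the pod IP to the container as MY_POD_IP.
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP

mimir:
  structuredConfig:
    memberlist:
      # Config file value: Mimir expands ${VAR} itself
      # because -config.expand-env=true is set.
      advertise_addr: "${MY_POD_IP}"

ruler:
  extraArgs:
    # CLI flag value: Mimir uses it as is, so Kubernetes must
    # expand it, which requires the $(VAR) syntax.
    memberlist.bind-addr: "$(MY_POD_IP)"
```

The rule of thumb this illustrates: use ${VAR} for values that end up in the Mimir config file, and $(VAR) for values that end up in container args.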

Please reopen the issue if this doesn't solve it for you.