timescale / tobs

tobs - The Observability Stack for Kubernetes. Easy install of a full observability stack into a k8s cluster with Helm charts.
Apache License 2.0

prometheus-stack-prometheus-0/1 pods are stuck in status "init:0/1" #655

Closed alienninja closed 1 year ago

alienninja commented 1 year ago

What happened? The prometheus-stack-prometheus-0/1 pods are stuck in status "init:0/1"

Did you expect to see something different? I expected the pods to complete initialization

How to reproduce it (as minimally and precisely as possible): Fresh helm installation of tobs with OpenEBS as storage backend

Environment: Helm deployment on a clean K8s cluster (v1.24.6) with OpenEBS as the backend storage. Output of describing the stuck pod:

Name:             prometheus-tobs-kube-prometheus-stack-prometheus-0
Namespace:        tobs
Priority:         0
Service Account:  tobs-kube-prometheus-stack-prometheus
Node:             knode19/20.20.20.219
Start Time:       Mon, 21 Nov 2022 19:10:02 +0000
Labels:           app.kubernetes.io/instance=tobs-kube-prometheus-stack-prometheus
                  app.kubernetes.io/managed-by=prometheus-operator
                  app.kubernetes.io/name=prometheus
                  app.kubernetes.io/version=2.40.1
                  controller-revision-hash=prometheus-tobs-kube-prometheus-stack-prometheus-54c6b6896f
                  operator.prometheus.io/name=tobs-kube-prometheus-stack-prometheus
                  operator.prometheus.io/shard=0
                  prometheus=tobs-kube-prometheus-stack-prometheus
                  statefulset.kubernetes.io/pod-name=prometheus-tobs-kube-prometheus-stack-prometheus-0
Annotations:      kubectl.kubernetes.io/default-container: prometheus
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/prometheus-tobs-kube-prometheus-stack-prometheus
Init Containers:
  init-config-reloader:
    Container ID:
    Image:         quay.io/prometheus-operator/prometheus-config-reloader:v0.60.1
    Image ID:
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /bin/prometheus-config-reloader
    Args:
      --watch-interval=0
      --listen-address=:8080
      --config-file=/etc/prometheus/config/prometheus.yaml.gz
      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
      --watched-dir=/etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:  prometheus-tobs-kube-prometheus-stack-prometheus-0 (v1:metadata.name)
      SHARD:     0
    Mounts:
      /etc/prometheus/config from config (rw)
      /etc/prometheus/config_out from config-out (rw)
      /etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 from prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mzmkp (ro)
Containers:
  prometheus:
    Container ID:
    Image:         quay.io/prometheus/prometheus:v2.40.1
    Image ID:
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --web.console.templates=/etc/prometheus/consoles
      --web.console.libraries=/etc/prometheus/console_libraries
      --storage.tsdb.retention.time=1d
      --config.file=/etc/prometheus/config_out/prometheus.env.yaml
      --storage.tsdb.path=/prometheus
      --web.enable-lifecycle
      --web.external-url=http://tobs-kube-prometheus-stack-prometheus.tobs:9090
      --web.route-prefix=/
      --storage.tsdb.wal-compression
      --web.config.file=/etc/prometheus/web_config/web-config.yaml
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        40m
      memory:     400Mi
    Liveness:     http-get http://:http-web/-/healthy delay=0s timeout=3s period=5s #success=1 #failure=6
    Readiness:    http-get http://:http-web/-/ready delay=0s timeout=3s period=5s #success=1 #failure=3
    Startup:      http-get http://:http-web/-/ready delay=0s timeout=3s period=15s #success=1 #failure=60
    Environment:  <none>
    Mounts:
      /etc/prometheus/certs from tls-assets (ro)
      /etc/prometheus/config_out from config-out (ro)
      /etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 from prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 (rw)
      /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
      /prometheus from prometheus-tobs-kube-prometheus-stack-prometheus-db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mzmkp (ro)
  config-reloader:
    Container ID:
    Image:         quay.io/prometheus-operator/prometheus-config-reloader:v0.60.1
    Image ID:
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /bin/prometheus-config-reloader
    Args:
      --listen-address=:8080
      --reload-url=http://127.0.0.1:9090/-/reload
      --config-file=/etc/prometheus/config/prometheus.yaml.gz
      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
      --watched-dir=/etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:  prometheus-tobs-kube-prometheus-stack-prometheus-0 (v1:metadata.name)
      SHARD:     0
    Mounts:
      /etc/prometheus/config from config (rw)
      /etc/prometheus/config_out from config-out (rw)
      /etc/prometheus/rules/prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 from prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mzmkp (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  prometheus-tobs-kube-prometheus-stack-prometheus-db:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-0
    ReadOnly:   false
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-tobs-kube-prometheus-stack-prometheus
    Optional:    false
  tls-assets:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          prometheus-tobs-kube-prometheus-stack-prometheus-tls-assets-0
    SecretOptionalName:  <nil>
  config-out:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0
    Optional:  false
  web-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-tobs-kube-prometheus-stack-prometheus-web-config
    Optional:    false
  kube-api-access-mzmkp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                   From               Message
  ----     ------       ----                  ----               -------
  Normal   Scheduled    5m46s                 default-scheduler  Successfully assigned tobs/prometheus-tobs-kube-prometheus-stack-prometheus-0 to knode19
  Warning  FailedMount  3m43s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[prometheus-tobs-kube-prometheus-stack-prometheus-db], unattached volumes=[config-out prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 kube-api-access-mzmkp tls-assets prometheus-tobs-kube-prometheus-stack-prometheus-db web-config config]: timed out waiting for the condition
  Warning  FailedMount  2m7s (x9 over 5m35s)  kubelet            MountVolume.MountDevice failed for volume "pvc-1b377915-0f50-4170-9d8e-6224dcd98ece" : rpc error: code = Internal desc = Waiting for pvc-1b377915-0f50-4170-9d8e-6224dcd98ece's CVC to be bound
  Warning  FailedMount  89s                   kubelet            Unable to attach or mount volumes: unmounted volumes=[prometheus-tobs-kube-prometheus-stack-prometheus-db], unattached volumes=[config config-out prometheus-tobs-kube-prometheus-stack-prometheus-rulefiles-0 kube-api-access-mzmkp tls-assets prometheus-tobs-kube-prometheus-stack-prometheus-db web-config]: timed out waiting for the condition

Anything else we need to know?:

The PV named in the FailedMount events above has been created and is bound to the claim:

Name:            pvc-1b377915-0f50-4170-9d8e-6224dcd98ece
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: cstor.csi.openebs.io
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    cstor-csi-ssd-disk
Status:          Bound
Claim:           tobs/prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-0
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        8Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            cstor.csi.openebs.io
    FSType:            ext4
    VolumeHandle:      pvc-1b377915-0f50-4170-9d8e-6224dcd98ece
    ReadOnly:          false
    VolumeAttributes:      openebs.io/cas-type=cstor
                           storage.kubernetes.io/csiProvisionerIdentity=1668714202752-8081-cstor.csi.openebs.io
Events:                <none>

Could the issue be related to the volume name? The pod is looking for a volume named

prometheus-tobs-kube-prometheus-stack-prometheus-db

but the actual claim is named

prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-0
alienninja commented 1 year ago

I found the issue: the PVC name prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-1 is used as a label value, and it is too long for the 63-character limit on Kubernetes labels. This is reported in the openebs-cstor-cvc-operator logs:

1122 02:53:44.533412       1 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeConfig", Namespace:"openebs", Name:"pvc-27948946-bab3-4c45-acda-8cf7e8dc90e9", UID:"b56a2149-369c-4cef-8430-01f23ebc14e8", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"2390459", FieldPath:""}): type: 'Warning' reason: 'Provisioning' CStorVolume.cstor.openebs.io "pvc-27948946-bab3-4c45-acda-8cf7e8dc90e9" is invalid: metadata.labels: Invalid value: "prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-1": must be no more than 63 characters
I1122 02:53:44.535808       1 controller.go:304] creating cstorvolume resource
E1122 02:53:44.543448       1 controller_base.go:321] error syncing 'openebs/pvc-c7d48022-8eb7-4909-9abe-eef03db97025': CStorVolume.cstor.openebs.io "pvc-c7d48022-8eb7-4909-9abe-eef03db97025" is invalid: metadata.labels: Invalid value: "prometheus-tobs-kube-prometheus-stack-prometheus-db-prometheus-tobs-kube-prometheus-stack-prometheus-0": must be no more than 63 characters, requeuing
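
The "must be no more than 63 characters" error above comes from Kubernetes label-value validation. As a rough sketch (not the operator's code; `label_value_problems` is a hypothetical helper name), the same rule can be checked locally before installing:

```python
import re

# Kubernetes label values may be at most 63 characters. A non-empty value must
# start and end with an alphanumeric character; '-', '_' and '.' are allowed
# in between.
MAX_LABEL_VALUE_LENGTH = 63
_LABEL_VALUE_RE = re.compile(r"([A-Za-z0-9]([A-Za-z0-9_.-]*[A-Za-z0-9])?)?")


def label_value_problems(value: str) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if len(value) > MAX_LABEL_VALUE_LENGTH:
        problems.append(
            f"must be no more than {MAX_LABEL_VALUE_LENGTH} characters "
            f"(got {len(value)})"
        )
    if _LABEL_VALUE_RE.fullmatch(value) is None:
        problems.append("must start and end with an alphanumeric character")
    return problems


# The PVC name from the log line above, which ends up as a label value.
claim = (
    "prometheus-tobs-kube-prometheus-stack-prometheus-db-"
    "prometheus-tobs-kube-prometheus-stack-prometheus-0"
)
print(label_value_problems(claim))
```

For the claim name in the logs this reports only the length problem; the characters themselves are valid.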
alienninja commented 1 year ago

I was able to solve this with help from https://github.com/timescale/tobs/issues/563. Adding this flag to my helm install command fixed it:

--set kube-prometheus-stack.prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.metadata.name=data
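
Why this helps: a StatefulSet names each PVC `<volumeClaimTemplate name>-<pod name>`, so shortening the template name shortens the PVC name. Judging from the logs above, the cStor operator copies that PVC name into a label value, which is capped at 63 characters. A minimal sketch of the arithmetic (`pvc_name` is a hypothetical helper, not part of any tool):

```python
MAX_LABEL_VALUE_LENGTH = 63  # Kubernetes limit on label values


def pvc_name(template_name: str, pod_name: str) -> str:
    """StatefulSet PVC naming: <volumeClaimTemplate name>-<pod name>."""
    return f"{template_name}-{pod_name}"


pod = "prometheus-tobs-kube-prometheus-stack-prometheus-0"
default_template = "prometheus-tobs-kube-prometheus-stack-prometheus-db"

# Compare the default template name with the shortened one from the fix.
for template in (default_template, "data"):
    name = pvc_name(template, pod)
    verdict = "ok" if len(name) <= MAX_LABEL_VALUE_LENGTH else "too long"
    print(f"{len(name):>3}  {verdict:<8}  {name}")
```

With the default template name the PVC name is 102 characters; with `data` it is 55, which fits under the limit.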