prometheus-community / helm-charts

Prometheus community Helm charts

[prometheus] Pod has unbound immediate PersistentVolumeClaims #4946

Closed by ember11498 3 weeks ago

ember11498 commented 3 weeks ago

Describe the bug

When I install Prometheus with:

helm upgrade --install prometheus prometheus-community/prometheus --namespace monitoring

the Prometheus server and Alertmanager PVCs are unbound.

If I instead run:

helm upgrade --install prometheus prometheus-community/prometheus --namespace monitoring --set server.persistentVolume.storageClass=prometheus-block-storage --set alertmanager.persistentVolume.storageClass=prometheus-block-storage

it only fixes the Prometheus server; the Alertmanager PVC is still unbound.

What's your helm version?

version.BuildInfo{Version:"v3.16.2", GitCommit:"13654a52f7c70a143b1dd51416d633e1071faffb", GitTreeState:"clean", GoVersion:"go1.22.7"}

What's your kubectl version?

1.27.2

Which chart?

prometheus

What's the chart version?

Latest, I guess (the pod labels below show helm.sh/chart=prometheus-25.28.0).

What happened?

When I run these commands:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install prometheus prometheus-community/prometheus --namespace monitoring

I get the following output:

kubectl -n monitoring get pods

NAME                                                READY   STATUS    RESTARTS   AGE
prometheus-alertmanager-0                           0/1     Pending   0          2m31s
prometheus-kube-state-metrics-7b97cb57c6-48g4g      1/1     Running   0          2m31s
prometheus-prometheus-node-exporter-8bxsz           1/1     Running   0          2m31s
prometheus-prometheus-pushgateway-9f8c968d6-wp5kz   1/1     Running   0          2m31s
prometheus-server-7d64c54f54-47qcm                  0/2     Pending   0          2m31s

kubectl -n monitoring describe pod prometheus-alertmanager-0

Name:             prometheus-alertmanager-0
Namespace:        monitoring
Priority:         0
Service Account:  prometheus-alertmanager
Node:             <none>
Labels:           app.kubernetes.io/instance=prometheus
                  app.kubernetes.io/name=alertmanager
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=prometheus-alertmanager-6d6657797d
                  statefulset.kubernetes.io/pod-name=prometheus-alertmanager-0
Annotations:      checksum/config: 0365605996785840eb3acce3e448a2439eed065b13bc5a73cd1878a7fffc5ff3
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/prometheus-alertmanager
Containers:
  alertmanager:
    Image:      quay.io/prometheus/alertmanager:v0.27.0
    Port:       9093/TCP
    Host Port:  0/TCP
    Args:
      --storage.path=/alertmanager
      --config.file=/etc/alertmanager/alertmanager.yml
    Liveness:   http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /alertmanager from storage (rw)
      /etc/alertmanager from config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gp4bf (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  storage-prometheus-alertmanager-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-alertmanager
    Optional:  false
  kube-api-access-gp4bf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age    From                Message
  ----     ------             ----   ----                -------
  Warning  FailedScheduling   3m16s  default-scheduler   0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Normal   NotTriggerScaleUp  3m14s  cluster-autoscaler  pod didn't trigger scale-up:

kubectl -n monitoring describe pod prometheus-server-7d64c54f54-47qcm

Name:             prometheus-server-7d64c54f54-47qcm
Namespace:        monitoring
Priority:         0
Service Account:  prometheus-server
Node:             <none>
Labels:           app.kubernetes.io/component=server
                  app.kubernetes.io/instance=prometheus
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=prometheus
                  app.kubernetes.io/part-of=prometheus
                  app.kubernetes.io/version=v2.55.0
                  helm.sh/chart=prometheus-25.28.0
                  pod-template-hash=7d64c54f54
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/prometheus-server-7d64c54f54
Containers:
  prometheus-server-configmap-reload:
    Image:      quay.io/prometheus-operator/prometheus-config-reloader:v0.77.2
    Port:       8080/TCP
    Host Port:  0/TCP
    Args:
      --watched-dir=/etc/config
      --listen-address=0.0.0.0:8080
      --reload-url=http://127.0.0.1:9090/-/reload
    Liveness:     http-get http://:metrics/healthz delay=2s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:metrics/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pdrw5 (ro)
  prometheus-server:
    Image:      quay.io/prometheus/prometheus:v2.55.0
    Port:       9090/TCP
    Host Port:  0/TCP
    Args:
      --storage.tsdb.retention.time=15d
      --config.file=/etc/config/prometheus.yml
      --storage.tsdb.path=/data
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    Liveness:     http-get http://:9090/-/healthy delay=30s timeout=10s period=15s #success=1 #failure=3
    Readiness:    http-get http://:9090/-/ready delay=30s timeout=4s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /data from storage-volume (rw)
      /etc/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pdrw5 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-server
    Optional:  false
  storage-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-server
    ReadOnly:   false
  kube-api-access-pdrw5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                    From                Message
  ----     ------             ----                   ----                -------
  Warning  FailedScheduling   4m26s (x2 over 4m28s)  default-scheduler   0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Normal   NotTriggerScaleUp  4m26s                  cluster-autoscaler  pod didn't trigger scale-up:

kubectl -n monitoring get pvc

NAME                                STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
prometheus-server                   Pending                                                     <unset>                 5m16s
storage-prometheus-alertmanager-0   Pending                                                     <unset>                 5m14s

kubectl -n monitoring describe pvc prometheus-server

Name:          prometheus-server
Namespace:     monitoring
StorageClass:  
Status:        Pending
Volume:        
Labels:        app.kubernetes.io/component=server
               app.kubernetes.io/instance=prometheus
               app.kubernetes.io/managed-by=Helm
               app.kubernetes.io/name=prometheus
               app.kubernetes.io/part-of=prometheus
               app.kubernetes.io/version=v2.55.0
               helm.sh/chart=prometheus-25.28.0
Annotations:   meta.helm.sh/release-name: prometheus
               meta.helm.sh/release-namespace: monitoring
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       prometheus-server-7d64c54f54-47qcm
Events:
  Type    Reason         Age                  From                         Message
  ----    ------         ----                 ----                         -------
  Normal  FailedBinding  72s (x162 over 41m)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set

What you expected to happen?

I just want to install Prometheus and have all of its pods running.

How to reproduce it?

n.a.

Enter the changed values of values.yaml?

helm upgrade --install prometheus prometheus-community/prometheus --namespace monitoring

Enter the command that you execute that is failing/misfunctioning.

helm upgrade --install prometheus prometheus-community/prometheus --namespace monitoring

Anything else we need to know?

n.a.

zeritti commented 3 weeks ago

When I install Prometheus with:

helm upgrade --install prometheus prometheus-community/prometheus --namespace monitoring

the Prometheus server and Alertmanager PVCs are unbound.

Without a default storage class in the cluster and without setting a specific storage class at chart installation, the PVCs will remain Pending since the desired volumes cannot be provisioned.
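
For reference, a quick way to check whether a default storage class exists is to list the storage classes and look for the "(default)" marker; if an existing class should serve as the default, it can be annotated accordingly. This is only a generic sketch, and prometheus-block-storage here stands in for whatever class your cluster actually provides:

# List storage classes; the default one is marked "(default)"
kubectl get storageclass

# Optionally mark an existing class as the cluster default so that PVCs
# created without an explicit storageClassName can still bind
kubectl patch storageclass prometheus-block-storage \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'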

If I instead run:

helm upgrade --install prometheus prometheus-community/prometheus --namespace monitoring --set server.persistentVolume.storageClass=prometheus-block-storage --set alertmanager.persistentVolume.storageClass=prometheus-block-storage

it only fixes the Prometheus server; the Alertmanager PVC is still unbound.

This is because the path to the storage class field is different in each chart:

helm upgrade --install prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --set server.persistentVolume.storageClass=prometheus-block-storage \
  --set alertmanager.persistence.storageClass=prometheus-block-storage
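
The same settings can also go into a values file instead of --set flags. A minimal sketch, assuming the prometheus-block-storage class exists in the cluster and using a hypothetical file name prometheus-values.yaml:

# Write the values file (the storage class name is an assumption)
cat > prometheus-values.yaml <<'EOF'
# Storage class for the Prometheus server's PVC
server:
  persistentVolume:
    storageClass: prometheus-block-storage
# Storage class for the Alertmanager chart's PVC (note the different key)
alertmanager:
  persistence:
    storageClass: prometheus-block-storage
EOF

helm upgrade --install prometheus prometheus-community/prometheus \
  --namespace monitoring \
  -f prometheus-values.yaml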
ember11498 commented 3 weeks ago

@zeritti solved!