prometheus-community / helm-charts

Prometheus community Helm charts
Apache License 2.0

[kube-prometheus-stack] PVC CreateContainerConfigError #2124

Closed: aneurinprice closed this issue 2 years ago

aneurinprice commented 2 years ago

Describe the bug (a clear and concise description of what the bug is)

Deploying kube-prometheus-stack with helm as follows:

---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: "prometheus-stack"
  namespace: "prometheus"
spec:
  chart:
    repository: https://prometheus-community.github.io/helm-charts
    version: "35.5.1"
    name: "kube-prometheus-stack"
  values:
    grafana:
      enabled: false
    alertmanager:
      enabled: true
    prometheus:
      prometheusSpec:
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: "hostpath-fast"
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 50Gi
              selector:
                matchLabels:
                  app: prometheus

Pods get created, but Prometheus is reporting:

prometheus prometheus-prometheus-prometheus-stac-prometheus-0 1/2 CreateContainerConfigError

Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  4m59s                   default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         4m57s                   default-scheduler  Successfully assigned prometheus/prometheus-prometheus-prometheus-stac-prometheus-0 to breezy
  Normal   Pulled            4m56s                   kubelet            Container image "quay.io/prometheus-operator/prometheus-config-reloader:v0.56.3" already present on machine
  Normal   Created           4m56s                   kubelet            Created container init-config-reloader
  Normal   Started           4m55s                   kubelet            Started container init-config-reloader
  Normal   Pulled            4m55s                   kubelet            Container image "quay.io/prometheus-operator/prometheus-config-reloader:v0.56.3" already present on machine
  Normal   Created           4m55s                   kubelet            Created container config-reloader
  Normal   Started           4m55s                   kubelet            Started container config-reloader
  Warning  Failed            3m29s (x9 over 4m55s)   kubelet            Error: stat /fast/pvc-36674dcb-7048-47aa-af5a-27c67720d703: no such file or directory
  Normal   Pulled            3m14s (x10 over 4m55s)  kubelet            Container image "quay.io/prometheus/prometheus:v2.35.0" already present on machine

The storage provisioner is a hostpath provisioner; it works fine with everything else, but here it does not seem to create the directory.

  Aneurins-MacBook-Air:prometheus aneurinprice$ kubectl get pvc -n prometheus
NAME                                                                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
prometheus-prometheus-prometheus-stac-prometheus-db-prometheus-prometheus-prometheus-stac-prometheus-0   Bound    pvc-36674dcb-7048-47aa-af5a-27c67720d703   50Gi       RWO            hostpath-fast   8m36s

moby@breezy:~$ sudo ls /fast | grep pvc-36674dcb-7048-47aa-af5a-27c67720d703

Tried several variations in the helm chart but I've not had any success. Running the chart without the PVC is fine.

Any help here would be much appreciated

What's your helm version?

version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"clean", GoVersion:"go1.16.5"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:08:39Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"darwin/arm64"} Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

kube-prometheus-stack

What's the chart version?

35.5.1

What happened?

PVC is created in Kubernetes

Underlying directory is not created (see description)

Prometheus pod stuck with CreateContainerConfigError

What you expected to happen?

Pod to start

How to reproduce it?

See description

Enter the changed values of values.yaml?

    grafana:
      enabled: false
    alertmanager:
      enabled: true
    prometheus:
      prometheusSpec:
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: "hostpath-fast"
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 50Gi
              selector:
                matchLabels:
                  app: prometheus

Enter the command that you execute and failing/misfunctioning.

Every 2.0s: kubectl get pods -n prometheus                                                                                                              Aneurins-MacBook-Air.local: Mon Jun  6 18:12:15 2022

NAME                                                             READY   STATUS                       RESTARTS   AGE
alertmanager-prometheus-prometheus-stac-alertmanager-0           2/2     Running                      0          14m
prometheus-prometheus-prometheus-stac-prometheus-0               1/2     CreateContainerConfigError   0          14m
prometheus-prometheus-stac-operator-5cd9b5cc7b-jvt6r             1/1     Running                      0          14m
prometheus-prometheus-stack-kube-state-metrics-99ddd5445-b9rhz   1/1     Running                      0          14m
prometheus-prometheus-stack-prometheus-node-exporter-22s74       1/1     Running                      0          14m

Anything else we need to know?

No response

aneurinprice commented 2 years ago

Okay, so this seems to be related to the fact that I am running kubelet in Docker (RKE).

The subPath is the issue:

        - mountPath: /prometheus
          name: prometheus-prometheus-prometheus-kube-prometheus-db
          subPath: prometheus-db

Does anyone know a way to NOT have it use a subPath? I tried editing the StatefulSet, but it gets overwritten instantly.

rjhenry commented 2 years ago

To add to the above, it's a known issue with k8s itself: https://github.com/kubernetes/kubernetes/issues/61456

The Operator creates with SubPath, but there is an interesting field in the source: https://github.com/prometheus-operator/prometheus-operator/blob/e45574036f4282c519b64f458d296ca82f455a45/pkg/prometheus/statefulset.go#L1038

This implies to me that there's a field in the StorageSpec that should be able to disable the mount subPath - presumably something like:

storageSpec:
  disableMountSubPath: true
  volumeClaimTemplate:
    ...

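
For this chart, that field would presumably be set through the prometheusSpec values - a sketch, assuming the chart forwards storageSpec verbatim to the Prometheus custom resource (the disableMountSubPath name comes from the operator source linked above):

```yaml
prometheus:
  prometheusSpec:
    storageSpec:
      # Ask the operator to mount the PV at /prometheus directly,
      # skipping the prometheus-db subPath that breaks on
      # kubelet-in-Docker setups
      disableMountSubPath: true
      volumeClaimTemplate:
        spec:
          storageClassName: "hostpath-fast"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
```
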
stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

teodorkostov-es commented 2 years ago

This is a known issue with Kubernetes and PersistentVolumes created with hostPath. Furthermore, since the custom resources create the PersistentVolumeClaims, the place to fix this seems to be this chart. Also, the prometheus container in the Prometheus pod runs with UID:GID 1000:2000. This is another limitation of hostPath PersistentVolumes - the file permissions have to be fixed manually. Unfortunately, a chart user has to either dig through the Dockerfiles or set the mount point permissions to xx7 to discover the actual values used by the running container, then go back, fix that, and rerun everything.

To avoid such pitfalls, etcd provides an init container that can be enabled to fix the persistent-storage file permissions.
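A minimal sketch of what such an init container could look like here (assumptions: the 1000:2000 UID:GID mentioned above, and a hypothetical volume name of prometheus-data - the real names depend on the generated StatefulSet):

```yaml
initContainers:
  - name: fix-data-permissions
    image: busybox:1.36
    # Run as root so the hostPath-backed directory can be chowned
    # over to the prometheus container's UID:GID (1000:2000)
    securityContext:
      runAsUser: 0
    command: ["sh", "-c", "chown -R 1000:2000 /prometheus"]
    volumeMounts:
      - name: prometheus-data   # hypothetical volume name
        mountPath: /prometheus
```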

Also, there seems to be a typo in the Prometheus resource template: it does not accept the keyword storage described in the values.yaml and the storage.md - it expects the keyword storageSpec.

stale[bot] commented 2 years ago

This issue is being automatically closed due to inactivity.

maxkokocom commented 1 year ago

Any update on solving this? As @teodorkostov-es points out, this issue should probably be fixed in this Helm chart, the way the etcd chart does with its init container.

Has anyone found a workaround, such as downgrading/upgrading the chart or cluster version?