[kube-prometheus-stack] PVs are not created

danielorkabi commented 1 year ago

Describe the bug: a clear and concise description of what the bug is.

I am trying to install the chart in my cluster, but both the pods and the PVCs are stuck in the "Pending" state.

The same values were used in my other clusters as well with small changes such as resources, scrape configs, and labels.

k get pv

No resources found

k describe pvc XXX


  Warning  ProvisioningFailed    27m (x14 over 50m)    pd.csi.storage.gke.io_gke-9f84f9e97c194973a199-ae01-cafa-vm_13ad594c-d147-47b3-aab7-517572925b7f  failed to provision volume with StorageClass "standard": rpc error: code = InvalidArgument desc = CreateVolume failed to create single zonal disk pvc-bb9132ee-7ea8-4686-9270-9f71d2a09d2d: failed to insert zonal disk: unknown Insert disk error: googleapi: Error 400: Invalid value for field 'resource.labels': ''. Label value 'prometheus-monitoring-kube-prometheus-prometheus-db-prometheus-monitoring-kube-prometheus-prometheus-0' violates format constraints. The value can only contain lowercase letters, numeric characters, underscores and dashes. The value can be at most 63 characters long. International characters are allowed., invalid

k describe prometheus


  Storage:
    Volume Claim Template:
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:         800Gi
        Storage Class Name:  standard

Values.yaml


      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: standard
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 800Gi

k get storageclass

standard                 kubernetes.io/gce-pd    Delete          Immediate              true                   19h

Thanks :)

What's your helm version?

v3.8.0

What's your kubectl version?

v1.26

Which chart?

kube-prometheus-stack

What's the chart version?

45.25.0

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

helm upgrade --install monitoring . -n monitoring -f values.yaml

Anything else we need to know?

I tried reinstalling the chart and the result was the same.

danielorkabi commented 1 year ago

An update: the same process with the same values file works properly on a v1.25 k8s cluster.

zeritti commented 1 year ago

It looks like you are hitting an issue in PD CSI driver releases 1.9.0/1.9.1, where the PVC name is used as a label on the PD and is therefore limited to 63 characters, but is not truncated. The related change was reverted in 1.9.2.

QuentinBisson commented 1 year ago

Thanks @zeritti. @danielorkabi, have you tried upgrading your CSI driver? Can we close this issue?

nicl-dev commented 1 year ago

We are stuck with our CSI driver version, as it is bound to our GKE cluster, which is running 1.26.4-gke.500. Unfortunately, kubectl describe csidriver won't tell me the exact version, but we ran into this problem after upgrading the cluster, so I'm sure it's the described issue. Does anyone know a workaround for this? Setting fullnameOverride or nameOverride didn't work; the string was still too long.

zeritti commented 1 year ago

The fact that CRD field prometheus.prometheusSpec.storage.volumeClaimTemplate also supports metadata may help with shortening the PVC name a bit.

In terms of the values, you could then set something like

     storageSpec:
       volumeClaimTemplate:
         metadata:
           name: prometheus-pvc
         spec:
           storageClassName: standard
           accessModes: ["ReadWriteOnce"]
           resources:
             requests:
               storage: 800Gi

The given name will just become a prefix, though, so the final name will be longer depending on the name of the prometheus CR appended to it. It may still fit in 63 chars (it did in my test).
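To check ahead of time whether a given prefix fits, the final PVC name can be estimated with a few lines of Python. This is only a rough sketch: it assumes the name is built as `<claim template name>-prometheus-<Prometheus CR name>-<ordinal>`, which matches the label value in the error message above, and it takes the 63-character limit from that same message. The function names and the short prefix `db` are hypothetical and used only for illustration.

```python
# Limit quoted in the GCE error message: label values of at most 63 characters.
MAX_LABEL_LEN = 63


def pvc_name(claim_template_name: str, prometheus_cr_name: str, ordinal: int = 0) -> str:
    """Estimate the PVC name.

    Assumption: <claim template name>-prometheus-<CR name>-<ordinal>,
    inferred from the label value shown in the ProvisioningFailed event.
    """
    return f"{claim_template_name}-prometheus-{prometheus_cr_name}-{ordinal}"


def fits_label_limit(name: str) -> bool:
    """Return True if the name is short enough to be used as a label value."""
    return len(name) <= MAX_LABEL_LEN


if __name__ == "__main__":
    # CR name taken from the error message in the original report.
    cr = "monitoring-kube-prometheus-prometheus"
    candidates = (
        "prometheus-monitoring-kube-prometheus-prometheus-db",  # default seen in the report
        "prometheus-pvc",                                       # name suggested above
        "db",                                                   # hypothetical shorter prefix
    )
    for template in candidates:
        name = pvc_name(template, cr)
        verdict = "ok" if fits_label_limit(name) else "too long"
        print(f"{len(name):3d}  {verdict:8s}  {name}")
```

Under these assumptions, the CR name from the original report combined with `prometheus-pvc` still comes out slightly above 63 characters, so a shorter prefix or a shorter release name may be needed in that particular setup.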

darron commented 1 year ago

We ran into a similar problem on GKE (not with the length of the PV name, but with PVCs stuck in Pending), and this helped us resolve it:

https://xebia.com/blog/do-this-before-you-upgrade-gke-to-k8s-1-25-or-you-might-feel-sorry/

We were on 1.26 as well.

Just FYI and in case it helps.
