
[kube-prometheus-stack] Can't initiate alertmanager volume, only prometheus, on Longhorn #3826

Open · urbaman opened this issue 11 months ago

urbaman commented 11 months ago

Hi,

I'm also filing the issue here, mirroring the one I opened on the Longhorn repo.

I installed kube-prometheus-stack the way I always have, but the latest chart version probably has something to do with the problem: with the very same persistence settings for Prometheus and Alertmanager, the Prometheus volume gets provisioned but the Alertmanager one does not.

I have always used the same settings, and they worked before:

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          accessModes: ["ReadWriteMany"]
          resources:
            requests:
              storage: 50Gi
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          accessModes: ["ReadWriteMany"]
          resources:
            requests:
              storage: 50Gi
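
For reference, this is roughly how I check what comes up after the install (illustrative commands; monitoring is the namespace from the helm command further down):

kubectl -n monitoring get pvc        # both claims exist, but the Alertmanager one stays Pending
kubectl get pv                       # only the Prometheus-backed Longhorn volume shows up
kubectl -n monitoring describe pvc   # the events show where provisioning stops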

What's your helm version?

3.10.1

What's your kubectl version?

1.27.6

Which chart?

kube-prometheus-stack

What's the chart version?

51.2.0

What happened?

Just installed the chart with persistency enabled for prometheus and alertmanager backed by longhorn.

The Prometheus volume gets created; the Alertmanager one does not.

What you expected to happen?

I expect both volumes to be created.

How to reproduce it?

Install Longhorn 1.51.1, then kube-prometheus-stack with persistence enabled

Enter the changed values of values.yaml?

I uncommented both persistence blocks and changed the storageClass definition (gluster -> longhorn).

Enter the command that you execute and failing/misfunctioning.

helm upgrade -i --namespace monitoring --create-namespace kube-prometheus-stack prometheus-community/kube-prometheus-stack --values kube-prometheus-stack-values.yaml

Anything else we need to know?

No response

urbaman commented 11 months ago

I can confirm that, with the very same settings in the values (no selector line anywhere), the two PVCs come out different: the Alertmanager one has an empty selector ({}) in its spec that is not present in the Prometheus one, and that is what breaks PV provisioning on Longhorn. Here are the two PVCs as created:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
    volume.kubernetes.io/storage-provisioner: driver.longhorn.io
  creationTimestamp: "2023-09-26T10:05:41Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: kube-prometheus-stack-prometheus
    app.kubernetes.io/managed-by: prometheus-operator
    app.kubernetes.io/name: prometheus
    operator.prometheus.io/name: kube-prometheus-stack-prometheus
    operator.prometheus.io/shard: "0"
    prometheus: kube-prometheus-stack-prometheus
  name: prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0
  namespace: monitoring
  resourceVersion: "43582783"
  uid: 256a3e2d-6b32-4dd1-b846-37b7d1cc6bed
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: longhorn
  volumeMode: Filesystem
  volumeName: pvc-256a3e2d-6b32-4dd1-b846-37b7d1cc6bed
status:
  accessModes:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
    volume.kubernetes.io/storage-provisioner: driver.longhorn.io
  creationTimestamp: "2023-09-25T10:47:46Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alertmanager: kube-prometheus-stack-alertmanager
    app.kubernetes.io/instance: kube-prometheus-stack-alertmanager
    app.kubernetes.io/managed-by: prometheus-operator
    app.kubernetes.io/name: alertmanager
  name: alertmanager-kube-prometheus-stack-alertmanager-db-alertmanager-kube-prometheus-stack-alertmanager-0
  namespace: monitoring
  resourceVersion: "43063696"
  uid: c6fbaa39-0951-4626-9625-4d8c6eb0b4f5
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  selector: {}
  storageClassName: longhorn
  volumeMode: Filesystem
status:
  phase: Pending
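
A quick way to see the difference is to pull just the selector field out of the two claims (illustrative; names are the ones generated by this release):

kubectl -n monitoring get pvc \
  prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 \
  -o jsonpath='{.spec.selector}'   # selector is not set on the Prometheus claim
kubectl -n monitoring get pvc \
  alertmanager-kube-prometheus-stack-alertmanager-db-alertmanager-kube-prometheus-stack-alertmanager-0 \
  -o jsonpath='{.spec.selector}'   # comes back as an empty object on the Alertmanager claim
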
urbaman commented 11 months ago

Even stranger: the volumeClaimTemplates in the two StatefulSets (Prometheus and Alertmanager) look identical...

  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: prometheus-kube-prometheus-stack-prometheus-db
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 50Gi
      storageClassName: longhorn
      volumeMode: Filesystem
    status:
      phase: Pending
---
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: alertmanager-kube-prometheus-stack-alertmanager-db
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 50Gi
      storageClassName: longhorn
      volumeMode: Filesystem
    status:
      phase: Pending

I can't figure out where it's going wrong...
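
For reference, this is how I compared the two templates (illustrative; StatefulSet names are the ones generated by the chart and the operator):

diff \
  <(kubectl -n monitoring get sts prometheus-kube-prometheus-stack-prometheus -o jsonpath='{.spec.volumeClaimTemplates}') \
  <(kubectl -n monitoring get sts alertmanager-kube-prometheus-stack-alertmanager -o jsonpath='{.spec.volumeClaimTemplates}')
# Apart from the claim names, the two specs come out the same.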

urbaman commented 11 months ago

The last thing I can say, besides that this looks like a chart problem rather than anything Longhorn-related, is that I worked around it by deleting the stuck Alertmanager PVC and restarting its StatefulSet, roughly like this (resource names are the ones generated by this release):
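
# Delete the Pending Alertmanager claim, then restart the StatefulSet so the claim gets recreated.
kubectl -n monitoring delete pvc alertmanager-kube-prometheus-stack-alertmanager-db-alertmanager-kube-prometheus-stack-alertmanager-0
kubectl -n monitoring rollout restart statefulset alertmanager-kube-prometheus-stack-alertmanager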

Everything worked out.

So it's only the very first PVC creation that gets messed up.

urbaman commented 11 months ago

Also: after that first installation, the PVC deletion, and the StatefulSet restart, if I uninstall the chart and install it again, everything works... which is quite strange.

I did nothing in the meantime apart from deleting the PVC and restarting the StatefulSet in the first installation.
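
Concretely, after the PVC deletion and StatefulSet restart from the previous comment, a plain uninstall and reinstall with the same values file is all it takes (illustrative):

helm -n monitoring uninstall kube-prometheus-stack
helm upgrade -i --namespace monitoring --create-namespace kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack --values kube-prometheus-stack-values.yaml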

TKinslayer commented 1 month ago

I'm having the same problem, but in my case I'm using the values.yaml to point at a PersistentVolume I already created inside Longhorn.

I've tried everything I could think of, but nothing has worked yet. It does find the PV/PVC created for Grafana, but not the one for Alertmanager, and not the one for Prometheus either; although for Prometheus, if I do a "loose" declaration (without specifying a volume claim or a selector), it creates a new volume in Longhorn.
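
For what it's worth, the shape of what I'm trying looks roughly like this; the volumeName is only a placeholder for the PV I pre-created in Longhorn:

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          volumeName: alertmanager-data   # placeholder: the pre-created Longhorn PV
          accessModes: ["ReadWriteMany"]
          resources:
            requests:
              storage: 50Gi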

Did you run into this problem again later on?