Closed: ofiryy closed this issue 3 years ago.
There is a storage setting for Prometheus and Alertmanager:

```yaml
storage:
  volumeClaimTemplate:
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: sc-mirror
      resources:
        requests:
          storage: 300Mi
```

I think we should have the same for Grafana?
This is not a bug, as data persistence is not enabled by default. You can either claim a PersistentVolume in your custom values.yaml file as @survivant suggested, or export your dashboards as JSON definition files and create a ConfigMap with the JSON-formatted data for each custom dashboard. That way, modifications made inside Grafana still do not persist across releases of the stack via helm, but your exported dashboards get redeployed with everything else.
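For the ConfigMap route, the chart's Grafana dashboard sidecar (enabled by default in recent chart versions) picks up any ConfigMap carrying the `grafana_dashboard` label and provisions its JSON content as a dashboard. A minimal sketch; the ConfigMap name, namespace, and dashboard JSON are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-custom-dashboard     # illustrative name
  namespace: monitoring         # the namespace the stack is installed in
  labels:
    grafana_dashboard: "1"      # label the sidecar watches by default
data:
  my-dashboard.json: |
    {
      "title": "My Custom Dashboard",
      "panels": []
    }
```

Because the dashboard lives in a ConfigMap, it survives pod restarts and is redeployed by helm along with everything else.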
@ofiryy I updated values.yaml, adding:

```yaml
grafana:
  persistence:
    enabled: true
```

to fix the Grafana persistence problem.
@blademainer but we still can't choose our storage class
@survivant The prometheus-community/kube-prometheus-stack chart uses the grafana/grafana chart as a dependency, so any values you can pass to grafana/grafana you can pass under the `grafana` key in this chart. Or am I misunderstanding the issue being raised?
This works for me:

```yaml
grafana:
  enabled: true
  persistence:
    enabled: true
    type: pvc
    storageClassName: default
    accessModes:
      - ReadWriteOnce
    size: 4Gi
    finalizers:
      - kubernetes.io/pvc-protection
```
@BertelBB thank you. I don't know what I did wrong the first time, but it works fine now. Next I need to find a workaround for https://github.com/prometheus-community/helm-charts/issues/437
I'm not sure the workflow of expecting all the Grafana settings to get zapped the next time a pod stops has the best interests of the enterprise in mind. I get the argument for exporting the dashboards as JSON and storing them in ConfigMaps to make them deployment agnostic, but there are other settings not related to dashboards that we don't want to disappear when a pod crashes either, such as user login information and alerting settings. So, unless there is a best practice for storing all of that in ConfigMaps as well (with a good UI for doing so that doesn't require kubectl and a Kubernetes admin), it seems shortsighted to think that Grafana can live in an enterprise environment as an application that doesn't require persistence. The opposite seems true.
I too am wringing out the kinks of my Prometheus install and ran into this exact same problem of grafana not supporting persistence out of the box. It was rather alarming to learn that after I began building out dashboards, I lost that work when I tested out the failover scenarios of the pod going down. I did not see a persistence piece in the grafana part of the values.yaml and didn't know that this would turn grafana into an app with a temporary persistence layer.
In hindsight, I should have run my pod failover test before beginning to "persist" data in Grafana, which would have revealed this annoying default. I do wish the helm chart could be updated to have a section under grafana that defines the persistence layer, even if it's commented out.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
For anyone who's looking: kube-prometheus-stack takes its Grafana settings from the values of the grafana/grafana dependency chart, so persistence is configured under the `grafana` key. This should probably be included in the docs.
@BertelBB I have used your code snippet, but I'm facing an issue:
```
Warning  FailedScheduling  6s (x5 over 83s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
```

```
$ kubectl get pods -n prometheus
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          19h
prometheus-grafana-5d9946dff9-4ffgc                      0/2     Pending   0          2m10s
prometheus-grafana-669fbc79f9-dmmhk                      2/2     Running   0          3m58s
prometheus-kube-prometheus-operator-85ccf48856-q8n68     1/1     Running   0          12m
prometheus-kube-state-metrics-6dc7f98565-twkxk           1/1     Running   0          12m
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   1          19h
prometheus-prometheus-node-exporter-bgqtb                1/1     Running   0          19h
```
I wonder how I can fix it.
@darox The issue is that your PVC is already bound to the pod prometheus-grafana-669fbc79f9-dmmhk, so the new Grafana pod cannot claim the PV and therefore fails to start.

A quick fix would be to delete the ReplicaSet for the older Grafana pod, i.e. `kubectl delete rs prometheus-grafana-669fbc79f9 -n prometheus`.
A permanent fix would be to make sure that two Grafana pods cannot be running at the same time, so your rolling update strategy should ensure that when a Grafana upgrade is in progress, the scheduler first kills the old pod before starting the new one. I'm no expert in update strategies, but I think this should work.

EDIT: The previous strategy was wrong; this one works.

```yaml
grafana:
  deploymentStrategy:
    type: Recreate
```

This strategy ensures the old Grafana pod is terminated before a new one starts, which results in a short downtime for Grafana during upgrades.
I have applied your recommendations:

```
$ kubectl get pods -n prometheus
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          18s
prometheus-grafana-6fb7f46b9c-5ph99                      0/2     Pending   0          22s
prometheus-kube-prometheus-operator-548f79bb9-hskjx      1/1     Running   0          22s
prometheus-kube-state-metrics-5b8f9bdbbd-tr8vq           1/1     Running   0          22s
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   1          18s
prometheus-prometheus-node-exporter-k9nzm                1/1     Running   0          22s
```

```
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  24s (x6 over 111s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
```
@darox Is the prometheus-grafana (default name) PVC marked as Bound, and if so, which pod is it being used by?

```
kubectl get pvc -n prometheus
kubectl describe pvc -n prometheus prometheus-grafana   # replace name if needed
```

Do you in fact have a default StorageClass?

```
kubectl get sc
```
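For reference, a StorageClass becomes the cluster default via an annotation; if `kubectl get sc` shows no class marked `(default)`, a PVC that requests `storageClassName: default` (when no class by that name exists) or omits the class entirely will stay Pending. A sketch of the annotation; the class name here matches the values above and the provisioner is only an example:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # marks this class as the default
provisioner: kubernetes.io/no-provisioner   # example only; use your cluster's provisioner
```

With this annotation in place, `kubectl get sc` lists the class with `(default)` next to its name.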
It worked with:

```yaml
grafana:
  deploymentStrategy:
    type: Recreate
  persistence:
    enabled: true
    type: pvc
    storageClassName: hostpath
    accessModes:
      - ReadWriteOnce
    size: 4Gi
    finalizers:
      - kubernetes.io/pvc-protection
```

```
$ kubectl get pvc -n prometheus
NAME                                                                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-grafana                                                                                       Bound    pvc-d7ec8849-db92-4ec7-a465-f7ff67e414cb   4Gi        RWO            hostpath       41s
prometheus-prometheus-kube-prometheus-prometheus-db-prometheus-prometheus-kube-prometheus-prometheus-0   Bound    pvc-4009c793-9d44-4a7b-ab4b-13af00c513ad   5Gi        RWO            hostpath       4d2h
```
Thanks a lot for your support :)
For some reason it doesn't work for me; I've got these values:

```yaml
## helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --values values.yml
kube-state-metrics:
  image:
    repository: k8s.gcr.io/kube-state-metrics-arm64
    tag: v1.9.5
prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
grafana:
  adminPassword: xxx
  deploymentStrategy:
    type: Recreate
  enabled: true
  persistance:
    enabled: true
    type: pvc
    storageClassName: default
    accessModes:
      - ReadWriteOnce
    size: 4Gi
    finalizers:
      - kubernetes.io/pvc-protection
  grafana.ini:
    server:
      domain: xxx
      root_url: xxx
    auth.google:
      enabled: true
      client_id: xxx
      client_secret: xxx
      scopes: https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
      auth_url: https://accounts.google.com/o/oauth2/auth
      token_url: https://accounts.google.com/o/oauth2/token
      allowed_domains: gmail.com
      allow_sign_up: false
    paths:
      data: /var/lib/grafana/data
      logs: /var/log/grafana
      plugins: /var/lib/grafana/plugins
      provisioning: /etc/grafana/provisioning
    analytics:
      check_for_updates: true
    log:
      mode: console
    grafana_net:
      url: https://grafana.net
```
and after doing an upgrade no PVCs are created. I also tried just this for Grafana and still no luck:

```yaml
grafana:
  adminPassword: xxx
  enabled: true
  persistance:
    enabled: true
```

```
➜ prometheus git:(master) ✗ kubectl get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-postgres-postgresql-0   Bound    pvc-e099d418-73d4-49ed-8232-e829e418c6b4   8Gi        RWO            nfs-client     453d
docker-registry              Bound    pvc-0e726761-fdfd-454d-86a6-36002c37ac3b   30Gi       RWO            nfs-client     149d
streaming-pvc-streaming-0    Bound    pvc-cb97df2f-7e99-47c8-80e0-c215381ee672   20Gi       RWO            nfs-client     135d
streaming-pvc-streaming-1    Bound    pvc-2240cdb5-7ab4-4aee-99c5-45696e4100bb   20Gi       RWO            nfs-client     135d
streaming-pvc-streaming-2    Bound    pvc-7f65bb1f-97ec-4890-9b29-6ba36f470cfe   20Gi       RWO            nfs-client     135d
```
Can anyone help me with the dashboard location? I added the values.yaml above for persistence and the volume is bound, but when I restart the pod, the dashboards don't come back.
Hi @AwateAkshay, did you solve your problem? I am having the same issue: I can see my dashboards when I get into the Grafana container, but they are not present in Grafana itself.
@UrosCvijan exec into the Grafana pod and you will see a grafana.db file, which is a SQLite DB. Inside it you can see your dashboards.
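To illustrate what's inside that file: grafana.db is plain SQLite, and its `dashboard` table holds one row per saved dashboard. The script below builds a toy stand-in database locally so you can see the shape of the query (the real schema has many more columns). Inside the pod you could run something like `sqlite3 /var/lib/grafana/grafana.db 'SELECT title FROM dashboard;'`, assuming the image ships sqlite3, which it may not; copy the file out with `kubectl cp` otherwise.

```shell
# Build a toy stand-in for grafana.db and list dashboard titles from it.
# Schema is trimmed to the columns relevant here; real grafana.db has more.
rm -f /tmp/grafana-demo.db
python3 - <<'EOF'
import sqlite3

db = sqlite3.connect("/tmp/grafana-demo.db")
db.execute("CREATE TABLE dashboard (id INTEGER PRIMARY KEY, title TEXT, data TEXT)")
db.execute("INSERT INTO dashboard (title, data) VALUES ('Node Exporter', '{}')")
db.commit()
for (title,) in db.execute("SELECT title FROM dashboard"):
    print(title)   # prints: Node Exporter
EOF
```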
@kamilgregorczyk not "persistance" but "persistence":

```yaml
grafana:
  adminPassword: xxx
  enabled: true
  persistence:
    enabled: true
```
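Worth noting why this mistake is so easy to miss: helm does not warn about unknown value keys, so a misspelled `persistance:` block is silently ignored and persistence simply stays off. A quick self-contained check (the file path and contents are illustrative):

```shell
# Helm silently ignores unknown keys, so a misspelled block just does nothing.
# Grep your values file for the common misspelling before upgrading:
cat > /tmp/values-demo.yml <<'EOF'
grafana:
  persistance:
    enabled: true
EOF
grep -n 'persistance' /tmp/values-demo.yml && echo "typo found: should be 'persistence'"
# prints: 2:  persistance:
```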
Thanks for posting your code; it helped me figure out how to add environment variables in kube-prometheus-stack. Now I know the syntax.
I tried the above methods and the PV was created, but the pod failed to start because the chownData initContainer kept failing even after multiple retries. Following issue 752, I set initChownData to false.

Now the Grafana pod runs and I am able to access the dashboard, but the Grafana pod's logs show error="database is locked".
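The "database is locked" error is consistent with two Grafana pods (or a lingering writer) touching the same SQLite file on the shared volume. A hedged values sketch combining the fixes mentioned in this thread; `initChownData.enabled` is a grafana/grafana chart value, and `Recreate` prevents old and new pods from overlapping:

```yaml
grafana:
  deploymentStrategy:
    type: Recreate    # never run two pods against the same grafana.db
  persistence:
    enabled: true
    type: pvc
    size: 4Gi
  initChownData:
    enabled: false    # workaround from issue 752 when the chown init container fails
```

Whether this resolves the lock depends on the storage backend; SQLite locking is known to be unreliable on some network filesystems, so a local or block-backed StorageClass is the safer choice for grafana.db.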
Describe the bug: I installed the prometheus-community/kube-prometheus-stack chart and then defined panels and alerts in Grafana. When I delete the Grafana pod, all the data is deleted from Grafana; there is no persistence. I wanted to use this solution: https://github.com/prometheus-operator/prometheus-operator/issues/2558#issuecomment-565119967, but to my surprise no PV or PVC was created by the kube-prometheus-stack chart.

How can I make my Grafana persistent?
Version of Helm and Kubernetes:
Helm Version:

```
$ helm version
version.BuildInfo{Version:"v3.0.3", GitCommit:"ac925eb7279f4a6955df663a0128044a8a6b7593", GitTreeState:"clean", GoVersion:"go1.13.6"}
```

Kubernetes Version:

```
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-eks-2ba888", GitCommit:"2ba888155c7f8093a1bc06e3336333fbdb27b3da", GitTreeState:"clean", BuildDate:"2020-07-17T18:48:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
```
Which chart: kube-prometheus-stack

Which version of the chart: 12.3.0

How to reproduce it (as minimally and precisely as possible): install kube-prometheus-stack, define a panel in Grafana, then delete the Grafana pod.