grafana / grafana-operator

An operator for Grafana that installs and manages Grafana instances, Dashboards and Datasources through Kubernetes/OpenShift CRs
https://grafana.github.io/grafana-operator/
Apache License 2.0

[Bug]GrafanaDashboard resource created dashboards are not cleaned up when removed #1581

Open ak185158 opened 2 months ago

ak185158 commented 2 months ago

Describe the bug

When a GrafanaDashboard custom resource is used to create/manage a dashboard, it is expected that the resulting dashboard instance in Grafana would be cleaned up when the resource is removed. This does not appear to be the case, and it results in stale, orphaned dashboard instances that persist.

Version

v5.9.2

To Reproduce

  1. Create a GrafanaDashboard custom resource
  2. Verify the corresponding dashboard instance is created in Grafana from the GrafanaDashboard resource
  3. Remove the GrafanaDashboard custom resource
  4. Verify the dashboard instance persists even though the originating custom resource that created it has been removed
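
For reference, a minimal manifest for step 1 might look like the following sketch (the resource name and the `instanceSelector` label are assumptions; adjust them to match your Grafana CR):

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: example-dashboard        # hypothetical name
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana        # must match the labels on your Grafana CR
  grafanaCom:
    id: 15762                    # any public dashboard ID from grafana.com
```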

Expected behavior grafana-operator should remove the dashboard instance that was created by the custom resource once the resource is no longer present. Not doing so results in stale, orphaned dashboard instances after the underlying resource that created them is removed.

theSuess commented 2 months ago

Hey, I was unable to reproduce this issue. Maybe this has something to do with the permissions of your setup. How did you deploy the Grafana operator?

chaijunkin commented 2 months ago

I have a similar issue when deploying operator-managed Grafana dashboards via Argo CD. I'm not sure I can reproduce the steps exactly, but I will list them below:

  1. Deploy the original dashboard
  2. Upgrade the dashboard version (change the original folder path name and remove the original dashboard)
  3. The original dashboard is not deleted

mkyc commented 2 months ago

Exactly the same issue here, but it is inconsistent: during tests I hit it on seemingly random occasions.

Here are steps to reproduce (I'm copying from my k3d setup script):

setup

  1. Install the operator:
    kubectl create namespace pmon-grafana-operator || true
    helm upgrade -i grafana-operator oci://ghcr.io/grafana/helm-charts/grafana-operator --version v5.9.2 --namespace pmon-grafana-operator --values grafana-operator.values.yaml --wait

grafana-operator.values.yaml:

serviceMonitor:
  enabled: true

  2. Install Grafana:
    kubectl create namespace pmon-grafana || true
    kubectl apply -f grafana.yaml --namespace pmon-grafana

grafana.yaml:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-var-lib-grafana-pv
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/var-lib-grafana
...
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-var-lib-grafana-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
...
---
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana
  labels:
    dashboards: gitops
spec:
  deployment:
    spec:
      template:
        spec:
          containers:
            - name: grafana
              volumeMounts:
                - name: grafana-var-lib-grafana-pv
                  mountPath: /var/lib/grafana
          volumes:
            - name: grafana-var-lib-grafana-pv
              persistentVolumeClaim:
                claimName: grafana-var-lib-grafana-pvc
  service:
    spec:
      type: NodePort
    metadata:
      labels:
        app: grafana
  config:
    log:
      mode: "console"
    security:
      admin_user: root
      admin_password: secret
      disable_gravatar: "true"
    auth.anonymous:
      enabled: "false"
...
  3. Install Grafana resources:
    kubectl create namespace pmon-grafana-resources || true
    kubectl apply -f grafana-resources.yaml --namespace pmon-grafana-resources

grafana-resources.yaml:

---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: loki-datasource
spec:
  allowCrossNamespaceImport: true
  instanceSelector:
    matchLabels:
      dashboards: gitops
  datasource:
    name: loki
    type: loki
    uid: loki1
    access: proxy
    url: http://lgtm-loki-gateway.pmon-lgtm.svc.cluster.local
    isDefault: true
    jsonData:
      timeout: 60
      maxLines: 1000
...
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: mimir-datasource
spec:
  allowCrossNamespaceImport: true
  instanceSelector:
    matchLabels:
      dashboards: gitops
  datasource:
    name: mimir
    uid: mimir1
    type: prometheus
    access: proxy
    url: http://lgtm-mimir-nginx.pmon-lgtm.svc.cluster.local/prometheus
    isDefault: false
...
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaFolder
metadata:
  name: test-folder
spec:
  allowCrossNamespaceImport: true
  instanceSelector:
    matchLabels:
      dashboards: gitops

  # If title is not defined, the value will be taken from metadata.name
  title: lalala/lilili
  # When permissions value is empty/absent, a folder is created with default permissions
  # When empty JSON is passed ("{}"), the access is stripped for everyone except for Admin (default Grafana behaviour)
  permissions: |
    {
      "items": [
        {
          "role": "Admin",
          "permission": 4
        },
        {
          "role": "Editor",
          "permission": 2
        }, 
        {
          "role": "Viewer",
          "permission": 1
        }
      ]
    }
...
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: coredns-test-dashboard
spec:
  allowCrossNamespaceImport: true
  instanceSelector:
    matchLabels:
      dashboards: gitops
  grafanaCom:
    id: 15762
    revision: 18
...

result

as expected:

Screenshot 2024-07-05 at 13 23 55

("Logs/App" is added manually to test that the operator doesn't interfere with those.)

remove

Option 1:

kubectl delete -f grafana-resources.yaml --namespace pmon-grafana-resources || true

with grafana-resources.yaml from the previous step:

Screenshot 2024-07-05 at 13 41 58

There are errors regarding the loki datasource during the reconciliation loop, but eventually those go away; I assume they are unrelated.

Option 2:

kubectl delete --namespace pmon-grafana-resources GrafanaDashboard/coredns-test-dashboard 

not even a single log message, and:

Screenshot 2024-07-05 at 13 47 21

So nothing got removed, and it looks like the operator didn't even notice that the resource was deleted.

But ... sometimes it works. If I run that same sequence of steps 3-5 times:

kubectl apply -f grafana-resources.yaml --namespace pmon-grafana-resources
kubectl delete --namespace pmon-grafana-resources GrafanaDashboard/coredns-test-dashboard 

eventually it will start removing that dashboard:

Screenshot 2024-07-05 at 13 55 49

and it keeps being added and removed in subsequent repeats.

It looks to me like the operator sometimes doesn't receive events for removed dashboards. I didn't notice this for folders, though, just for dashboards.
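
This is not the operator's actual code, but a minimal Python sketch (all names hypothetical) of the missed-event hypothesis above: a controller that cleans up external objects only when it sees a DELETED watch event can orphan them if that event is dropped, whereas a level-based resync (comparing desired state against the external API on every reconcile) cleans up regardless of which events were observed.

```python
# Hypothetical sketch, not grafana-operator code. It contrasts two cleanup
# strategies for a controller mirroring Kubernetes CRs into an external API.

# External state: dashboard uid -> dashboard content
grafana = {"coredns": "dashboard-json"}

def event_driven_cleanup(events):
    """Edge-triggered: delete the external dashboard only when a DELETED
    event arrives. If the watch drops that event, the dashboard is orphaned."""
    for kind, uid in events:
        if kind == "DELETED":
            grafana.pop(uid, None)

def resync_cleanup(desired_uids):
    """Level-based: on every periodic reconcile, remove any external dashboard
    whose source CR no longer exists, regardless of observed events."""
    for uid in list(grafana):
        if uid not in desired_uids:
            grafana.pop(uid)

# The DELETED event for "coredns" is lost (the suspected missed event):
event_driven_cleanup([("ADDED", "coredns")])  # no DELETED ever seen
print("coredns" in grafana)                   # True: stale, orphaned dashboard

# A periodic, level-based resync still cleans it up:
resync_cleanup(desired_uids=set())            # no matching CRs remain
print("coredns" in grafana)                   # False
```

Real operators typically combine both: finalizers block CR deletion until external cleanup succeeds, and periodic resync catches anything a dropped event left behind.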

pb82 commented 2 months ago

Thanks @mkyc, I'll try to reproduce from the provided steps now.

github-actions[bot] commented 4 weeks ago

This issue hasn't been updated for a while, marking as stale, please respond within the next 7 days to remove this label

Fantaztig commented 3 weeks ago

As I read the example by @mkyc, the commands delete both the dashboard and the containing folder at once, which leads to neither of them being deleted in the instance. This behavior looks to be the same as described in #1626, right?

@ak185158 do you experience the same issue when deleting only the dashboard?