grafana / helm-charts


Helm installation of grafana-k8s-monitoring grafana/k8s-monitoring fails on OpenShift due to existing CRDs #3103

Closed: pbmoses closed this issue 2 months ago

pbmoses commented 5 months ago

OpenShift ships with monitoring capabilities and has a monitoring cluster operator that includes Alertmanager, PodMonitors, etc. When deploying grafana-k8s-monitoring (the grafana/k8s-monitoring chart) to OpenShift via Helm, using the command provided in the cluster config UI, the installation fails due to existing CRDs. If the CRDs are removed from OpenShift, the cluster operator will reconcile and recreate them.

```
Hang tight while we grab the latest from your chart repositories...
Update Complete. ⎈Happy Helming!⎈
Release "grafana-k8s-monitoring" does not exist. Installing it now.
Error: Unable to continue with install: CustomResourceDefinition "alertmanagerconfigs.monitoring.coreos.com" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: key "app.kubernetes.io/managed-by" must equal "Helm": current value is "cluster-version-operator"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "grafana-k8s-monitoring"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "grafana-demo"
```
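To confirm which component owns the conflicting CRD before touching anything, the ownership labels on the existing CRD can be listed directly. This is a minimal check, assuming kubectl access to the cluster; per the error above, on OpenShift the manager is the cluster-version-operator rather than Helm:

```shell
kubectl get crd alertmanagerconfigs.monitoring.coreos.com --show-labels
```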

There is a short window of opportunity to force a workaround: if a user runs the following, the install gets past the point where the (assumed) checks for existing CRDs are run. This is a blanket delete (not all of these CRDs conflict with the check) and not ideal, but it was used to prove out the workaround (the OpenShift cluster operator reconciled and recreated them):

```
pmo$ kubectl delete crd -l app.kubernetes.io/part-of=openshift-monitoring
customresourcedefinition.apiextensions.k8s.io "alertingrules.monitoring.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "alertmanagerconfigs.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "alertmanagers.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "alertrelabelconfigs.monitoring.openshift.io" deleted
customresourcedefinition.apiextensions.k8s.io "podmonitors.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "probes.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "thanosrulers.monitoring.coreos.com" deleted
```
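To check that the cluster-version-operator has reconciled and recreated the deleted CRDs, the same label selector can be queried again (a suggested verification step, not from the original report):

```shell
kubectl get crd -l app.kubernetes.io/part-of=openshift-monitoring
```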

The assumed logic would be to check whether the CRDs already exist and, if they do, pass or fail depending on scope, or something similar(?).
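As an illustration only (not chart logic), that "check, then decide" idea could look like this from the user side. Here `values.yaml` is a hypothetical file holding the wizard-generated values shown below, and `prometheus-operator-crds.enabled` is the chart value discussed later in this thread:

```shell
#!/usr/bin/env bash
# Illustrative sketch only, not chart logic: probe for one representative
# Prometheus Operator CRD and decide whether the chart should install them itself.
if kubectl get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1; then
  echo "Prometheus Operator CRDs already exist (e.g. owned by OpenShift); not installing them from the chart"
  CRD_ENABLED=false
else
  CRD_ENABLED=true
fi

# values.yaml is a hypothetical file holding the wizard-generated values shown below.
helm upgrade --install --atomic --timeout 300s grafana-k8s-monitoring grafana/k8s-monitoring \
  --namespace "default" --create-namespace \
  --values values.yaml \
  --set prometheus-operator-crds.enabled="${CRD_ENABLED}"
```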

Command provided by the wizard in the UI:


```shell
  helm repo update &&
  helm upgrade --install --atomic --timeout 300s grafana-k8s-monitoring grafana/k8s-monitoring \
    --namespace "default" --create-namespace --values - <<EOF
cluster:
  name: my-cluster
externalServices:
  prometheus:
    host: https://prometheus-prod-13-prod-us-east-0.grafana.net
    basicAuth:
      username: "REDACTED"
      password: REDACTED
  loki:
    host: https://logs-prod-006.grafana.net
    basicAuth:
      username: "REDACTED"
      password: REDACTED
  tempo:
    host: https://tempo-prod-04-prod-us-east-0.grafana.net:443
    basicAuth:
      username: "REDACTED"
      password: REDACTED
metrics:
  enabled: true
  cost:
    enabled: true
  node-exporter:
    enabled: true
logs:
  enabled: true
  pod_logs:
    enabled: true
  cluster_events:
    enabled: true
traces:
  enabled: true
receivers:
  grpc:
    enabled: true
  http:
    enabled: true
  zipkin:
    enabled: true
opencost:
  enabled: true
  opencost:
    exporter:
      defaultClusterId: my-cluster
    prometheus:
      external:
        url: https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom
kube-state-metrics:
  enabled: true
prometheus-node-exporter:
  enabled: true
prometheus-operator-crds:
  enabled: true
alloy: {}
alloy-logs: {}
EOF
```
schndr commented 3 months ago

Hey @pbmoses,

The default for `prometheus-operator-crds.enabled` is `true`, which deploys the Prometheus Operator CRDs to the cluster. Set this to `false` if your cluster already has the CRDs, or if you do not want Grafana Alloy to scrape metrics from PodMonitors, Probes, or ServiceMonitors. I hope this helps as a workaround.
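If editing the pasted values is awkward, the same override can go on the command line, since Helm's `--set` takes precedence over `--values`. A sketch assuming the wizard command from the original report, with the heredoc values elided here:

```shell
# Same wizard command as in the original report, with the CRD override applied via --set.
helm upgrade --install --atomic --timeout 300s grafana-k8s-monitoring grafana/k8s-monitoring \
  --namespace "default" --create-namespace \
  --set prometheus-operator-crds.enabled=false \
  --values - <<EOF
cluster:
  name: my-cluster
# ...rest of the wizard-generated values unchanged...
EOF
```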

petewall commented 2 months ago

yeah, @schndr has it correct. If you encounter this issue, you can set this in your values file:

prometheus-operator-crds:
  enabled: false
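For anyone starting from the wizard command in the original report, this corresponds to flipping the `prometheus-operator-crds` block that the wizard already emits (excerpt; everything else stays as generated):

```yaml
prometheus-operator-crds:
  enabled: false   # the wizard-generated values set this to "true" by default
```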
pbmoses commented 2 months ago

Indeed, thank you. I had to get an OpenShift cluster back up to test. Upon further review and testing, this is still present with `enabled: false`.