ClusterRole name templating breaks project monitoring when more than one is installed

rgcosma commented 3 months ago

Cluster Setup K8s 1.27, Rancher 2.8.3, RKE2

Describe the bug The ClusterRole and the corresponding ClusterRoleBinding in https://github.com/rancher/prometheus-federator/blob/main/charts/rancher-project-monitoring/0.4.2/templates/rancher-monitoring/hardened.yaml#L47 use the same name template, {{ .Chart.Name }}-patch-sa (the same goes for 0.4.1 and 0.4.0; I didn't check other versions). Because the chart name is identical for every release, every project renders the same cluster-scoped resource names. This breaks project monitoring after the first one is installed, because the Helm installer pod errors out with "project-monitoring-patch-sa has incorrect ownership annotation". Changing the annotations or deleting the role doesn't work, because the helm controller immediately recreates them. The workaround we applied was to change the above to {{ .Release.Name }} and rebuild the binary, because the chart is bundled as a base64 string.
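To illustrate the collision described above, here is a minimal sketch that renders both name templates with Go's text/template (the engine Helm templates build on) and tracks which release "owns" each cluster-scoped name. This is not Helm's actual ownership check, only a simplified model of it; the release names project-a-monitoring and project-b-monitoring, the helper names renderName and conflicts, and the chart name passed in are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Minimal stand-ins for the two Helm built-in objects the issue mentions.
type chartInfo struct{ Name string }
type releaseInfo struct{ Name string }
type renderContext struct {
	Chart   chartInfo
	Release releaseInfo
}

const (
	chartScoped   = "{{ .Chart.Name }}-patch-sa"   // template currently in hardened.yaml
	releaseScoped = "{{ .Release.Name }}-patch-sa" // workaround from the issue
)

// renderName renders a resource-name template for one release.
func renderName(tmpl, chart, release string) (string, error) {
	t, err := template.New("name").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	ctx := renderContext{
		Chart:   chartInfo{Name: chart},
		Release: releaseInfo{Name: release},
	}
	if err := t.Execute(&buf, ctx); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// conflicts simulates installing the releases in order. Cluster-scoped
// objects are keyed by name only, so a release fails when the name it
// renders is already owned by a different release (a simplified model).
func conflicts(tmpl, chart string, releases []string) ([]string, error) {
	owners := map[string]string{}
	var failed []string
	for _, rel := range releases {
		name, err := renderName(tmpl, chart, rel)
		if err != nil {
			return nil, err
		}
		if owner, ok := owners[name]; ok && owner != rel {
			failed = append(failed, rel)
			continue
		}
		owners[name] = rel
	}
	return failed, nil
}

func main() {
	// Hypothetical release names, one per project.
	releases := []string{"project-a-monitoring", "project-b-monitoring"}
	for _, tmpl := range []string{chartScoped, releaseScoped} {
		failed, err := conflicts(tmpl, "rancher-project-monitoring", releases)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> conflicting releases: %v\n", tmpl, failed)
	}
}
```

With the chart-scoped template, the second release renders a name the first release already owns; with the release-scoped template, each release gets its own name and nothing conflicts.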

To Reproduce
1. Deploy the first project on a cluster. Monitoring works, with Prometheus and Grafana running.
2. Deploy one more project.

Result Second and all subsequent project monitors get stuck with a confusing "WaitingForGrafanaDashboards" status in the web interface

Expected Result All project monitors have status "Deployed"

rgcosma commented 2 months ago

FYI I ended up editing the subchart and rebuilding the binary, a tedious and convoluted process - why are you bundling a chart as a base64 string?

mallardduck commented 2 months ago

@rgcosma - Can you clarify: you're not using the prometheus-federator project or chart directly, right? How are you installing them in your cluster, via the Rancher UI or something else? And what chart version and name are you specifically installing when you encounter this issue?

edit: Ultimately, what I'm trying to understand is where and why you're getting version 0.4.x in Rancher 2.8.x. From my understanding, that version is not valid for 2.8, and the highest should be 0.3.x on Rancher 2.8.

rgcosma commented 2 months ago

Hi! I am installing the prometheus-federator chart directly. I tried via both Helm and Argo, with the same result. The chart version is 103.0.2+up0.4.0, downloaded from https://github.com/rancher/charts/tree/release-v2.8/charts/prometheus-federator, and the name is prometheus-federator.

rancher / prometheus-federator

ClusterRole name templating breaks project monitoring when more than one is installed #92