pankajsqe opened this issue 3 days ago
So this issue goes back further than my changes do. I've been able to replicate the issue on the following setup:
Rancher: v2.10-a1a0d2edf04548b0a9099f86bcf8c194771db8f8-head
k8s: k3s v1.27.7+k3s2
Storage: longhorn:104.1.0+up1.6.2
Monitoring: rancher-monitoring:105.1.0-rc.4+up61.3.2
Federator: prometheus-federator:104.0.1+up0.4.2 (prom-fed embedded helm-controller disabled)
Workload: a random pihole chart (it doesn't matter, but I used a real one for extra realism)
And I see:
Update: After chatting in Slack threads, I tested with the prom-fed embedded helm-controller enabled, and it worked almost instantly. Of note, before I enabled the prom-fed helm-controller, no pods were created for installing things at all. Further chats revealed that there may be no need to disable the embedded one on k3s, but you do need to on RKE2 versions. I will deploy a downstream RKE2 cluster to test with instead of the k3s one I had on hand.
Update 2: Testing on RKE2 with the embedded helm-controller enabled yields results similar to k3s. If I disable it, as the docs might imply, it fails to create pods to install things. It sounds like Julia and Pankaj left this value on for all their tests, so I'll do the same from now on. The final test will use my fork.
This seems to be a potential incompatibility between rancher-monitoring and the rancher-project-monitoring chart that gets deployed. My fork/branch deploys version 0.4.2 of the chart, and the last working version of the chart (in my testing) is 0.3.x.
I've found that, starting in 0.4.0, the chart includes the namespace as part of its templates. Adding .grafana.defaultDashboards.useExistingNamespace and setting it to true resolves the currently reported bug. However, I'm still seeing additional errors that I suspect may be related to 0.4.x versus 0.3.x chart differences in general.
Additionally, to fix image pull issues in the 0.4.2 chart, it's necessary to set .global.imageRegistry so that all images are pulled correctly. So a complete workaround is to add:
```yaml
global:
  imageRegistry: docker.io
grafana:
  defaultDashboards:
    useExistingNamespace: true
```
into the ProjectMonitor values when you create it, making sure to merge the grafana values into the existing grafana key.
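To illustrate the merge point: if the ProjectMonitor values already have a top-level grafana key, the workaround keys belong under that same key rather than under a second grafana: entry. The YAML spec requires mapping keys to be unique, and depending on the tooling a duplicate key is either rejected or one copy is silently dropped. A minimal sketch, where the existing grafana contents are a placeholder comment and not taken from the chart:

```yaml
global:
  imageRegistry: docker.io
grafana:
  # ...the grafana values the ProjectMonitor already has stay here
  # (placeholder only; not copied from the chart defaults)
  defaultDashboards:
    useExistingNamespace: true
# Do not append a second top-level "grafana:" key further down;
# merge into the single grafana key above instead.
```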
To enable easier debugging, I've produced images with the following tags:
mallardduck/prometheus-federator:reorg-v0.3.3
mallardduck/prometheus-federator:reorg-v0.3.4
mallardduck/prometheus-federator:reorg-v0.4.0
mallardduck/prometheus-federator:reorg-v0.4.1
mallardduck/prometheus-federator:reorg-v0.4.2 (a rebuild/retag of the reorg-demo image that the prom-reorg-test branch chart uses by default)
Each version suffix corresponds to the version of the rancher-project-monitoring chart embedded into the prometheus-federator binary/image.
To test images individually without resetting everything, you can edit the prometheus-federator App and set .helmProjectOperator.image.tag to the desired target.
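As a sketch, the override in the App's values might look like the fragment below. This assumes the chart's image repository already points at mallardduck/prometheus-federator (as the prom-reorg-test branch chart appears to, since it defaults to the reorg-demo tag); if it does not, the repository would also need changing. Only the tag key path comes from the text above.

```yaml
helmProjectOperator:
  image:
    # any of the reorg-v0.3.3 ... reorg-v0.4.2 tags listed above;
    # the suffix selects the embedded rancher-project-monitoring chart version
    tag: reorg-v0.4.1
```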
Rancher Server Setup
Information about the Cluster
User Information
Describe the bug
While installing Prometheus Federator 105.0.0-rc.1+up0.4.2 on Rancher 2.10-head (installed via Docker), I encountered a problem when setting up the project monitor. The setup process is stuck with the status "WaitingForDashboardValues".
Additionally, the corresponding pod, helm-install-cattle-project-p-7mj9v-monitoring-jwr97, throws the following error:
Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: Namespace "cattle-dashboards" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "cattle-project-p-7mj9v-monitoring": current value is "rancher-monitoring"; annotation validation error: key "meta.helm.sh/release-namespace" must equal "cattle-project-p-7mj9v-monitoring": current value is "cattle-monitoring-system"
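Reading the error text, the pre-existing cattle-dashboards Namespace carries Helm ownership annotations that point at the cluster-level rancher-monitoring release, roughly as sketched below (reconstructed from the error message only; all other metadata omitted). The project release expects both annotations to equal cattle-project-p-7mj9v-monitoring, so Helm refuses to import the namespace into it:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cattle-dashboards
  annotations:
    # values reported by the error: owned by the rancher-monitoring release
    meta.helm.sh/release-name: rancher-monitoring
    meta.helm.sh/release-namespace: cattle-monitoring-system
```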
To Reproduce
Steps:
Result
The project monitor setup process got stuck with the status "WaitingForDashboardValues".
Expected Result
The project monitor should be created successfully.
Screenshots