siderolabs / omni

SaaS-simple deployment of Kubernetes - on your own hardware.

[bug] use of kube-prometheus-stack helm template in cluster template break Omni #180

Open bernardgut opened 2 months ago

bernardgut commented 2 months ago

UPDATE: READ THE COMMENT BELOW FIRST

Is there an existing issue for this?

Current Behavior

If you create a cluster using omnictl cluster template sync with a machine class and machine labels:

The nodes are stuck forever with: [image], and whenever they receive a new command they print: [image]

Expected Behavior

Provision the cluster based on the selected nodes' machineClass.

Steps To Reproduce

  1. On clean Omni, generate ISO with both amd extensions, drbd and zfs. boot the 3 machines
  2. Once the machines have joined Omni, add bootstrap patch with hostname, add bootstrap2 patch with basic extension config and certificate rotation
  3. in the Machine menu, add a label o0.
  4. Go to Machine classes, create a machineClass o0 with filter o0
  5. create a template with, amongst other patches and configurations, the following:
    machineClass:
      name: o0
      size: 3
  6. run omnictl cluster template sync --file o0
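For reference, steps 5 and 6 can be sketched roughly as below. This is only an illustration of where the machineClass block sits in a template: the file path, the cluster name, and the Kubernetes version are assumptions and are not taken from the actual template, and the other patches are left out.

```shell
# Sketch only: path, cluster name and Kubernetes version are assumptions.
template="${TMPDIR:-/tmp}/o0-template.yaml"

cat > "$template" <<'EOF'
kind: Cluster
name: o0             # assumed cluster name
kubernetes:
  version: v1.30.0   # assumed; not stated in the report
talos:
  version: v1.7.0    # Talos version from the report
---
kind: ControlPlane
machineClass:
  name: o0           # machine class created in step 4
  size: 3            # the three booted machines
EOF

# Then, as in step 6 (requires a configured omnictl):
# omnictl cluster template sync --file "$template"
```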

watch your machines burn.

What browsers are you seeing the problem on?

No response

Anything else?

Tested: happens in both Omni 0.33 and 0.34. Talos 1.7.0.

bernardgut commented 2 months ago

Actually, after investigating the issue further, this is not due to the label facility. It is due to the kube-prometheus-stack. Running:

helm template --include-crds -n monitoring -f apps/kube-prometheus-stack.helm.yaml kps prometheus-community/kube-prometheus-stack --create-namespace | yq -i 'with(.cluster.inlineManifests.[] | select(.name=="monitoring-stack"); .contents=load_str("/dev/stdin"))' patches/monitoring-stack.yaml

then adding that patch to your cluster template. The missing pre/post hooks (which are not included in helm template ..., as opposed to helm install ...) somehow break the cluster nodes (before the install even starts). I don't think it matters what you put in the helm values, but if you need a sample I can provide one.
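One way to test the hook theory is to count the hook annotations in the rendered output before it goes into the patch. The sketch below assumes hooks are marked with the standard helm.sh/hook annotation; count_hook_annotations is a hypothetical helper name, and it is a plain text match rather than a YAML parse.

```shell
# Hypothetical helper: counts lines on stdin that set the helm.sh/hook annotation
# (unquoted or double-quoted key). Related keys such as helm.sh/hook-weight
# are not counted because the pattern requires ":" right after "hook".
count_hook_annotations() {
  # grep -c prints the number of matching lines but exits 1 when that number
  # is 0, so "|| true" keeps the helper usable under "set -e".
  grep -cE 'helm\.sh/hook"?:' || true
}

# Usage with the render command from above (requires helm and the chart repo):
# helm template --include-crds -n monitoring \
#   -f apps/kube-prometheus-stack.helm.yaml \
#   kps prometheus-community/kube-prometheus-stack | count_hook_annotations
```

A result of 0 would suggest the rendered manifest carries no hook resources; a non-zero count would mean they are present in the inline manifest and the cause is probably elsewhere.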

That is as far as I managed to debug this so I disabled the patch and it works. I edited this bug to reflect this but if you feel it is out of scope for Omni feel free to close it.