ionos-cloud / cluster-api-provider-proxmox

Cluster API Provider for Proxmox VE (CAPMOX)
Apache License 2.0

Cilium's CRS causes capmox to panic #255

Closed justinas-b closed 1 month ago

justinas-b commented 2 months ago

What steps did you take and what happened:

If I use --flavor cilium when generating a new cluster, the ProxmoxMachineTemplate resources are misconfigured:

❯ clusterctl generate cluster mgmt --flavor cilium --infrastructure proxmox --kubernetes-version v1.30.2 --control-plane-machine-count 1 --worker-machine-count 1 | grep -B2 -A11 -e "^kind: ProxmoxMachineTemplate"
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: mgmt-control-plane
  namespace: default
spec:
  template:
    spec:
      format: qcow2
      full: true
      sourceNode: pve-01
      templateID: 8889
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: mgmt-worker
  namespace: default
spec:
  template:
    spec:
      format: qcow2
      full: true
      sourceNode: pve-01
      templateID: 8889
---

It seems that these resources are missing the spec.template.spec.disks.bootVolume key, which is present if no flavor is set or if --flavor calico is used:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxMachineTemplate
metadata:
  name: mgmt-worker
  namespace: default
spec:
  template:
    spec:
      disks:
        bootVolume:
          disk: scsi0
          sizeGb: 50

This causes the following panic:

2024/07/04 12:52:10 http: panic serving 10.244.0.1:6328: runtime error: invalid memory address or nil pointer dereference
goroutine 316 [running]:
net/http.(*conn).serve.func1()
    /usr/local/go/src/net/http/server.go:1868 +0xb9
panic({0x17c55c0?, 0x29cc7f0?})
    /usr/local/go/src/runtime/panic.go:920 +0x270
github.com/ionos-cloud/cluster-api-provider-proxmox/internal/webhook.validateNetworks(0xc000226b40)
    /workspace/internal/webhook/proxmoxmachine_webhook.go:89 +0xab
github.com/ionos-cloud/cluster-api-provider-proxmox/internal/webhook.(*ProxmoxMachine).ValidateCreate(0xc0004cbf50?, {0x24?, 0xc00050c980?}, {0x1c72290?, 0xc000226b40?})
    /workspace/internal/webhook/proxmoxmachine_webhook.go:56 +0x125
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*validatorForType).Handle(_, {_, _}, {{{0xc0004cbf50, 0x24}, {{0xc00050c980, 0x1f}, {0xc0006afc50, 0x8}, {0xc0006afc60, ...}}, ...}})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/webhook/admission/validator_custom.go:91 +0x2b6
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc0004cbf50, 0x24}, {{0xc00050c980, 0x1f}, {0xc0006afc50, 0x8}, {0xc0006afc60, ...}}, ...}})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/webhook/admission/webhook.go:169 +0x1ed
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc000380eb0, {0x7fffb91410a0?, 0xc00035f540}, 0xc000197900)
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/webhook/admission/http.go:119 +0xeb0
sigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1({0x7fffb91410a0, 0xc00035f540}, 0x1c77a00?)
    /go/pkg/mod/github.com/prometheus/client_golang@v1.18.0/prometheus/promhttp/instrument_server.go:60 +0xcb
net/http.HandlerFunc.ServeHTTP(0x1c77a40?, {0x7fffb91410a0?, 0xc00035f540?}, 0xc0002b58a0?)
    /usr/local/go/src/net/http/server.go:2136 +0x29
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x1c77a40?, 0xc00069c9a0?}, 0xc000197900)
    /go/pkg/mod/github.com/prometheus/client_golang@v1.18.0/prometheus/promhttp/instrument_server.go:147 +0xb6
net/http.HandlerFunc.ServeHTTP(0x6e3586?, {0x1c77a40?, 0xc00069c9a0?}, 0x410945?)
    /usr/local/go/src/net/http/server.go:2136 +0x29
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x1c77a40, 0xc00069c9a0}, 0xc000197900)
    /go/pkg/mod/github.com/prometheus/client_golang@v1.18.0/prometheus/promhttp/instrument_server.go:109 +0xc2
net/http.HandlerFunc.ServeHTTP(0x10?, {0x1c77a40?, 0xc00069c9a0?}, 0xc0007407ec?)
    /usr/local/go/src/net/http/server.go:2136 +0x29
net/http.(*ServeMux).ServeHTTP(0x410945?, {0x1c77a40, 0xc00069c9a0}, 0xc000197900)
    /usr/local/go/src/net/http/server.go:2514 +0x142
net/http.serverHandler.ServeHTTP({0x1c71598?}, {0x1c77a40?, 0xc00069c9a0?}, 0x6?)
    /usr/local/go/src/net/http/server.go:2938 +0x8e
net/http.(*conn).serve(0xc00055afc0, {0x1c85440, 0xc00041e360})
    /usr/local/go/src/net/http/server.go:2009 +0x5f4
created by net/http.(*Server).Serve in goroutine 93
    /usr/local/go/src/net/http/server.go:3086 +0x5cb

65278 commented 2 months ago

Going by the code of the webhook, the likely culprit is a missing network spec in your ProxmoxMachineTemplate. Indeed, that's a bug (actually two bugs): the webhook should check that the network spec is not nil before dereferencing it, and we should ship a complete template.
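
For illustration, here is a minimal sketch of the kind of nil guard described above. It is not the actual webhook code or the eventual fix: the types MachineSpec, NetworkSpec and NetworkDevice are hypothetical stand-ins for the provider's API types, and treating a missing network spec as "nothing to validate" is an assumption (a real webhook might reject it or apply defaults instead).

```go
package main

import (
	"errors"
	"fmt"
)

// NetworkDevice, NetworkSpec and MachineSpec are hypothetical, simplified
// stand-ins for the provider's API types, used only for this sketch.
type NetworkDevice struct {
	Bridge string
}

type NetworkSpec struct {
	Default           *NetworkDevice
	AdditionalDevices []NetworkDevice
}

type MachineSpec struct {
	// Network is a pointer, so it is nil when the template omits the key.
	Network *NetworkSpec
}

// validateNetworks checks every pointer before dereferencing it, so an
// incomplete template yields a validation result instead of a panic.
func validateNetworks(spec *MachineSpec) error {
	if spec == nil {
		return errors.New("machine spec must not be nil")
	}
	// Assumption: an absent network spec is accepted as "nothing to validate".
	if spec.Network == nil {
		return nil
	}
	if spec.Network.Default != nil && spec.Network.Default.Bridge == "" {
		return errors.New("network.default: bridge must not be empty")
	}
	for i, dev := range spec.Network.AdditionalDevices {
		if dev.Bridge == "" {
			return fmt.Errorf("network.additionalDevices[%d]: bridge must not be empty", i)
		}
	}
	return nil
}
```

With a guard like this, calling validateNetworks(&MachineSpec{}) on a spec without a network section returns an ordinary result, so the admission request gets a normal response rather than crashing the handler goroutine.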

mcbenjemaa commented 1 month ago

I have written the fix for this: https://github.com/ionos-cloud/cluster-api-provider-proxmox/pull/264