syself / cluster-api-provider-hetzner

Cluster API Provider Hetzner :rocket: The best way to manage Kubernetes clusters on Hetzner, fully declarative, Kubernetes-native and with self-healing capabilities
https://caph.syself.com
Apache License 2.0

Panic in CAPH controller #1388

Closed EternalDeiwos closed 1 month ago

EternalDeiwos commented 1 month ago

/kind bug

What steps did you take and what happened:

The application panics in normal use.

I was repeatedly testing a script to deploy a cluster and discovered that the Cluster.cluster.x-k8s.io/v1beta1 resource failed to delete. Removing the finalizers allowed the resource to be deleted, and the infra provider resumed normal operation. I am not sure why the cluster resource could not be deleted in the first place.

What did you expect to happen:

The application should not panic.

Anything else you would like to add:

The controller runs in a restart loop, consistently showing the following error message.

Logs

```
{"level":"INFO","time":"2024-07-18T09:46:00.106Z","file":"controller/controller.go:186","message":"Starting Controller","controller":"certificatesigningrequest","controllerGroup":"certificates.k8s.io","controllerKind":"CertificateSigningRequest"}
{"level":"INFO","time":"2024-07-18T09:46:00.134Z","file":"controller/controller.go:115","message":"Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference","controller":"hetznercluster","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerCluster","HetznerCluster":{"name":"beta","namespace":"beta"},"namespace":"beta","name":"beta","reconcileID":"04d90b63-ae3f-4e6a-8880-b141be30d935"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x183d19d]

goroutine 397 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1a71140?, 0x2e98890?})
	runtime/panic.go:914 +0x21f
github.com/syself/cluster-api-provider-hetzner/pkg/services/hcloud/loadbalancer.(*Service).Delete(0xc00099d800, {0x1fe4180, 0xc000df99e0})
	github.com/syself/cluster-api-provider-hetzner/pkg/services/hcloud/loadbalancer/loadbalancer.go:351 +0x7d
github.com/syself/cluster-api-provider-hetzner/controllers.(*HetznerClusterReconciler).reconcileDelete(0xc00021c900, {0x1fe4180, 0xc000df99e0}, 0xc0002ef5e0)
	github.com/syself/cluster-api-provider-hetzner/controllers/hetznercluster_controller.go:337 +0x2c6
github.com/syself/cluster-api-provider-hetzner/controllers.(*HetznerClusterReconciler).Reconcile(0xc00021c900, {0x1fe4180, 0xc000df9770}, {{{0xc000604240?, 0x5?}, {0xc00060423c?, 0xc00099dd08?}}})
	github.com/syself/cluster-api-provider-hetzner/controllers/hetznercluster_controller.go:166 +0xa56
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1fe75b0?, {0x1fe4180?, 0xc000df9770?}, {{{0xc000604240?, 0xb?}, {0xc00060423c?, 0x0?}}})
	sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0002f2780, {0x1fe41b8, 0xc00012d0e0}, {0x1b41740?, 0xc000278da0?})
	sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002f2780, {0x1fe41b8, 0xc00012d0e0})
	sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266 +0x1af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 168
	sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:223 +0x565
```

Environment:

janiskemper commented 1 month ago

Thanks for reporting this, @EternalDeiwos!

One comment: it is not a good idea to intervene in the process of deleting a cluster, as this might lead to orphaned resources. I guess that in this case you protected your load balancer via the Hetzner UI, and we apparently have a bug in deleting (or rather, in NOT deleting) protected load balancers.

@yrs147 can you take this over? You can use the function "findLoadBalancer" (https://github.com/syself/cluster-api-provider-hetzner/blob/3014bef4195e3fc0b673c3240a3399c095c1a588/pkg/services/hcloud/loadbalancer/loadbalancer.go#L418-L419) instead of using the name of the load balancer in https://github.com/syself/cluster-api-provider-hetzner/blob/3014bef4195e3fc0b673c3240a3399c095c1a588/pkg/services/hcloud/loadbalancer/loadbalancer.go#L351-L352 to fetch it.

The problem is that the name does not exist if you don't use an existing load balancer and protect it via Hetzner UI.