loft-sh / vcluster

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it offers better multi-tenancy and isolation than regular namespaces.
https://www.vcluster.com
Apache License 2.0

Panic applying virtual status patch #2293

Open yankcrime opened 3 days ago

yankcrime commented 3 days ago

What happened?

I'm seeing vCluster panic when it's applying a virtual status patch as a result of updates to a LoadBalancer Service in a virtual cluster. The output is as follows:

2024-11-22 12:48:08     INFO    patcher/apply.go:313    Apply virtual patch     {"component": "vcluster", "controller": "service", "namespace": "cluster", "name": "kamaji-cluster", "reconcileID": "ca91da8b-dde2-462d-8233-cf4a3826a161", "kind": "Service", "object": "cluster/kamaji-cluster", "patch": "{\"metadata\":{\"annotations\":{\"loadbalancer.openstack.org/load-balancer-address\":\"193.16.42.10\",\"loadbalancer.openstack.org/load-balancer-id\":\"2239ddd9-1d22-4ff7-879e-a94e2278c45a\"}}}"}
2024-11-22 12:48:08     INFO    patcher/apply.go:313    Apply virtual status patch      {"component": "vcluster", "controller": "service", "namespace": "cluster", "name": "kamaji-cluster", "reconcileID": "4aeeb2ac-0f77-4a25-a3e2-8d4919b54ffc", "kind": "Service", "object": "cluster/kamaji-cluster", "patch": "{\"status\":{\"loadBalancer\":{\"ingress\":[{\"ip\":\"193.16.42.10\",\"ipMode\":\"VIP\"}]}}}"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      Observed a panic        {"component": "vcluster", "component": "controller-manager", "location": "panic.go:261", "panic": "runtime error: invalid memory address or nil pointer dereference", "panicGoValue": "\"invalid memory address or nil pointer dereference\"", "stacktrace": "<"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      goroutine 972 [running]:        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x38483f0, 0x554eb20}, {0x2d9c260, 0x54897b0})   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xbc       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x38483f0, 0x554eb20}, {0x2d9c260, 0x54897b0}, {0x554eb20, 0x0, 0x43d945?})   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:82 +0x5e        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0019e5dc0?})     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x108       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      panic({0x2d9c260?, 0x54897b0?}) {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      runtime/panic.go:770 +0x132     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service.(*Controller).needsUpdate(0xc000d49a00, 0xc002202008, 0xc002027688)   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service/controller.go:581 +0x4b8      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service.New.func2({0x32711e0?, 0xc002202008?}, {0x32711e0, 0xc002027688?})    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service/controller.go:144 +0x74       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/controller.go:253  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    patcher/apply.go:313    Apply host patch        {"component": "vcluster", "controller": "service", "namespace": "cluster", "name": "kamaji-cluster", "reconcileID": "e32f57b2-f56a-4fcb-ba94-4e347e2dd067", "kind": "Service", "object": "kamaji/kamaji-cluster-x-cluster-x-clustermgr0", "patch": "{\"spec\":{\"loadBalancerIP\":\"193.16.42.10\"}}"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.(*processorListener).run.func1()   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/shared_informer.go:976 +0xea       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0019f7f70, {0x38135c0, 0xc0019cb530}, 0x1, 0xc001997e60)      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001cea770, 0x3b9aca00, 0x0, 0x1, 0xc001997e60) {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.Until(...)    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:161        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.(*processorListener).run(0xc000d07200)     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 833    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      >       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      panic: runtime error: invalid memory address or nil pointer dereference [recovered]     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      panic: runtime error: invalid memory address or nil pointer dereference {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x211fc38]        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      goroutine 972 [running]:        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x38483f0, 0x554eb20}, {0x2d9c260, 0x54897b0}, {0x554eb20, 0x0, 0x43d945?})   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:89 +0xee        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0019e5dc0?})     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x108       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      panic({0x2d9c260?, 0x54897b0?}) {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      runtime/panic.go:770 +0x132     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service.(*Controller).needsUpdate(0xc000d49a00, 0xc002202008, 0xc002027688)   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service/controller.go:581 +0x4b8      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service.New.func2({0x32711e0?, 0xc002202008?}, {0x32711e0, 0xc002027688?})    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/cloud-provider/controllers/service/controller.go:144 +0x74       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/controller.go:253  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.(*processorListener).run.func1()   {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/shared_informer.go:976 +0xea       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc002571f70, {0x38135c0, 0xc0019cb530}, 0x1, 0xc001997e60)      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001cea770, 0x3b9aca00, 0x0, 0x1, 0xc001997e60) {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f  {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.Until(...)    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/backoff.go:161        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache.(*processorListener).run(0xc000d07200)     {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69       {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()        {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 833    {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:08     INFO    commandwriter/commandwriter.go:128      k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73      {"component": "vcluster", "component": "controller-manager"}
2024-11-22 12:48:12     INFO    commandwriter/commandwriter.go:128      Failed to list /registry/ipam.cluster.x-k8s.io/ipaddresses/ for revision 204148: rpc error: code = OutOfRange desc = etcdserver: mvcc: required revision has been compacted       {"component": "vcluster", "component": "kine", "time": "2024-11-22T12:48:12.207567635Z", "level": "error"}

At this point no resources in the virtual cluster are synchronised. However, if I kill the Pod for this virtual cluster, everything eventually restarts and resources start synchronising again.

What did you expect to happen?

I do not expect vCluster to panic. At worst it should log an error and carry on, so that resources continue to synchronise and further actions in the virtual cluster aren't blocked.

How can we reproduce it (as minimally and precisely as possible)?

This is being triggered when creating a Kamaji TenantControlPlane resource in my virtual cluster. As part of this resource's creation it creates a Service of type LoadBalancer, and once this has been instantiated that's when I see the panic in vCluster. It's consistent and repeatable, albeit a bit involved. I'm happy to provide more details to assist with troubleshooting if it's not clear from the trace.

Anything else we need to know?

No response

Host cluster Kubernetes version

```console
$ kubectl version
Client Version: v1.30.5
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.2
```

vcluster version

```console
$ vcluster --version
vcluster version 0.21.1
```

VCluster Config

Nothing special, the virtual cluster was created with the following command:

```
vcluster create clustermgr0 -n kamaji --expose
```

FabianKramm commented 3 days ago

@yankcrime thanks for reporting this! Strange, the panic actually occurs in the Kubernetes controller-manager itself, so this might be a Kubernetes problem or something to do with our configuration.
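
For readers unfamiliar with this class of failure: the trace ends in `needsUpdate`, which compares an old and a new Service, and a nil pointer dereference in such a comparison typically means an optional pointer field was read without a nil check. Which field is nil at `controller.go:581` is not established in this thread, so the sketch below uses a hypothetical `ingress` type and field purely to show the general failure mode and a nil-safe alternative.

```go
package main

import "fmt"

// ingress is a hypothetical stand-in for an API object with an optional
// pointer field. It is not the real Kubernetes type.
type ingress struct {
	IP     string
	IPMode *string // optional: nil when the field is unset
}

// unsafeChanged dereferences both pointers unconditionally and panics
// with a nil pointer dereference if either side is unset.
func unsafeChanged(a, b ingress) bool {
	return *a.IPMode != *b.IPMode
}

// safeChanged treats nil as "unset" and only dereferences when both
// sides are set.
func safeChanged(a, b ingress) bool {
	if a.IPMode == nil || b.IPMode == nil {
		// Changed only if exactly one side is unset.
		return a.IPMode != b.IPMode
	}
	return *a.IPMode != *b.IPMode
}

func main() {
	vip := "VIP"
	before := ingress{IP: "193.16.42.10"}              // field not set yet
	after := ingress{IP: "193.16.42.10", IPMode: &vip} // field now set

	fmt.Println("changed:", safeChanged(before, after))
	// unsafeChanged(before, after) would panic here.
}
```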

yankcrime commented 3 days ago

Thanks @FabianKramm - it's an interesting one for sure. Let me know if I can provide you with any more details 👍

yankcrime commented 15 hours ago

An additional datapoint: I tested this today with an older version of vCluster - v0.19.6 - and didn't experience the same panic.