Hi, while bootstrapping new clusters with Kubermatic version 1.16, I found the following bug with the machine-controller in customer clusters:
kubectl describe pod/machine-controller-..:
Name:           machine-controller-7584cf7c44-tmzvn
Namespace:      cluster-tsq85kvtfr
Priority:       0
Node:           master-pool1-6b9c9755b8-7726s/192.168.0.8
Start Time:     Tue, 01 Jun 2021 11:15:33 +0200
Labels:         app=machine-controller
                cluster=tsq85kvtfr
                machinecontroller-kubeconfig-secret-revision=25584
                pod-template-hash=7584cf7c44
Annotations:    cni.projectcalico.org/podIP: 10.244.4.19/32
                cni.projectcalico.org/podIPs: 10.244.4.19/32
                prometheus.io/path: /metrics
                prometheus.io/port: 8080
                prometheus.io/scrape: true
Status:         Running
IP:             10.244.4.19
IPs:
  IP:           10.244.4.19
Controlled By:  ReplicaSet/machine-controller-7584cf7c44
Init Containers:
  copy-http-prober:
    Container ID:  docker://aba3fb536e0101bcf1fdba44a3951b7ba24aeb948fd5f3b6dc69646cc284f1bd
    Image:         quay.io/kubermatic/http-prober:v0.3.1
    Image ID:      docker-pullable://quay.io/kubermatic/http-prober@sha256:1dfe3b3eedcf35f37bcc0d483847eff00337c57ee7d963236ba3e5f6d643c47e
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/cp
      /usr/local/bin/http-prober
      /http-prober-bin/http-prober
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 01 Jun 2021 11:15:34 +0200
      Finished:     Tue, 01 Jun 2021 11:15:34 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /http-prober-bin from http-prober-bin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sw4sl (ro)
Containers:
  machine-controller:
    Container ID:  docker://2345b8e52d6620e2f79154b523cec84953de576e4b5fd3615e6dec308f5a6b1c
    Image:         docker.io/kubermatic/machine-controller:v1.24.4
    Image ID:      docker-pullable://kubermatic/machine-controller@sha256:ecfe580aa664370e246d33592aa22a2b4878c926a62ba0d90e489ca36e5c0129
    Port:          <none>
    Host Port:     <none>
    Command:
      /http-prober-bin/http-prober
    Args:
      -endpoint
      https://apiserver-external.cluster-tsq85kvtfr.svc.cluster.local./healthz
      -insecure
      -retries
      100
      -retry-wait
      2
      -timeout
      1
      -command
      {"command":"/usr/local/bin/machine-controller","args":["-kubeconfig","/etc/kubernetes/kubeconfig/kubeconfig","-logtostderr","-v","4","-cluster-dns","169.254.20.10","-health-probe-address","0.0.0.0:8085","-metrics-address","0.0.0.0:8080"]}
      --crd-to-wait-for
      Machine,cluster.k8s.io/v1alpha1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 01 Jun 2021 11:25:07 +0200
      Finished:     Tue, 01 Jun 2021 11:25:41 +0200
    Ready:          False
    Restart Count:  6
    Limits:
      cpu:     2
      memory:  512Mi
    Requests:
      cpu:     25m
      memory:  32Mi
    Liveness:  http-get http://:8085/readyz delay=15s timeout=15s period=10s #success=1 #failure=3
    Environment:
      HZ_TOKEN:    MpuX4GCmRLAg2ThelxcXYpMrDRuzUVp6exHsCrEu1eEdjkRwuq9aVg6PEsWbzh6f
      KUBECONFIG:  /etc/kubernetes/kubeconfig/kubeconfig
    Mounts:
      /etc/kubernetes/kubeconfig from machinecontroller-kubeconfig (ro)
      /http-prober-bin from http-prober-bin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sw4sl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  machinecontroller-kubeconfig:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machinecontroller-kubeconfig
    Optional:    false
  http-prober-bin:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-sw4sl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-sw4sl
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  11m                   default-scheduler  Successfully assigned cluster-tsq85kvtfr/machine-controller-7584cf7c44-tmzvn to master-pool1-6b9c9755b8-7726s
  Normal   Pulled     11m                   kubelet            Container image "quay.io/kubermatic/http-prober:v0.3.1" already present on machine
  Normal   Created    11m                   kubelet            Created container copy-http-prober
  Normal   Started    11m                   kubelet            Started container copy-http-prober
  Normal   Pulling    11m                   kubelet            Pulling image "docker.io/kubermatic/machine-controller:v1.24.4"
  Normal   Pulled     11m                   kubelet            Successfully pulled image "docker.io/kubermatic/machine-controller:v1.24.4" in 6.414382595s
  Warning  Unhealthy  8m36s (x3 over 10m)   kubelet            Liveness probe failed: Get "http://10.244.4.19:8085/readyz": dial tcp 10.244.4.19:8085: connect: connection refused
  Normal   Pulled     8m14s (x3 over 10m)   kubelet            Container image "docker.io/kubermatic/machine-controller:v1.24.4" already present on machine
  Normal   Created    8m13s (x4 over 11m)   kubelet            Created container machine-controller
  Normal   Started    8m13s (x4 over 11m)   kubelet            Started container machine-controller
  Warning  BackOff    62s (x30 over 9m27s)  kubelet            Back-off restarting failed container
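The machine-controller container keeps crashing shortly after startup. The logs below were captured from the crashing container; a command along these lines should reproduce them (--previous fetches the output of the last terminated container):

kubectl -n cluster-tsq85kvtfr logs machine-controller-7584cf7c44-tmzvn --previous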
{"level":"info","time":"2021-06-01T09:21:53.133Z","logger":"http-prober","caller":"http-prober/main.go:109","msg":"Probing","attempt":1,"max-attempts":100,"target":"https://apiserver-external.cluster-tsq85kvtfr.svc.cluster.local./healthz"} {"level":"info","time":"2021-06-01T09:21:53.146Z","logger":"http-prober","caller":"http-prober/main.go:98","msg":"Hostname resolved","hostname":"apiserver-external.cluster-tsq85kvtfr.svc.cluster.local.","address":"10.99.157.188:443"} {"level":"info","time":"2021-06-01T09:21:53.159Z","logger":"http-prober","caller":"http-prober/main.go:122","msg":"Endpoint is available"} {"level":"info","time":"2021-06-01T09:21:53.242Z","logger":"http-prober","caller":"http-prober/main.go:129","msg":"All CRDs became available"} I0601 09:21:53.381329 1 leaderelection.go:243] attempting to acquire leader lease kube-system/machine-controller... I0601 09:22:09.089050 1 leaderelection.go:253] successfully acquired lease kube-system/machine-controller W0601 09:22:24.098933 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition W0601 09:22:24.110357 1 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition I0601 09:22:24.191440 1 migrations.go:175] CRD machines.machine.k8s.io not present, no migration needed I0601 09:22:24.191520 1 migrations.go:54] Starting to migrate providerConfigs to providerSpecs I0601 09:22:24.234968 1 migrations.go:136] Successfully migrated providerConfigs to providerSpecs I0601 09:22:24.236644 1 plugin.go:95] looking for plugin "machine-controller-userdata-centos" I0601 09:22:24.236836 1 plugin.go:123] checking "/usr/local/bin/machine-controller-userdata-centos" I0601 09:22:24.237372 1 plugin.go:136] found '/usr/local/bin/machine-controller-userdata-centos' I0601 09:22:24.237409 1 plugin.go:95] looking for plugin "machine-controller-userdata-coreos" I0601 09:22:24.237454 1 plugin.go:123] checking "/usr/local/bin/machine-controller-userdata-coreos" I0601 09:22:24.237510 1 plugin.go:136] found '/usr/local/bin/machine-controller-userdata-coreos' I0601 09:22:24.237526 1 plugin.go:95] looking for plugin "machine-controller-userdata-ubuntu" I0601 09:22:24.237559 1 plugin.go:123] checking "/usr/local/bin/machine-controller-userdata-ubuntu" I0601 09:22:24.237597 1 plugin.go:136] found '/usr/local/bin/machine-controller-userdata-ubuntu' I0601 09:22:24.237609 1 plugin.go:95] looking for plugin "machine-controller-userdata-sles" I0601 09:22:24.237640 1 plugin.go:123] checking "/usr/local/bin/machine-controller-userdata-sles" I0601 09:22:24.237695 1 plugin.go:136] found '/usr/local/bin/machine-controller-userdata-sles' I0601 09:22:24.237729 1 plugin.go:95] looking for plugin "machine-controller-userdata-rhel" I0601 09:22:24.237778 1 plugin.go:123] checking "/usr/local/bin/machine-controller-userdata-rhel" I0601 09:22:24.237817 1 plugin.go:136] found '/usr/local/bin/machine-controller-userdata-rhel' I0601 09:22:24.237830 1 plugin.go:95] looking for plugin "machine-controller-userdata-flatcar" I0601 09:22:24.237867 1 plugin.go:123] checking "/usr/local/bin/machine-controller-userdata-flatcar" I0601 09:22:24.237939 1 plugin.go:136] found '/usr/local/bin/machine-controller-userdata-flatcar' I0601 09:22:24.238958 1 main.go:423] machine controller startup complete I0601 09:22:24.340311 1 machineset_controller.go:148] Reconcile machineset 
cluster-2-worker-ck4rn5-645786bddf I0601 09:22:24.340960 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-jvqvg, machine has no node ref I0601 09:22:24.341108 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-7tl2d, machine has no node ref I0601 09:22:24.341125 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-cl28s, machine has no node ref I0601 09:22:24.341418 1 machineset_controller.go:148] Reconcile machineset cluster-2-worker-ck4rn5-645786bddf I0601 09:22:24.341656 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-jvqvg, machine has no node ref I0601 09:22:24.341757 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-7tl2d, machine has no node ref I0601 09:22:24.341774 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-cl28s, machine has no node ref I0601 09:22:24.342464 1 machineset_controller.go:148] Reconcile machineset cluster-2-worker-ck4rn5-645786bddf I0601 09:22:24.342598 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-jvqvg, machine has no node ref I0601 09:22:24.342637 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-7tl2d, machine has no node ref I0601 09:22:24.342647 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-cl28s, machine has no node ref I0601 09:22:24.342683 1 machineset_controller.go:148] Reconcile machineset cluster-2-worker-ck4rn5-645786bddf I0601 09:22:24.342785 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-jvqvg, machine has no node ref I0601 09:22:24.342809 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-7tl2d, machine has no node ref I0601 09:22:24.342818 1 status.go:56] Unable to get node for machine cluster-2-worker-ck4rn5-645786bddf-cl28s, machine has no node ref I0601 09:22:24.630353 1 machine_controller.go:675] Validated machine spec of cluster-2-worker-ck4rn5-645786bddf-cl28s I0601 09:22:24.652443 1 machine_controller.go:675] Validated machine spec of cluster-2-worker-ck4rn5-645786bddf-jvqvg I0601 09:22:24.674182 1 machine_controller.go:675] Validated machine spec of cluster-2-worker-ck4rn5-645786bddf-7tl2d E0601 09:22:26.770451 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) goroutine 389 [running]: k8s.io/apimachinery/pkg/util/runtime.logPanic(0x2e9fac0, 0x5595020) k8s.io/apimachinery@v0.19.4/pkg/util/runtime/runtime.go:74 +0xa6 k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0) k8s.io/apimachinery@v0.19.4/pkg/util/runtime/runtime.go:48 +0x89 panic(0x2e9fac0, 0x5595020) runtime/panic.go:969 +0x175 github.com/hetznercloud/hcloud-go/hcloud.(*ServerClient).Create(0xc0009302b0, 0x3a90a60, 0xc000116008, 0xc000c98ff0, 0x28, 0xc000ac2400, 0xc001204680, 0xc0002f0870, 0x1, 0x1, ...) 
github.com/hetznercloud/hcloud-go@v1.23.1/hcloud/server.go:306 +0x409 github.com/kubermatic/machine-controller/pkg/cloudprovider/provider/hetzner.(*provider).Create(0xc0002f1bc8, 0xc000d73200, 0xc000974f60, 0xc001212000, 0x35af, 0x0, 0x0, 0x0, 0x0) github.com/kubermatic/machine-controller/pkg/cloudprovider/provider/hetzner/provider.go:284 +0x8b8 github.com/kubermatic/machine-controller/pkg/cloudprovider.(*cachingValidationWrapper).Create(0xc000d66450, 0xc000d73200, 0xc000974f60, 0xc001212000, 0x35af, 0xc000aba2c0, 0x0, 0x0, 0x0) github.com/kubermatic/machine-controller/pkg/cloudprovider/validationwrapper.go:77 +0x5f github.com/kubermatic/machine-controller/pkg/controller/machine.(*Reconciler).createProviderInstance(0xc00009cb60, 0x3ab8ea0, 0xc000d66450, 0xc000d73200, 0xc001212000, 0x35af, 0x0, 0x0, 0x0, 0x0) github.com/kubermatic/machine-controller/pkg/controller/machine/machine_controller.go:332 +0x137 github.com/kubermatic/machine-controller/pkg/controller/machine.(*Reconciler).ensureInstanceExistsForMachine(0xc00009cb60, 0x3a90ae0, 0xc000d6a600, 0x3ab8ea0, 0xc000d66450, 0xc000d73200, 0x3a31620, 0xc0008fb900, 0xc000139ea0, 0x0, ...) github.com/kubermatic/machine-controller/pkg/controller/machine/machine_controller.go:709 +0x89d github.com/kubermatic/machine-controller/pkg/controller/machine.(*Reconciler).reconcile(0xc00009cb60, 0x3a90ae0, 0xc000d6a600, 0xc000d73200, 0x55f7740, 0xc000c98f30, 0x28) github.com/kubermatic/machine-controller/pkg/controller/machine/machine_controller.go:406 +0x7ac github.com/kubermatic/machine-controller/pkg/controller/machine.(*Reconciler).Reconcile(0xc00009cb60, 0x3a90ae0, 0xc000d6a600, 0xc000c7f120, 0xb, 0xc000c98f30, 0x28, 0xc000d6a600, 0x40a3ff, 0xc00003a000, ...) github.com/kubermatic/machine-controller/pkg/controller/machine/machine_controller.go:360 +0x5e8 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0003dfa40, 0x3a90a20, 0xc0007db540, 0x3038020, 0xc000ce53e0) sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:263 +0x317 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0003dfa40, 0x3a90a20, 0xc0007db540, 0xc000069e00) sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:235 +0x205 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x3a90a20, 0xc0007db540) sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:198 +0x4a k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1() k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:185 +0x37 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000069f50) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:155 +0x5f k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000dc7f50, 0x3a37080, 0xc000d6a5d0, 0xc0007db501, 0xc0005e19e0) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:156 +0xad k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000069f50, 0x3b9aca00, 0x0, 0x1, 0xc0005e19e0) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:133 +0x98 k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x3a90a20, 0xc0007db540, 0xc000dba0c0, 0x3b9aca00, 0x0, 0x1) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:185 +0xa6 k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x3a90a20, 0xc0007db540, 0xc000dba0c0, 0x3b9aca00) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:99 +0x57 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1 
sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:195 +0x4e7 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x22dca89] goroutine 389 [running]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0) k8s.io/apimachinery@v0.19.4/pkg/util/runtime/runtime.go:55 +0x10c panic(0x2e9fac0, 0x5595020) runtime/panic.go:969 +0x175 github.com/hetznercloud/hcloud-go/hcloud.(*ServerClient).Create(0xc0009302b0, 0x3a90a60, 0xc000116008, 0xc000c98ff0, 0x28, 0xc000ac2400, 0xc001204680, 0xc0002f0870, 0x1, 0x1, ...) github.com/hetznercloud/hcloud-go@v1.23.1/hcloud/server.go:306 +0x409 github.com/kubermatic/machine-controller/pkg/cloudprovider/provider/hetzner.(*provider).Create(0xc0002f1bc8, 0xc000d73200, 0xc000974f60, 0xc001212000, 0x35af, 0x0, 0x0, 0x0, 0x0) github.com/kubermatic/machine-controller/pkg/cloudprovider/provider/hetzner/provider.go:284 +0x8b8 github.com/kubermatic/machine-controller/pkg/cloudprovider.(*cachingValidationWrapper).Create(0xc000d66450, 0xc000d73200, 0xc000974f60, 0xc001212000, 0x35af, 0xc000aba2c0, 0x0, 0x0, 0x0) github.com/kubermatic/machine-controller/pkg/cloudprovider/validationwrapper.go:77 +0x5f github.com/kubermatic/machine-controller/pkg/controller/machine.(*Reconciler).createProviderInstance(0xc00009cb60, 0x3ab8ea0, 0xc000d66450, 0xc000d73200, 0xc001212000, 0x35af, 0x0, 0x0, 0x0, 0x0) github.com/kubermatic/machine-controller/pkg/controller/machine/machine_controller.go:332 +0x137 github.com/kubermatic/machine-controller/pkg/controller/machine.(*Reconciler).ensureInstanceExistsForMachine(0xc00009cb60, 0x3a90ae0, 0xc000d6a600, 0x3ab8ea0, 0xc000d66450, 0xc000d73200, 0x3a31620, 0xc0008fb900, 0xc000139ea0, 0x0, ...) github.com/kubermatic/machine-controller/pkg/controller/machine/machine_controller.go:709 +0x89d github.com/kubermatic/machine-controller/pkg/controller/machine.(*Reconciler).reconcile(0xc00009cb60, 0x3a90ae0, 0xc000d6a600, 0xc000d73200, 0x55f7740, 0xc000c98f30, 0x28) github.com/kubermatic/machine-controller/pkg/controller/machine/machine_controller.go:406 +0x7ac github.com/kubermatic/machine-controller/pkg/controller/machine.(*Reconciler).Reconcile(0xc00009cb60, 0x3a90ae0, 0xc000d6a600, 0xc000c7f120, 0xb, 0xc000c98f30, 0x28, 0xc000d6a600, 0x40a3ff, 0xc00003a000, ...) 
github.com/kubermatic/machine-controller/pkg/controller/machine/machine_controller.go:360 +0x5e8 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0003dfa40, 0x3a90a20, 0xc0007db540, 0x3038020, 0xc000ce53e0) sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:263 +0x317 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0003dfa40, 0x3a90a20, 0xc0007db540, 0xc000069e00) sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:235 +0x205 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x3a90a20, 0xc0007db540) sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:198 +0x4a k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1() k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:185 +0x37 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000069f50) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:155 +0x5f k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000dc7f50, 0x3a37080, 0xc000d6a5d0, 0xc0007db501, 0xc0005e19e0) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:156 +0xad k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000069f50, 0x3b9aca00, 0x0, 0x1, 0xc0005e19e0) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:133 +0x98 k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x3a90a20, 0xc0007db540, 0xc000dba0c0, 0x3b9aca00, 0x0, 0x1) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:185 +0xa6 k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x3a90a20, 0xc0007db540, 0xc000dba0c0, 0x3b9aca00) k8s.io/apimachinery@v0.19.4/pkg/util/wait/wait.go:99 +0x57 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1 sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:195 +0x4e7
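The trace shows hcloud-go's ServerClient.Create dereferencing a nil pointer while it builds the create request from the ServerCreateOpts handed to it by the Hetzner provider (provider.go:284). A plausible reading, though not a confirmed root cause, is that one of the pointer fields in the opts (for example a *hcloud.Network resolved by name) comes back nil from the lookup and is passed through unchecked. A minimal sketch of the defensive guard under that assumption follows; the helper resolveNetwork and the network name are hypothetical, not code from machine-controller:

package main

import (
    "context"
    "fmt"

    "github.com/hetznercloud/hcloud-go/hcloud"
)

// resolveNetwork is a hypothetical helper: hcloud-go Get calls return
// (nil, response, nil) when the resource does not exist, so the nil
// pointer has to be caught here rather than letting ServerClient.Create
// dereference it and panic.
func resolveNetwork(ctx context.Context, client *hcloud.Client, name string) (*hcloud.Network, error) {
    network, _, err := client.Network.Get(ctx, name)
    if err != nil {
        return nil, fmt.Errorf("failed to look up network %q: %w", name, err)
    }
    if network == nil {
        return nil, fmt.Errorf("network %q not found", name)
    }
    return network, nil
}

func main() {
    client := hcloud.NewClient(hcloud.WithToken("<HZ_TOKEN>"))
    network, err := resolveNetwork(context.Background(), client, "example-network")
    if err != nil {
        // Surfacing an error fails a single reconcile, which the
        // controller retries with backoff, instead of panicking and
        // crashing the whole machine-controller pod.
        fmt.Println(err)
        return
    }
    fmt.Println("network id:", network.ID)
}

Returning an error here would show up as a failed, retried reconcile rather than the CrashLoopBackOff seen above.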
Duplicate of #943