kodekloudhub / certified-kubernetes-administrator-course

Certified Kubernetes Administrator - CKA Course

kube-controller-manager-kubemaster, kube-scheduler-kubemaster and kube-apiserver-kubemaster failing after fresh install #150

Closed · JobMendes closed this issue 9 months ago

JobMendes commented 9 months ago

Hi.

I've followed the steps given on the page, which are also aligned with the video from the course.

After performing kubeadm init with the parameters mentioned, the cluster does not become stable and resources crash.
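For reference, the init command was along these lines (reconstructed from the course instructions and the flags visible in the pod specs further down; the exact options may differ slightly):

vagrant@kubemaster:~$ sudo kubeadm init --apiserver-advertise-address=192.168.56.11 --pod-network-cidr=10.244.0.0/16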

[screenshot]

vagrant@kubemaster:~$ kubectl logs kube-controller-manager-kubemaster -n kube-system

_I0201 22:23:51.509281 1 serving.go:348] Generated self-signed cert in-memory I0201 22:23:52.452616 1 controllermanager.go:189] "Starting" version="v1.28.6" I0201 22:23:52.452692 1 controllermanager.go:191] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" I0201 22:23:52.456475 1 secure_serving.go:213] Serving securely on 127.0.0.1:10257 I0201 22:23:52.456779 1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/etc/kubernetes/pki/front-proxy-ca.crt" I0201 22:23:52.457621 1 dynamic_cafilecontent.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt" I0201 22:23:52.458048 1 leaderelection.go:250] attempting to acquire leader lease kube-system/kube-controller-manager...

_I0201 22:23:52.458526 1 tlsconfig.go:240] "Starting DynamicServingCertificateController" E0201 22:23:53.140435 1 leaderelection.go:332] error retrieving resource lock kube-system/kube-controller-manager: leases.coordination.k8s.io "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system" I0201 22:24:12.687007 1 leaderelection.go:260] successfully acquired lease kube-system/kube-controller-manager I0201 22:24:12.689194 1 event.go:307] "Event occurred" object="kube-system/kube-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="kubemaster_750b6d10-4871-4b0a-bcb8-71d780d4ad17 became leader" I0201 22:24:12.701894 1 shared_informer.go:311] Waiting for caches to sync for tokens I0201 22:24:12.721898 1 resource_quota_monitor.go:224] "QuotaMonitor created object count evaluator" resource="serviceaccounts" I0201 22:24:12.722154 1 resource_quota_monitor.go:224] "QuotaMonitor created object count evaluator" resource="statefulsets.apps" I0201 22:24:12.722195 1 resource_quota_monitor.go:224] "QuotaMonitor created object count evaluator" resource="replicasets.apps" I0201 22:24:12.722910 1 resource_quota_monitor.go:224] "QuotaMonitor created object count evaluator" resource="rolebindings.rbac.authorization.k8s.io" I0201 22:24:12.722950 1 resource_quota_monitor.go:224] "QuotaMonitor created object count evaluator" resource="jobs.batch" W0201 22:24:12.723035 1 shared_informer.go:593] resyncPeriod 17h15m46.280444347s is smaller than resyncCheckPeriod [... same message] 0201 22:24:12.751832 1 stateful_set.go:161] "Starting stateful set controller" I0201 22:24:12.751845 1 shared_informer.go:311] Waiting for caches to sync for stateful set I0201 22:24:12.755064 1 certificate_controller.go:115] "Starting certificate controller" name="csrsigning-kubelet-serving" I0201 22:24:12.755080 1 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-kubelet-serving I0201 22:24:12.755099 1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key" I0201 22:24:12.756616 1 certificate_controller.go:115] "Starting certificate controller" name="csrsigning-kubelet-client" I0201 22:24:12.756628 1 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-kubelet-client I0201 22:24:12.756663 1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key" I0201 22:24:12.757965 1 certificate_controller.go:115] "Starting certificate controller" name="csrsigning-kube-apiserver-client" I0201 22:24:12.757989 1 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-kube-apiserver-client I0201 22:24:12.758003 1 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/etc/kubernetes/pki/ca.crt::/etc/kubernetes/pki/ca.key" I0201 22:24:12.759225 1 controllermanager.go:642] "Started controller" controller="certificatesigningrequest-signing-controller" I0201 22:24:12.759319 1 certificate_controller.go:115] "Starting certificate controller" name="csrsigning-legacy-unknown" I0201 22:24:12.759330 1 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-legacy-unknown [...] 
I0201 22:24:23.707653 1 shared_informer.go:318] Caches are synced for cidrallocator I0201 22:24:23.709265 1 shared_informer.go:318] Caches are synced for PVC protection I0201 22:24:23.718266 1 shared_informer.go:318] Caches are synced for ephemeral I0201 22:24:23.732303 1 shared_informer.go:318] Caches are synced for endpoint I0201 22:24:23.735974 1 shared_informer.go:318] Caches are synced for PV protection I0201 22:24:23.738394 1 shared_informer.go:318] Caches are synced for endpoint_slice_mirroring I0201 22:24:23.743347 1 shared_informer.go:318] Caches are synced for disruption I0201 22:24:23.749414 1 shared_informer.go:318] Caches are synced for endpoint_slice I0201 22:24:23.806862 1 shared_informer.go:318] Caches are synced for namespace I0201 22:24:23.825031 1 shared_informer.go:318] Caches are synced for resource quota I0201 22:24:23.865760 1 shared_informer.go:318] Caches are synced for service account I0201 22:24:23.878232 1 shared_informer.go:318] Caches are synced for HPA I0201 22:24:23.915455 1 shared_informer.go:318] Caches are synced for resource quota I0201 22:24:24.247272 1 shared_informer.go:318] Caches are synced for garbage collector I0201 22:24:24.247302 1 garbagecollector.go:166] "All resource monitors have synced. Proceeding to collect garbage" [...] I0201 22:28:17.019009 1 serving.go:348] Generated self-signed cert in-memory I0201 22:28:17.596780 1 controllermanager.go:189] "Starting" version="v1.28.6" I0201 22:28:17.596861 1 controllermanager.go:191] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" I0201 22:28:17.598891 1 secure_serving.go:213] Serving securely on 127.0.0.1:10257 I0201 22:28:17.599058 1 tlsconfig.go:240] "Starting DynamicServingCertificateController" I0201 22:28:17.599178 1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/etc/kubernetes/pki/front-proxy-ca.crt" I0201 22:28:17.599205 1 leaderelection.go:250] attempting to acquire leader lease kube-system/kube-controller-manager... I0201 22:28:17.599276 1 dynamic_cafilecontent.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
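Since the pod keeps restarting, I believe the previous container's logs can also be pulled with the standard --previous flag, for example:

vagrant@kubemaster:~$ kubectl logs -n kube-system kube-controller-manager-kubemaster --previous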

Then everything starts to crash.

[screenshot]

This is a fresh install on an Ubuntu node using Vagrant and VirtualBox; no documents or config files have been modified.

Something is wrong in the definition files.

Only for kubelet is it possible to find a service; there is no service status for kube-apiserver or kube-controller-manager.
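As far as I understand, that part is expected on a kubeadm cluster: only kubelet runs as a systemd service, while the other control-plane components run as static pods under containerd. So I checked them roughly like this (assuming crictl is configured for containerd):

vagrant@kubemaster:~$ systemctl status kubelet
vagrant@kubemaster:~$ sudo crictl ps -a | grep -E 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd'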

[screenshot]

If we try to run the kube-apiserver binary directly:

vagrant@kubemaster:~$ sudo /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/11/fs/usr/local/bin/kube-apiserver W0201 22:31:14.306767 5712 options.go:293] No CIDR for service cluster IPs specified. Default value which was 10.0.0.0/24 is deprecated and will be removed in future releases. Please specify it using --service-cluster-ip-range on kube-apiserver. I0201 22:31:14.651329 5712 serving.go:342] Generated self-signed cert (/var/run/kubernetes/apiserver.crt, /var/run/kubernetes/apiserver.key) I0201 22:31:14.651377 5712 options.go:220] external host was not specified, using 10.0.2.15 W0201 22:31:14.651387 5712 authentication.go:527] AnonymousAuth is not allowed with the AlwaysAllow authorizer. Resetting AnonymousAuth to false. You should use a different authorizer E0201 22:31:14.652413 5712 run.go:74] "command failed" err="[--etcd-servers must be specified, service-account-issuer is a required flag, --service-account-signing-key-file and --service-account-issuer are required flags]"
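Running the binary by hand without any flags is probably expected to fail like that; the real invocation comes from the static pod manifest, which can be inspected with:

vagrant@kubemaster:~$ sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml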

The same for the etcd binary:

vagrant@kubemaster:~$ sudo /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/25/fs/usr/local/bin/etcd {"level":"warn","ts":"2024-02-01T22:33:36.962898Z","caller":"embed/config.go:676","msg":"Running http and grpc server on single port. This is not recommended for production."} {"level":"info","ts":"2024-02-01T22:33:36.963002Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/25/fs/usr/local/bin/etcd"]} {"level":"warn","ts":"2024-02-01T22:33:36.963031Z","caller":"etcdmain/etcd.go:105","msg":"'data-dir' was empty; using default","data-dir":"default.etcd"} {"level":"warn","ts":"2024-02-01T22:33:36.963255Z","caller":"embed/config.go:676","msg":"Running http and grpc server on single port. This is not recommended for production."} {"level":"info","ts":"2024-02-01T22:33:36.963399Z","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:2380"]} {"level":"info","ts":"2024-02-01T22:33:36.964032Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["http://localhost:2379"]} {"level":"info","ts":"2024-02-01T22:33:36.964304Z","caller":"embed/etcd.go:376","msg":"closing etcd server","name":"default","data-dir":"default.etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]} {"level":"info","ts":"2024-02-01T22:33:36.964409Z","caller":"embed/etcd.go:378","msg":"closed etcd server","name":"default","data-dir":"default.etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]} {"level":"warn","ts":"2024-02-01T22:33:36.964435Z","caller":"etcdmain/etcd.go:146","msg":"failed to start etcd","error":"listen tcp 127.0.0.1:2379: bind: address already in use"} {"level":"fatal","ts":"2024-02-01T22:33:36.964462Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"listen tcp 127.0.0.1:2379: bind: address already in use","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:250"}
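The "address already in use" error on 127.0.0.1:2379 presumably just means the real etcd static pod is already listening there; that can be confirmed with something like:

vagrant@kubemaster:~$ sudo ss -tlnp | grep 2379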

Logs from kube-apiserver:

_vagrant@kubemaster:~$ kubectl logs kube-apiserver-kubemaster -n kube-system I0201 22:33:19.667044 1 options.go:220] external host was not specified, using 192.168.56.11 I0201 22:33:19.668036 1 server.go:148] Version: v1.28.6 I0201 22:33:19.668055 1 server.go:150] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" I0201 22:33:20.611715 1 shared_informer.go:311] Waiting for caches to sync for nodeauthorizer I0201 22:33:20.629162 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook. I0201 22:33:20.629522 1 plugins.go:161] Loaded 13 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,ClusterTrustBundleAttest,CertificateSubjectRestriction,ValidatingAdmissionPolicy,ValidatingAdmissionWebhook,ResourceQuota. I0201 22:33:20.629952 1 instance.go:298] Using reconciler: lease I0201 22:33:20.667856 1 handler.go:275] Adding GroupVersion apiextensions.k8s.io v1 to ResourceManager W0201 22:33:20.667896 1 genericapiserver.go:744] Skipping API apiextensions.k8s.io/v1beta1 because it has no resources. I0201 22:33:20.862887 1 handler.go:275] Adding GroupVersion v1 to ResourceManager I0201 22:33:20.863192 1 instance.go:709] API group "internal.apiserver.k8s.io" is not enabled, skipping. I0201 22:33:21.155913 1 instance.go:709] API group "resource.k8s.io" is not enabled, skipping. I0201 22:33:21.182668 1 handler.go:275] Adding GroupVersion authentication.k8s.io v1 to ResourceManager W0201 22:33:21.182725 1 genericapiserver.go:744] Skipping API authentication.k8s.io/v1beta1 because it has no resources. W0201 22:33:21.182734 1 genericapiserver.go:744] Skipping API authentication.k8s.io/v1alpha1 because it has no resources. I0201 22:33:21.183279 1 handler.go:275] Adding GroupVersion authorization.k8s.io v1 to ResourceManager W0201 22:33:21.183319 1 genericapiserver.go:744] Skipping API authorization.k8s.io/v1beta1 because it has no resources. I0201 22:33:21.184046 1 handler.go:275] Adding GroupVersion autoscaling v2 to ResourceManager I0201 22:33:21.184843 1 handler.go:275] Adding GroupVersion autoscaling v1 to ResourceManager W0201 22:33:21.184882 1 genericapiserver.go:744] Skipping API autoscaling/v2beta1 because it has no resources. W0201 22:33:21.184887 1 genericapiserver.go:744] Skipping API autoscaling/v2beta2 because it has no resources. I0201 22:33:21.186135 1 handler.go:275] Adding GroupVersion batch v1 to ResourceManager W0201 22:33:21.186182 1 genericapiserver.go:744] Skipping API batch/v1beta1 because it has no resources.

And logs from the scheduler:

_vagrant@kubemaster:~$ kubectl logs kube-scheduler-kubemaster -n kube-system I0201 22:33:22.632553 1 serving.go:348] Generated self-signed cert in-memory I0201 22:33:22.993418 1 server.go:154] "Starting Kubernetes Scheduler" version="v1.28.6" I0201 22:33:22.993440 1 server.go:156] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" I0201 22:33:22.997795 1 secure_serving.go:213] Serving securely on 127.0.0.1:10259 I0201 22:33:22.997880 1 tlsconfig.go:240] "Starting DynamicServingCertificateController" I0201 22:33:22.997914 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController I0201 22:33:22.997949 1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController I0201 22:33:22.998001 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file" I0201 22:33:22.998010 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file I0201 22:33:22.998030 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" I0201 22:33:22.998034 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file I0201 22:33:23.098750 1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController I0201 22:33:23.098767 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file I0201 22:33:23.098751 1 sharedinformer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file I0201 22:33:23.100038 1 leaderelection.go:250] attempting to acquire leader lease kube-system/kube-scheduler... I0201 22:33:40.693893 1 server.go:238] "Requested to terminate, exiting"
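The scheduler exiting with "Requested to terminate" seems to line up with the API server going down at the same time. The API server's health endpoints can be probed directly from the node, for example:

vagrant@kubemaster:~$ curl -k 'https://192.168.56.11:6443/livez?verbose'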

The system goes into an endless loop of errors and resource crashes until the connection is lost:

vagrant@kubemaster:~$ kubectl get pods -A -w
The connection to the server 192.168.56.11:6443 was refused - did you specify the right host or port?
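When kubectl cannot reach the API server, the kubelet journal and the container runtime logs are still available on the node; something along these lines should work (the container ID below is just a placeholder):

vagrant@kubemaster:~$ sudo journalctl -u kubelet --no-pager | tail -n 50
vagrant@kubemaster:~$ sudo crictl ps -a
vagrant@kubemaster:~$ sudo crictl logs <container-id>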

I restarted the Vagrant VM and all resources are running again (until we get the errors and crashes again; but even in this "OK" state, see below that there are no services for kube-apiserver, kube-scheduler and controller-manager).

[screenshot]

While the system is running, we can see some of the processes, as shown below:

[screenshot]

When all the resources die due to the error/crash loop, only the controller-manager process is still running:

[screenshot]
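For reference, the control-plane processes on the node can be listed with something like:

vagrant@kubemaster:~$ ps -ef | grep -E 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd' | grep -v grep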

A little more context from describing the pods and their logs: kube-scheduler-kubemaster

_kubectl describe pod -n kube-system kube-scheduler-kubemaster Name: kube-scheduler-kubemaster Namespace: kube-system Priority: 2000001000 Priority Class Name: system-node-critical Node: kubemaster/192.168.56.11 Start Time: Fri, 02 Feb 2024 01:58:32 +0000 Labels: component=kube-scheduler tier=control-plane Annotations: kubernetes.io/config.hash: 0670fe8668c8dd769b1e2391a17b95af kubernetes.io/config.mirror: 0670fe8668c8dd769b1e2391a17b95af kubernetes.io/config.seen: 2024-02-01T04:17:34.586974879Z kubernetes.io/config.source: file Status: Running SeccompProfile: RuntimeDefault IP: 192.168.56.11 IPs: IP: 192.168.56.11 Controlled By: Node/kubemaster Containers: kube-scheduler: Container ID: containerd://c269368833b3e53e1f6cd414c0b4e5ed90c26235b7d6826c2093bea3dc28d0df Image: registry.k8s.io/kube-scheduler:v1.28.6 Image ID: registry.k8s.io/kube-scheduler@sha256:a89db556c34d652d403d909882dbd97336f2e935b1c726b2e2b2c0400186ac39 Port: Host Port: Command: kube-scheduler --authentication-kubeconfig=/etc/kubernetes/scheduler.conf --authorization-kubeconfig=/etc/kubernetes/scheduler.conf --bind-address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=true State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Completed Exit Code: 0 Started: Fri, 02 Feb 2024 02:07:22 +0000 Finished: Fri, 02 Feb 2024 02:07:29 +0000 Ready: False Restart Count: 218 Requests: cpu: 100m Liveness: http-get https://127.0.0.1:10259/healthz delay=10s timeout=15s period=10s #success=1 #failure=8 Startup: http-get https://127.0.0.1:10259/healthz delay=10s timeout=15s period=10s #success=1 #failure=24 Environment: Mounts: /etc/kubernetes/scheduler.conf from kubeconfig (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: kubeconfig: Type: HostPath (bare host directory volume) Path: /etc/kubernetes/scheduler.conf HostPathType: FileOrCreate QoS Class: Burstable Node-Selectors: Tolerations: :NoExecute op=Exists Events: Type Reason Age From Message


Normal Created 21h kubelet Created container kube-scheduler Normal Started 21h kubelet Started container kube-scheduler Normal Pulled 21h kubelet Container image "registry.k8s.io/kube-scheduler:v1.28.6" already present on machine Normal Created 21h (x10 over 21h) kubelet Created container kube-scheduler Normal Started 21h (x10 over 21h) kubelet Started container kube-scheduler Warning Unhealthy 18h (x13 over 19h) kubelet Startup probe failed: Get "https://127.0.0.1:10259/healthz": net/http: TLS handshake timeout Warning Unhealthy 12h (x6 over 21h) kubelet Liveness probe failed: Get "https://127.0.0.1:10259/healthz": dial tcp 127.0.0.1:10259: connect: connection refused Normal SandboxChanged 12h (x95 over 21h) kubelet Pod sandbox changed, it will be killed and re-created. Warning BackOff 11h (x1704 over 21h) kubelet Back-off restarting failed container kube-scheduler in pod kube-scheduler-kubemaster_kube-system(0670fe8668c8dd769b1e2391a17b95af) Normal Pulled 11h (x118 over 21h) kubelet Container image "registry.k8s.io/kube-scheduler:v1.28.6" already present on machine Normal Killing 11h (x106 over 21h) kubelet Stopping container kube-scheduler Normal Started 6h42m (x3 over 6h45m) kubelet Started container kube-scheduler Normal Created 6h20m (x7 over 6h45m) kubelet Created container kube-scheduler Normal Pulled 6h (x11 over 6h45m) kubelet Container image "registry.k8s.io/kube-scheduler:v1.28.6" already present on machine Normal SandboxChanged 4h (x26 over 6h45m) kubelet Pod sandbox changed, it will be killed and re-created. Normal Killing 4h (x25 over 6h42m) kubelet Stopping container kube-scheduler Warning BackOff 3h47m (x554 over 6h43m) kubelet Back-off restarting failed container kube-scheduler in pod kube-scheduler-kubemaster_kube-system(0670fe8668c8dd769b1e2391a17b95af) Warning Unhealthy 3h37m kubelet Startup probe failed: Get "https://127.0.0.1:10259/healthz": net/http: TLS handshake timeout Warning Unhealthy 3h37m kubelet Startup probe failed: Get "https://127.0.0.1:10259/healthz": read tcp 127.0.0.1:41450->127.0.0.1:10259: read: connection reset by peer Normal Pulled 3h37m (x3 over 3h44m) kubelet Container image "registry.k8s.io/kube-scheduler:v1.28.6" already present on machine Normal Started 3h37m (x3 over 3h44m) kubelet Started container kube-scheduler Normal Created 3h37m (x3 over 3h44m) kubelet Created container kube-scheduler Normal Killing 3h37m (x2 over 3h38m) kubelet Stopping container kube-scheduler Normal SandboxChanged 3h37m (x3 over 3h44m) kubelet Pod sandbox changed, it will be killed and re-created. Warning BackOff 3h33m (x28 over 3h38m) kubelet Back-off restarting failed container kube-scheduler in pod kube-scheduler-kubemaster_kube-system(0670fe8668c8dd769b1e2391a17b95af) Normal Started 3h21m (x3 over 3h25m) kubelet Started container kube-scheduler Normal Created 3h21m (x3 over 3h25m) kubelet Created container kube-scheduler Warning Unhealthy 165m kubelet Startup probe failed: Get "https://127.0.0.1:10259/healthz": net/http: TLS handshake timeout Normal SandboxChanged 104m (x16 over 3h25m) kubelet Pod sandbox changed, it will be killed and re-created. 
Normal Pulled 100m (x23 over 3h25m) kubelet Container image "registry.k8s.io/kube-scheduler:v1.28.6" already present on machine Warning BackOff 89m (x349 over 3h23m) kubelet Back-off restarting failed container kube-scheduler in pod kube-scheduler-kubemaster_kube-system(0670fe8668c8dd769b1e2391a17b95af) Normal Killing 61m (x28 over 3h23m) kubelet Stopping container kube-scheduler Normal Created 4m34s (x3 over 9m23s) kubelet Created container kube-scheduler Normal Pulled 4m34s (x3 over 9m23s) kubelet Container image "registry.k8s.io/kube-scheduler:v1.28.6" already present on machine Normal Started 4m33s (x3 over 9m23s) kubelet Started container kube-scheduler Normal Killing 4m33s (x2 over 4m56s) kubelet Stopping container kube-scheduler Normal SandboxChanged 4m32s (x3 over 9m23s) kubelet Pod sandbox changed, it will be killed and re-created. Warning BackOff 4m26s (x11 over 5m31s) kubelet Back-off restarting failed container kube-scheduler in pod kube-scheduler-kubemasterkube-system(0670fe8668c8dd769b1e2391a17b95af) Warning Unhealthy 2m2s kubelet Liveness probe failed: Get "https://127.0.0.1:10259/healthz": dial tcp 127.0.0.1:10259: connect: connection refused

Controller Manager

_kubectl describe pod -n kube-system kube-controller-manager-kubemaster Name: kube-controller-manager-kubemaster Namespace: kube-system Priority: 2000001000 Priority Class Name: system-node-critical Node: kubemaster/192.168.56.11 Start Time: Fri, 02 Feb 2024 01:58:32 +0000 Labels: component=kube-controller-manager tier=control-plane Annotations: kubernetes.io/config.hash: 2db9bd12f78f5220150a5d8d383647fc kubernetes.io/config.mirror: 2db9bd12f78f5220150a5d8d383647fc kubernetes.io/config.seen: 2024-02-01T04:17:34.586985919Z kubernetes.io/config.source: file Status: Running SeccompProfile: RuntimeDefault IP: 192.168.56.11 IPs: IP: 192.168.56.11 Controlled By: Node/kubemaster Containers: kube-controller-manager: Container ID: containerd://496b2213622d3e3259cf4aaaaccfedf17a8fbc8d3ae4311b7e8a8d3483d55196 Image: registry.k8s.io/kube-controller-manager:v1.28.6 Image ID: registry.k8s.io/kube-controller-manager@sha256:80bdcd72cfe26028bb2fed75732fc2f511c35fa8d1edc03deae11f3490713c9e Port: Host Port: Command: kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address=127.0.0.1 --client-ca-file=/etc/kubernetes/pki/ca.crt --cluster-cidr=10.244.0.0/16 --cluster-name=kubernetes --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=true --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --use-service-account-credentials=true State: Running Started: Fri, 02 Feb 2024 02:07:19 +0000 Last State: Terminated Reason: Error Exit Code: 1 Started: Fri, 02 Feb 2024 02:05:15 +0000 Finished: Fri, 02 Feb 2024 02:05:52 +0000 Ready: True Restart Count: 214 Requests: cpu: 200m Liveness: http-get https://127.0.0.1:10257/healthz delay=10s timeout=15s period=10s #success=1 #failure=8 Startup: http-get https://127.0.0.1:10257/healthz delay=10s timeout=15s period=10s #success=1 #failure=24 Environment: Mounts: /etc/ca-certificates from etc-ca-certificates (ro) /etc/kubernetes/controller-manager.conf from kubeconfig (ro) /etc/kubernetes/pki from k8s-certs (ro) /etc/ssl/certs from ca-certs (ro) /usr/libexec/kubernetes/kubelet-plugins/volume/exec from flexvolume-dir (rw) /usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro) /usr/share/ca-certificates from usr-share-ca-certificates (ro) Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: ca-certs: Type: HostPath (bare host directory volume) Path: /etc/ssl/certs HostPathType: DirectoryOrCreate etc-ca-certificates: Type: HostPath (bare host directory volume) Path: /etc/ca-certificates HostPathType: DirectoryOrCreate flexvolume-dir: Type: HostPath (bare host directory volume) Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec HostPathType: DirectoryOrCreate k8s-certs: Type: HostPath (bare host directory volume) Path: /etc/kubernetes/pki HostPathType: DirectoryOrCreate kubeconfig: Type: HostPath (bare host directory volume) Path: /etc/kubernetes/controller-manager.conf HostPathType: FileOrCreate usr-local-share-ca-certificates: Type: HostPath (bare host directory volume) Path: /usr/local/share/ca-certificates HostPathType: DirectoryOrCreate 
usr-share-ca-certificates: Type: HostPath (bare host directory volume) Path: /usr/share/ca-certificates HostPathType: DirectoryOrCreate QoS Class: Burstable Node-Selectors: Tolerations: :NoExecute op=Exists Events: Type Reason Age From Message


Normal Created 21h kubelet Created container kube-controller-manager Normal Started 21h kubelet Started container kube-controller-manager Normal Pulled 21h kubelet Container image "registry.k8s.io/kube-controller-manager:v1.28.6" already present on machine Warning Unhealthy 21h kubelet Liveness probe failed: Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused Normal Started 21h (x3 over 21h) kubelet Started container kube-controller-manager Normal Created 13h (x95 over 21h) kubelet Created container kube-controller-manager Normal Killing 13h (x85 over 21h) kubelet Stopping container kube-controller-manager Normal Pulled 13h (x99 over 21h) kubelet Container image "registry.k8s.io/kube-controller-manager:v1.28.6" already present on machine Normal SandboxChanged 13h (x88 over 21h) kubelet Pod sandbox changed, it will be killed and re-created. Warning BackOff 11h (x1736 over 21h) kubelet Back-off restarting failed container kube-controller-manager in pod kube-controller-manager-kubemaster_kube-system(2db9bd12f78f5220150a5d8d383647fc) Normal Started 6h38m (x3 over 6h45m) kubelet Started container kube-controller-manager Normal SandboxChanged 6h33m (x5 over 6h45m) kubelet Pod sandbox changed, it will be killed and re-created. Normal Created 6h20m (x7 over 6h45m) kubelet Created container kube-controller-manager Normal Pulled 4h56m (x22 over 6h45m) kubelet Container image "registry.k8s.io/kube-controller-manager:v1.28.6" already present on machine Normal Killing 4h50m (x18 over 6h39m) kubelet Stopping container kube-controller-manager Warning BackOff 3h47m (x452 over 6h43m) kubelet Back-off restarting failed container kube-controller-manager in pod kube-controller-manager-kubemaster_kube-system(2db9bd12f78f5220150a5d8d383647fc) Normal Killing 3h42m (x3 over 3h43m) kubelet Stopping container kube-controller-manager Normal Pulled 3h42m (x3 over 3h44m) kubelet Container image "registry.k8s.io/kube-controller-manager:v1.28.6" already present on machine Normal Created 3h42m (x3 over 3h44m) kubelet Created container kube-controller-manager Normal Started 3h42m (x3 over 3h44m) kubelet Started container kube-controller-manager Normal SandboxChanged 3h42m (x4 over 3h44m) kubelet Pod sandbox changed, it will be killed and re-created. Warning BackOff 3h33m (x52 over 3h43m) kubelet Back-off restarting failed container kube-controller-manager in pod kube-controller-manager-kubemaster_kube-system(2db9bd12f78f5220150a5d8d383647fc) Warning BackOff 125m (x352 over 3h24m) kubelet Back-off restarting failed container kube-controller-manager in pod kube-controller-manager-kubemaster_kube-system(2db9bd12f78f5220150a5d8d383647fc) Normal Killing 80m (x20 over 3h23m) kubelet Stopping container kube-controller-manager Normal Created 80m (x26 over 3h25m) kubelet Created container kube-controller-manager Normal Started 80m (x26 over 3h25m) kubelet Started container kube-controller-manager Normal Pulled 80m (x26 over 3h25m) kubelet Container image "registry.k8s.io/kube-controller-manager:v1.28.6" already present on machine Normal SandboxChanged 80m (x21 over 3h25m) kubelet Pod sandbox changed, it will be killed and re-created. Normal Killing 9m27s kubelet Stopping container kube-controller-manager Normal SandboxChanged 9m26s (x2 over 9m32s) kubelet Pod sandbox changed, it will be killed and re-created. 
Normal Created 5m9s (x3 over 9m32s) kubelet Created container kube-controller-manager Normal Started 5m9s (x3 over 9m31s) kubelet Started container kube-controller-manager Warning BackOff 3m3s (x13 over 9m26s) kubelet Back-off restarting failed container kube-controller-manager in pod kube-controller-manager-kubemaster_kube-system(2db9bd12f78f5220150a5d8d383647fc) Normal Pulled 2m50s (x4 over 9m32s) kubelet Container image "registry.k8s.io/kube-controller-manager:v1.28.6" already present on machine

And the API server

_kubectl describe pod -n kube-system kube-apiserver-kubemaster Name: kube-apiserver-kubemaster Namespace: kube-system Priority: 2000001000 Priority Class Name: system-node-critical Node: kubemaster/192.168.56.11 Start Time: Fri, 02 Feb 2024 01:58:32 +0000 Labels: component=kube-apiserver tier=control-plane Annotations: kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.56.11:6443 kubernetes.io/config.hash: 35f64f3e5140428757af4d2c695db0fa kubernetes.io/config.mirror: 35f64f3e5140428757af4d2c695db0fa kubernetes.io/config.seen: 2024-02-01T04:17:20.302433239Z kubernetes.io/config.source: file Status: Running SeccompProfile: RuntimeDefault IP: 192.168.56.11 IPs: IP: 192.168.56.11 Controlled By: Node/kubemaster Containers: kube-apiserver: Container ID: containerd://a668208122107812d5316847b3b7f27a1be7f9bc7d928d457da5bda433a47c66 Image: registry.k8s.io/kube-apiserver:v1.28.6 Image ID: registry.k8s.io/kube-apiserver@sha256:98a686df810b9f1de8e3b2ae869e79c51a36e7434d33c53f011852618aec0a68 Port: Host Port: Command: kube-apiserver --advertise-address=192.168.56.11 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key State: Running Started: Fri, 02 Feb 2024 02:06:56 +0000 Last State: Terminated Reason: Error Exit Code: 137 Started: Fri, 02 Feb 2024 02:03:14 +0000 Finished: Fri, 02 Feb 2024 02:06:12 +0000 Ready: True Restart Count: 207 Requests: cpu: 250m Liveness: http-get https://192.168.56.11:6443/livez delay=10s timeout=15s period=10s #success=1 #failure=8 Readiness: http-get https://192.168.56.11:6443/readyz delay=0s timeout=15s period=1s #success=1 #failure=3 Startup: http-get https://192.168.56.11:6443/livez delay=10s timeout=15s period=10s #success=1 #failure=24 Environment: Mounts: /etc/ca-certificates from etc-ca-certificates (ro) /etc/kubernetes/pki from k8s-certs (ro) /etc/ssl/certs from ca-certs (ro) /usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro) /usr/share/ca-certificates from usr-share-ca-certificates (ro) Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: ca-certs: Type: HostPath (bare host directory volume) Path: /etc/ssl/certs HostPathType: DirectoryOrCreate etc-ca-certificates: Type: HostPath (bare host directory 
volume) Path: /etc/ca-certificates HostPathType: DirectoryOrCreate k8s-certs: Type: HostPath (bare host directory volume) Path: /etc/kubernetes/pki HostPathType: DirectoryOrCreate usr-local-share-ca-certificates: Type: HostPath (bare host directory volume) Path: /usr/local/share/ca-certificates HostPathType: DirectoryOrCreate usr-share-ca-certificates: Type: HostPath (bare host directory volume) Path: /usr/share/ca-certificates HostPathType: DirectoryOrCreate QoS Class: Burstable Node-Selectors: Tolerations: :NoExecute op=Exists Events: Type Reason Age From Message


Normal Created 21h kubelet Created container kube-apiserver Normal Started 21h kubelet Started container kube-apiserver Normal Pulled 21h kubelet Container image "registry.k8s.io/kube-apiserver:v1.28.6" already present on machine Normal SandboxChanged 21h kubelet Pod sandbox changed, it will be killed and re-created. Normal Pulled 21h kubelet Container image "registry.k8s.io/kube-apiserver:v1.28.6" already present on machine Normal Created 21h kubelet Created container kube-apiserver Normal Started 21h kubelet Started container kube-apiserver Warning Unhealthy 21h (x2 over 21h) kubelet Liveness probe failed: Get "https://192.168.56.11:6443/livez": dial tcp 192.168.56.11:6443: connect: connection refused Warning Unhealthy 18h (x9 over 20h) kubelet Startup probe failed: HTTP probe failed with statuscode: 500 Warning Unhealthy 18h (x11 over 20h) kubelet Startup probe failed: Get "https://192.168.56.11:6443/livez": dial tcp 192.168.56.11:6443: connect: connection refused Warning Unhealthy 15h (x242 over 21h) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500 Warning Unhealthy 12h (x93 over 21h) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500 Normal Killing 11h (x80 over 21h) kubelet Stopping container kube-apiserver Warning Unhealthy 11h (x1101 over 21h) kubelet Readiness probe failed: Get "https://192.168.56.11:6443/readyz": dial tcp 192.168.56.11:6443: connect: connection refused Warning BackOff 11h (x2320 over 21h) kubelet Back-off restarting failed container kube-apiserver in pod kube-apiserver-kubemaster_kube-system(35f64f3e5140428757af4d2c695db0fa) Normal Pulled 6h45m (x2 over 6h45m) kubelet Container image "registry.k8s.io/kube-apiserver:v1.28.6" already present on machine Normal Created 6h45m (x2 over 6h45m) kubelet Created container kube-apiserver Normal Started 6h45m (x2 over 6h45m) kubelet Started container kube-apiserver Warning Unhealthy 6h43m kubelet Liveness probe failed: Get "https://192.168.56.11:6443/livez": dial tcp 192.168.56.11:6443: connect: connection refused Warning Unhealthy 6h1m (x95 over 6h43m) kubelet Readiness probe failed: Get "https://192.168.56.11:6443/readyz": dial tcp 192.168.56.11:6443: connect: connection refused Warning Unhealthy 5h52m (x20 over 6h36m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500 Warning Unhealthy 4h50m (x97 over 6h36m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500 Warning Unhealthy 4h (x9 over 6h45m) kubelet Startup probe failed: Get "https://192.168.56.11:6443/livez": dial tcp 192.168.56.11:6443: connect: connection refused Normal SandboxChanged 4h (x26 over 6h45m) kubelet Pod sandbox changed, it will be killed and re-created. Normal Killing 3h48m (x26 over 6h45m) kubelet Stopping container kube-apiserver Warning BackOff 3h47m (x588 over 6h45m) kubelet Back-off restarting failed container kube-apiserver in pod kube-apiserver-kubemaster_kube-system(35f64f3e5140428757af4d2c695db0fa) Normal SandboxChanged 3h44m kubelet Pod sandbox changed, it will be killed and re-created. 
Normal Pulled 3h44m kubelet Container image "registry.k8s.io/kube-apiserver:v1.28.6" already present on machine Normal Created 3h44m kubelet Created container kube-apiserver Normal Started 3h44m kubelet Started container kube-apiserver Warning Unhealthy 3h38m (x16 over 3h38m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500 Warning Unhealthy 3h38m (x6 over 3h38m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500 Normal Killing 3h32m (x2 over 3h35m) kubelet Stopping container kube-apiserver Normal SandboxChanged 3h25m kubelet Pod sandbox changed, it will be killed and re-created. Normal Started 3h25m kubelet Started container kube-apiserver Normal Created 3h25m kubelet Created container kube-apiserver Warning Unhealthy 3h23m (x2 over 3h24m) kubelet Liveness probe failed: Get "https://192.168.56.11:6443/livez": dial tcp 192.168.56.11:6443: connect: connection refused Warning Unhealthy 3h23m (x18 over 3h24m) kubelet Readiness probe failed: Get "https://192.168.56.11:6443/readyz": dial tcp 192.168.56.11:6443: connect: connection refused Normal Pulled 155m (x11 over 3h25m) kubelet Container image "registry.k8s.io/kube-apiserver:v1.28.6" already present on machine Warning Unhealthy 155m kubelet Startup probe failed: Get "https://192.168.56.11:6443/livez": net/http: TLS handshake timeout Warning Unhealthy 154m kubelet Startup probe failed: Get "https://192.168.56.11:6443/livez": read tcp 192.168.56.11:44012->192.168.56.11:6443: read: connection reset by peer Warning Unhealthy 133m (x73 over 3h22m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500 Normal Killing 124m (x16 over 3h24m) kubelet Stopping container kube-apiserver Warning BackOff 62m (x467 over 3h23m) kubelet Back-off restarting failed container kube-apiserver in pod kube-apiserver-kubemaster_kube-system(35f64f3e5140428757af4d2c695db0fa) Warning Unhealthy 62m kubelet Startup probe failed: Get "https://192.168.56.11:6443/livez": read tcp 192.168.56.11:57034->192.168.56.11:6443: read: connection reset by peer Warning Unhealthy 9m1s (x2 over 9m11s) kubelet Startup probe failed: Get "https://192.168.56.11:6443/livez": dial tcp 192.168.56.11:6443: connect: connection refused Normal SandboxChanged 8m52s (x2 over 9m27s) kubelet Pod sandbox changed, it will be killed and re-created. Warning BackOff 8m48s (x5 over 8m52s) kubelet Back-off restarting failed container kube-apiserver in pod kube-apiserver-kubemasterkube-system(35f64f3e5140428757af4d2c695db0fa) Normal Created 8m36s (x2 over 9m27s) kubelet Created container kube-apiserver Normal Pulled 8m36s (x2 over 9m27s) kubelet Container image "registry.k8s.io/kube-apiserver:v1.28.6" already present on machine Normal Started 8m35s (x2 over 9m26s) kubelet Started container kube-apiserver Normal Killing 5m47s (x2 over 9m22s) kubelet Stopping container kube-apiserver Warning Unhealthy 5m40s (x7 over 5m46s) kubelet Readiness probe failed: Get "https://192.168.56.11:6443/readyz": dial tcp 192.168.56.11:6443: connect: connection refused Warning Unhealthy 3m38s (x2 over 5m47s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
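One thing I still want to rule out: the last state of kube-apiserver shows exit code 137 (SIGKILL), which as far as I know usually means either the OOM killer or the kubelet killing the container on failed probes, and I've read that a common cause of this crash-loop pattern with containerd is the cgroup driver not being set to systemd. These are the checks I plan to run (paths and keys as per the containerd docs):

vagrant@kubemaster:~$ free -h
vagrant@kubemaster:~$ sudo dmesg -T | grep -i -E 'out of memory|killed process'
vagrant@kubemaster:~$ sudo grep SystemdCgroup /etc/containerd/config.toml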

If we check those ports on the Vagrant node: [screenshot]
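For completeness, this is the kind of check I'm doing for the control-plane ports:

vagrant@kubemaster:~$ sudo ss -tlnp | grep -E ':6443|:2379|:10257|:10259'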

I'm trying to track down the possible errors, but would it be possible for you guys to perform a fresh install and validate?