syself / cluster-api-provider-hetzner

Cluster API Provider Hetzner :rocket: The best way to manage Kubernetes clusters on Hetzner: fully declarative, Kubernetes-native, and with self-healing capabilities.
https://caph.syself.com
Apache License 2.0

cluster startup fails after control plane #1207

Closed: rgarcia closed this issue 7 months ago

rgarcia commented 7 months ago

/kind bug

What steps did you take and what happened:

  1. kind create cluster

  2. Created a new project in Hetzner called "caph"

  3. Generated an SSH key (id_ed25519{,.pub}) and added them to the project

  4. Generated an API token caph-api-token with read and write access

  5. Generated a webservice user password in Hetzner Robot

  6. Set up env:

    export HCLOUD_TOKEN=<hcloud token>
    export HCLOUD_SSH_KEY="caph-ssh-key"
    export HETZNER_ROBOT_USER='<robot username>'
    export HETZNER_ROBOT_PASSWORD='<robot password>'
    export HETZNER_SSH_PUB_PATH=$(pwd)/id_ed25519.pub
    export HETZNER_SSH_PRIV_PATH=$(pwd)/id_ed25519
    export CLUSTER_NAME="management-cluster"
    export HCLOUD_REGION="ash"
    export CONTROL_PLANE_MACHINE_COUNT=1
    export WORKER_MACHINE_COUNT=3
    export KUBERNETES_VERSION=1.28.4
    export HCLOUD_CONTROL_PLANE_MACHINE_TYPE=cpx31
    export HCLOUD_WORKER_MACHINE_TYPE=cpx31
  7. Added HCLOUD_TOKEN etc. as secrets per the docs

    kubectl create secret generic hetzner --from-literal=hcloud=$HCLOUD_TOKEN --from-literal=robot-user=$HETZNER_ROBOT_USER --from-literal=robot-password=$HETZNER_ROBOT_PASSWORD
    kubectl create secret generic robot-ssh --from-literal=sshkey-name=cluster --from-file=ssh-privatekey=$HETZNER_SSH_PRIV_PATH --from-file=ssh-publickey=$HETZNER_SSH_PUB_PATH
    
    kubectl patch secret hetzner -p '{"metadata":{"labels":{"clusterctl.cluster.x-k8s.io/move":""}}}'
    kubectl patch secret robot-ssh -p '{"metadata":{"labels":{"clusterctl.cluster.x-k8s.io/move":""}}}'
  8. Generated the cluster yaml:

    clusterctl generate cluster $CLUSTER_NAME > $CLUSTER_NAME.yaml
  9. Applied: kubectl apply -f $CLUSTER_NAME.yaml
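
Before watching the rollout, a quick sanity check that the secrets carry the clusterctl move label and that the core objects were created may help. These commands are illustrative, not part of the original report:

    kubectl get secrets hetzner robot-ssh --show-labels
    kubectl get cluster,hetznercluster,kubeadmcontrolplane,machinedeployment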

What did you expect to happen: The cluster to start up! The control plane comes online, but then the worker nodes fail to register themselves:

$ clusterctl describe cluster $CLUSTER_NAME 
NAME                                                                   READY  SEVERITY  REASON                       SINCE  MESSAGE                                                                           
Cluster/management-cluster                                             True                                          3m52s                                                                                     
├─ClusterInfrastructure - HetznerCluster/management-cluster            True                                          3m52s                                                                                     
├─ControlPlane - KubeadmControlPlane/management-cluster-control-plane  True                                          3m52s                                                                                     
│ └─Machine/management-cluster-control-plane-q2thg                     True                                          5m27s                                                                                     
└─Workers                                                                                                                                                                                                      
  └─MachineDeployment/management-cluster-md-0                          False  Warning   WaitingForAvailableMachines  5m45s  Minimum availability requires 3 replicas, current 0 available                      
    └─3 Machines...                                                    True                                          3m36s  See management-cluster-md-0-x6vdg-9gzph, management-cluster-md-0-x6vdg-dvf2b, ... 
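
To dig further from this state, one can inspect the Machines on the management cluster and the workload cluster's nodes directly. A minimal sketch, assuming the default namespace and an illustrative kubeconfig path:

    # On the management cluster: inspect the Machines backing the MachineDeployment
    kubectl get machines -o wide

    # Fetch the workload cluster's kubeconfig and look at its nodes
    clusterctl get kubeconfig management-cluster > /tmp/wl.kubeconfig
    kubectl --kubeconfig /tmp/wl.kubeconfig get nodes -o wide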

caph-controller-manager logs:

12:07:09 INFO  "skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called" builder/webhook.go:173 {'GVK': 'infrastructure.cluster.x-k8s.io/v1beta1, Kind=HCloudMachineTemplate'}
12:07:09 INFO  "skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called" builder/webhook.go:173 {'GVK': 'infrastructure.cluster.x-k8s.io/v1beta1, Kind=HetznerBareMetalMachineTemplate'}
12:07:09 INFO  "starting manager" ./main.go:233 {'version': ''}
12:07:09 INFO  "Starting metrics server" server/server.go:185 
12:07:09 INFO  "starting server" manager/server.go:50 {'kind': 'health probe', 'addr': '[::]:9440'}
12:07:09 INFO  "Starting webhook server" webhook/server.go:191 
12:07:09 INFO  "Serving metrics server" server/server.go:224 {'bindAddress': 'localhost:8080', 'secure': False}
12:07:09 INFO  "Serving webhook server" webhook/server.go:242 {'host': '', 'port': 9443}
I0313 12:07:09.290868       1 leaderelection.go:250] attempting to acquire leader lease caph-system/hetzner.cluster.x-k8s.io...
I0313 12:07:09.293269       1 leaderelection.go:260] successfully acquired lease caph-system/hetzner.cluster.x-k8s.io
12:11:43 INFO  "HCloudMachineTemplate is missing ownerRef to cluster or cluster does not exist default/management-cluster-control-plane" controllers/hcloudmachinetemplate_controller.go:92 {'HCloudMachineTemplate': {'name': 'management-cluster-control-plane', 'namespace': 'default'}}
12:11:43 INFO  "HCloudMachineTemplate is missing ownerRef to cluster or cluster does not exist default/management-cluster-md-0" controllers/hcloudmachinetemplate_controller.go:92 {'HCloudMachineTemplate': {'name': 'management-cluster-md-0', 'namespace': 'default'}}
12:11:43 INFO  "HCloudMachineTemplate is missing ownerRef to cluster or cluster does not exist default/management-cluster-control-plane" controllers/hcloudmachinetemplate_controller.go:92 {'HCloudMachineTemplate': {'name': 'management-cluster-control-plane', 'namespace': 'default'}}
12:11:43 INFO  "HCloudMachineTemplate is missing ownerRef to cluster or cluster does not exist default/management-cluster-md-0" controllers/hcloudmachinetemplate_controller.go:92 {'HCloudMachineTemplate': {'name': 'management-cluster-md-0', 'namespace': 'default'}}
12:11:43 INFO  "HCloudMachineTemplate is missing ownerRef to cluster or cluster does not exist default/management-cluster-control-plane" controllers/hcloudmachinetemplate_controller.go:92 {'HCloudMachineTemplate': {'name': 'management-cluster-control-plane', 'namespace': 'default'}}
12:11:43 INFO  "HCloudMachineTemplate is missing ownerRef to cluster or cluster does not exist default/management-cluster-md-0" controllers/hcloudmachinetemplate_controller.go:92 {'HCloudMachineTemplate': {'name': 'management-cluster-md-0', 'namespace': 'default'}}
12:11:43 INFO  "Cluster Controller has not yet set OwnerRef" controllers/hetznercluster_controller.go:111 
12:11:43 INFO  "HCloudMachineTemplate is missing ownerRef to cluster or cluster does not exist default/management-cluster-control-plane" controllers/hcloudmachinetemplate_controller.go:92 {'HCloudMachineTemplate': {'name': 'management-cluster-control-plane', 'namespace': 'default'}}
12:11:43 INFO  "HCloudMachineTemplate is missing ownerRef to cluster or cluster does not exist default/management-cluster-md-0" controllers/hcloudmachinetemplate_controller.go:92 {'HCloudMachineTemplate': {'name': 'management-cluster-md-0', 'namespace': 'default'}}
12:11:43 INFO  "metadata.finalizers: "hetznercluster.infrastructure.cluster.x-k8s.io": prefer a domain-qualified finalizer name to avoid accidental conflicts with other finalizer writers" log/warning_handler.go:65 
12:11:43 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-md-0-dt96n', 'namespace': 'default'}}
12:11:43 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-md-0-dt96n', 'namespace': 'default'}}
12:11:43 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-md-0-zjnj7', 'namespace': 'default'}}
12:11:43 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-md-0-zjnj7', 'namespace': 'default'}}
12:11:43 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-md-0-mt5zf', 'namespace': 'default'}}
12:11:43 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-md-0-mt5zf', 'namespace': 'default'}}
12:11:43 INFO  "metadata.finalizers: "hcloudmachine.infrastructure.cluster.x-k8s.io": prefer a domain-qualified finalizer name to avoid accidental conflicts with other finalizer writers" log/warning_handler.go:65 
12:11:43 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-md-0-dt96n', 'namespace': 'default'}}
12:11:43 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-md-0-zjnj7', 'namespace': 'default'}}
12:11:43 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-md-0-zjnj7', 'namespace': 'default'}}
12:11:46 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-control-plane-xw8jb', 'namespace': 'default'}}
12:11:46 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-control-plane-xw8jb', 'namespace': 'default'}}
12:11:46 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-control-plane-xw8jb', 'namespace': 'default'}}
12:11:46 INFO  "Machine Controller has not yet set OwnerRef" controllers/hcloudmachine_controller.go:92 {'HCloudMachine': {'name': 'management-cluster-control-plane-xw8jb', 'namespace': 'default'}}
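
(As an aside, the "has not yet set OwnerRef" and "missing ownerRef" messages above are usually transient: they appear while the Cluster API controllers are still wiring up owner references shortly after apply, and are not themselves the failure.)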

I ssh'd onto one of the worker nodes and here are the kubelet logs:

Mar 13 12:15:03 management-cluster-md-0-dt96n systemd[1]: Started kubelet: The Kubernetes Node Agent.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --anonymous-auth has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --authentication-token-webhook has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --authorization-mode has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --event-qps has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --max-pods has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --read-only-port has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --register-with-taints has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --rotate-server-certificates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.806196    2928 server.go:203] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.809982    2928 server.go:467] "Kubelet version" kubeletVersion="v1.28.4"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.810004    2928 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.810262    2928 server.go:895] "Client rotation is on, will bootstrap in background"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.812097    2928 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.818487    2928 server.go:725] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.818766    2928 container_manager_linux.go:265] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.819015    2928 container_manager_linux.go:270] "Creating Container Manager object based on Node Config" nodeConfig={"RuntimeCgroupsName":"","SystemCgroupsName":"","KubeletCgroupsName":"","KubeletOOMScoreAdj":-999,"ContainerRuntime":"","CgroupsPerQOS":true,"CgroupRoot":"/","CgroupDriver":"systemd","KubeletRootDir":"/var/lib/kubelet","ProtectKernelDefaults":false,"KubeReservedCgroupName":"","SystemReservedCgroupName":"","ReservedSystemCPUs":{},"EnforceNodeAllocatable":{"pods":{}},"KubeReserved":null,"SystemReserved":null,"HardEvictionThresholds":[{"Signal":"nodefs.available","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.1},"GracePeriod":0,"MinReclaim":null},{"Signal":"nodefs.inodesFree","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05},"GracePeriod":0,"MinReclaim":null},{"Signal":"imagefs.available","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.15},"GracePeriod":0,"MinReclaim":null},{"Signal":"memory.available","Operator":"LessThan","Value":{"Quantity":"100Mi","Percentage":0},"GracePeriod":0,"MinReclaim":null}],"QOSReserved":{},"CPUManagerPolicy":"none","CPUManagerPolicyOptions":null,"TopologyManagerScope":"container","CPUManagerReconcilePeriod":10000000000,"ExperimentalMemoryManagerPolicy":"None","ExperimentalMemoryManagerReservedMemory":null,"PodPidsLimit":-1,"EnforceCPULimits":true,"CPUCFSQuotaPeriod":100000000,"TopologyManagerPolicy":"none","TopologyManagerPolicyOptions":null}
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.819041    2928 topology_manager.go:138] "Creating topology manager with none policy"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.819052    2928 container_manager_linux.go:301] "Creating device plugin manager"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.819189    2928 state_mem.go:36] "Initialized new in-memory state store"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.819350    2928 kubelet.go:393] "Attempting to sync node with API server"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.819375    2928 kubelet.go:298] "Adding static pod path" path="/etc/kubernetes/manifests"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.819421    2928 kubelet.go:309] "Adding apiserver pod source"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.819440    2928 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.820145    2928 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="containerd" version="v1.7.13" apiVersion="v1"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: W0313 12:15:03.820511    2928 probe.go:268] Flexvolume plugin directory at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ does not exist. Recreating.
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.820983    2928 server.go:1232] "Started kubelet"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.821235    2928 ratelimit.go:65] "Setting rate limiting for podresources endpoint" qps=100 burstTokens=10
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.821292    2928 server.go:162] "Starting to listen" address="0.0.0.0" port=10250
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.821403    2928 server.go:233] "Starting to serve the podresources API" endpoint="unix:/var/lib/kubelet/pod-resources/kubelet.sock"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.821775    2928 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.821801    2928 kubelet.go:1431] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.822055    2928 server.go:462] "Adding debug handlers to kubelet server"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.822178    2928 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.824209    2928 kubelet_node_status.go:458] "Error getting the current node from lister" err="node \"management-cluster-md-0-dt96n\" not found"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.824284    2928 volume_manager.go:291] "Starting Kubelet Volume Manager"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.824605    2928 desired_state_of_world_populator.go:151] "Desired state populator starts to run"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.824788    2928 reconciler_new.go:29] "Reconciler: start to sync state"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: W0313 12:15:03.830533    2928 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: nodes "management-cluster-md-0-dt96n" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.830734    2928 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: nodes "management-cluster-md-0-dt96n" is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: W0313 12:15:03.830909    2928 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.831027    2928 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.831184    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc9831f46d", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 820964973, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 820964973, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: W0313 12:15:03.835599    2928 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.835777    2928 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.835973    2928 controller.go:146] "Failed to ensure lease exists, will retry" err="leases.coordination.k8s.io \"management-cluster-md-0-dt96n\" is forbidden: User \"system:anonymous\" cannot get resource \"leases\" in API group \"coordination.k8s.io\" in the namespace \"kube-node-lease\"" interval="200ms"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.836143    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc983e91eb", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 821791723, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 821791723, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.846595    2928 cpu_manager.go:214] "Starting CPU manager" policy="none"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.846613    2928 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.846624    2928 state_mem.go:36] "Initialized new in-memory state store"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.847869    2928 policy_none.go:49] "None policy: Start"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.848470    2928 memory_manager.go:169] "Starting memorymanager" policy="None"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.848493    2928 state_mem.go:35] "Initializing new in-memory state store"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.851139    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a98a21", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845579297, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845579297, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.852366    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a99f61", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845584737, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845584737, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.854826    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a9b072", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845589106, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845589106, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.883207    2928 manager.go:471] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.883413    2928 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.884102    2928 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"management-cluster-md-0-dt96n\" not found"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.886463    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc9bf6d950", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeAllocatableEnforced", Message:"Updated Node Allocatable limit across pods", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 884200272, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 884200272, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events is forbidden: User "system:anonymous" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.911791    2928 kubelet_network_linux.go:50] "Initialized iptables rules." protocol="IPv4"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.912522    2928 kubelet_network_linux.go:50] "Initialized iptables rules." protocol="IPv6"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.912561    2928 status_manager.go:217] "Starting to sync pod status with apiserver"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.912587    2928 kubelet.go:2303] "Starting kubelet main sync loop"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.912633    2928 kubelet.go:2327] "Skipping pod synchronization" err="PLEG is not healthy: pleg has yet to be successful"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: W0313 12:15:03.914465    2928 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.914488    2928 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:03.925978    2928 kubelet_node_status.go:70] "Attempting to register node" node="management-cluster-md-0-dt96n"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.927926    2928 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="management-cluster-md-0-dt96n"
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.927916    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a98a21", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845579297, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 925943932, time.Local), Count:2, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events "management-cluster-md-0-dt96n.17bc51fc99a98a21" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.929275    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a99f61", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845584737, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 925951607, time.Local), Count:2, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events "management-cluster-md-0-dt96n.17bc51fc99a99f61" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:03 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:03.930799    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a9b072", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845589106, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 925954562, time.Local), Count:2, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events "management-cluster-md-0-dt96n.17bc51fc99a9b072" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.038319    2928 controller.go:146] "Failed to ensure lease exists, will retry" err="leases.coordination.k8s.io \"management-cluster-md-0-dt96n\" is forbidden: User \"system:anonymous\" cannot get resource \"leases\" in API group \"coordination.k8s.io\" in the namespace \"kube-node-lease\"" interval="400ms"
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:04.129701    2928 kubelet_node_status.go:70] "Attempting to register node" node="management-cluster-md-0-dt96n"
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.131506    2928 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="management-cluster-md-0-dt96n"
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.131504    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a98a21", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845579297, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 4, 129651437, time.Local), Count:3, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events "management-cluster-md-0-dt96n.17bc51fc99a98a21" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.133007    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a99f61", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845584737, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 4, 129668940, time.Local), Count:3, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events "management-cluster-md-0-dt96n.17bc51fc99a99f61" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.134313    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a9b072", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845589106, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 4, 129675343, time.Local), Count:3, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events "management-cluster-md-0-dt96n.17bc51fc99a9b072" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.440512    2928 controller.go:146] "Failed to ensure lease exists, will retry" err="leases.coordination.k8s.io \"management-cluster-md-0-dt96n\" is forbidden: User \"system:anonymous\" cannot get resource \"leases\" in API group \"coordination.k8s.io\" in the namespace \"kube-node-lease\"" interval="800ms"
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:04.533399    2928 kubelet_node_status.go:70] "Attempting to register node" node="management-cluster-md-0-dt96n"
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.535136    2928 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes is forbidden: User \"system:anonymous\" cannot create resource \"nodes\" in API group \"\" at the cluster scope" node="management-cluster-md-0-dt96n"
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.535432    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a98a21", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845579297, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 4, 533341071, time.Local), Count:4, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events "management-cluster-md-0-dt96n.17bc51fc99a98a21" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.536538    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a99f61", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845584737, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 4, 533366008, time.Local), Count:4, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events "management-cluster-md-0-dt96n.17bc51fc99a99f61" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:04.538255    2928 event.go:280] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"management-cluster-md-0-dt96n.17bc51fc99a9b072", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"management-cluster-md-0-dt96n", UID:"management-cluster-md-0-dt96n", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node management-cluster-md-0-dt96n status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"management-cluster-md-0-dt96n"}, FirstTimestamp:time.Date(2024, time.March, 13, 12, 15, 3, 845589106, time.Local), LastTimestamp:time.Date(2024, time.March, 13, 12, 15, 4, 533368664, time.Local), Count:4, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"management-cluster-md-0-dt96n"}': 'events "management-cluster-md-0-dt96n.17bc51fc99a9b072" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Mar 13 12:15:04 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:04.812502    2928 transport.go:147] "Certificate rotation detected, shutting down client connections to start using new credentials"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:05.195491    2928 csi_plugin.go:295] Failed to initialize CSINode: error updating CSINode annotation: timed out waiting for the condition; caused by: nodes "management-cluster-md-0-dt96n" not found
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: E0313 12:15:05.248754    2928 nodelease.go:49] "Failed to get node when trying to set owner ref to the node lease" err="nodes \"management-cluster-md-0-dt96n\" not found" node="management-cluster-md-0-dt96n"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.336940    2928 kubelet_node_status.go:70] "Attempting to register node" node="management-cluster-md-0-dt96n"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.344291    2928 kubelet_node_status.go:73] "Successfully registered node" node="management-cluster-md-0-dt96n"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.358496    2928 kuberuntime_manager.go:1528] "Updating runtime config through cri with podcidr" CIDR="10.244.2.0/24"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.359058    2928 kubelet_network.go:61] "Updating Pod CIDR" originalPodCIDR="" newPodCIDR="10.244.2.0/24"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.665107    2928 kubelet_node_status.go:493] "Fast updating node status as it just became ready"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.821055    2928 apiserver.go:52] "Watching apiserver"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.824327    2928 topology_manager.go:215] "Topology Admit Handler" podUID="c044f7b6-050d-426e-a3c6-0144317f67d9" podNamespace="kube-system" podName="kube-proxy-j6b4k"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.825791    2928 desired_state_of_world_populator.go:159] "Finished populating initial desired state of world"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.834158    2928 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-proxy\" (UniqueName: \"kubernetes.io/configmap/c044f7b6-050d-426e-a3c6-0144317f67d9-kube-proxy\") pod \"kube-proxy-j6b4k\" (UID: \"c044f7b6-050d-426e-a3c6-0144317f67d9\") " pod="kube-system/kube-proxy-j6b4k"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.834223    2928 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/c044f7b6-050d-426e-a3c6-0144317f67d9-xtables-lock\") pod \"kube-proxy-j6b4k\" (UID: \"c044f7b6-050d-426e-a3c6-0144317f67d9\") " pod="kube-system/kube-proxy-j6b4k"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.834279    2928 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/c044f7b6-050d-426e-a3c6-0144317f67d9-lib-modules\") pod \"kube-proxy-j6b4k\" (UID: \"c044f7b6-050d-426e-a3c6-0144317f67d9\") " pod="kube-system/kube-proxy-j6b4k"
Mar 13 12:15:05 management-cluster-md-0-dt96n kubelet[2928]: I0313 12:15:05.834318    2928 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-72n42\" (UniqueName: \"kubernetes.io/projected/c044f7b6-050d-426e-a3c6-0144317f67d9-kube-api-access-72n42\") pod \"kube-proxy-j6b4k\" (UID: \"c044f7b6-050d-426e-a3c6-0144317f67d9\") " pod="kube-system/kube-proxy-j6b4k"
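
Worth noting when reading these logs: the early "system:anonymous" errors clear up once client certificate rotation kicks in (12:15:04) and the node registers successfully (12:15:05), so kubelet bootstrap is not what is failing here. To follow the same stream on a node, something like:

    journalctl -u kubelet -f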

Anything else you would like to add: Thank you for any assistance you can provide!

Environment:

batistein commented 7 months ago

Can you please show the output of kubectl get pods -A from the workload cluster?

rgarcia commented 7 months ago

Hi @batistein here is the output, really appreciate your help!

NAMESPACE     NAME                                                             READY   STATUS    RESTARTS   AGE
kube-system   coredns-5dd5756b68-hr545                                         0/1     Pending   0          5m27s
kube-system   coredns-5dd5756b68-m8cgs                                         0/1     Pending   0          5m27s
kube-system   etcd-management-cluster-control-plane-gxw2s                      1/1     Running   0          5m27s
kube-system   kube-apiserver-management-cluster-control-plane-gxw2s            1/1     Running   0          5m27s
kube-system   kube-controller-manager-management-cluster-control-plane-gxw2s   1/1     Running   0          5m27s
kube-system   kube-proxy-62nkw                                                 1/1     Running   0          4m13s
kube-system   kube-proxy-74chj                                                 1/1     Running   0          5m27s
kube-system   kube-proxy-gs79r                                                 1/1     Running   0          4m14s
kube-system   kube-proxy-l474v                                                 1/1     Running   0          4m14s
kube-system   kube-scheduler-management-cluster-control-plane-gxw2s            1/1     Running   0          5m27s

rgarcia commented 7 months ago

And describe pod on one of the coredns pods that isn't coming up:

kubectl describe pod coredns-5dd5756b68-hr545 --namespace kube-system 
Name:                 coredns-5dd5756b68-hr545
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 <none>
Labels:               k8s-app=kube-dns
                      pod-template-hash=5dd5756b68
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-5dd5756b68
Containers:
  coredns:
    Image:       registry.k8s.io/coredns/coredns:v1.10.1
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nt6sv (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-nt6sv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 45s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 45s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  34m                  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  8m37s (x5 over 29m)  default-scheduler  0/4 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 3 node(s) had untolerated taint {node.cluster.x-k8s.io/uninitialized: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  6m59s                default-scheduler  0/7 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 6 node(s) had untolerated taint {node.cluster.x-k8s.io/uninitialized: }. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  119s                 default-scheduler  0/9 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 8 node(s) had untolerated taint {node.cluster.x-k8s.io/uninitialized: }. preemption: 0/9 nodes are available: 9 Preemption is not helpful for scheduling..
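
The untolerated taints named in these events (node.cloudprovider.kubernetes.io/uninitialized and node.cluster.x-k8s.io/uninitialized) can be listed per node with, for example (kubeconfig path illustrative):

    kubectl --kubeconfig /tmp/wl.kubeconfig get nodes \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
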
guettli commented 7 months ago

@rgarcia Have you installed a CNI (for example, Cilium) and the CCM?

Can you please post the output of this command twice (once from the management cluster and once from the workload cluster)?

go run github.com/guettli/check-conditions@latest all  

batistein commented 7 months ago

@rgarcia From the list of pods running in your workload cluster, it's clear that the CCM and CNI are missing. CoreDNS cannot run without a CNI.

rgarcia commented 7 months ago

🤦 I should have kept reading through the docs. Sorry for the noise. After installing Cilium and the CCM, everything looks good. Thank you!
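
For readers who hit the same wall, here is a minimal sketch of installing a CNI and the Hetzner cloud-controller-manager into the workload cluster. The repo and chart names are an assumption taken from the upstream Cilium and hcloud Helm repositories; the CAPH docs may recommend different charts or values, so check them first:

    # Run against the workload cluster (kubeconfig path illustrative)
    export KUBECONFIG=/tmp/wl.kubeconfig

    # CNI: Cilium
    helm repo add cilium https://helm.cilium.io/
    helm install cilium cilium/cilium --namespace kube-system

    # CCM: Hetzner cloud-controller-manager (expects an hcloud token secret; see its chart docs)
    helm repo add hcloud https://charts.hetzner.cloud
    helm install hccm hcloud/hcloud-cloud-controller-manager --namespace kube-system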