kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

KIND kubelet fails to start #3664

Open tyler-dunkel opened 3 months ago

tyler-dunkel commented 3 months ago

What happened: The command `kind create cluster` fails at the "Starting control-plane" step and prints an error about the kubelet not being healthy. The error logs point to a problem with clock speed detection: failed to run Kubelet: could not detect clock speed from output:

What you expected to happen: Kind cluster to be created successfully

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: error logs: 537174967.zip

Environment:

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 26.0.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.6.22-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 14
 Total Memory: 17.3GiB
 Name: docker-desktop
 ID: 0b6ccbac-1fdf-48fc-a325-51a8d38af8e7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/tylerdunkel/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile


- OS (e.g. from `/etc/os-release`): OSX
- Kubernetes version: (use `kubectl version`): Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
The connection to the server localhost:8080 was refused - did you specify the right host or port?
- Any proxies or other special environment settings?:

stmcginnis commented 3 months ago

/remove-kind bug /kind support

Can you check if DOCKER_DEFAULT_PLATFORM is set? I've seen some mention of this error happening when trying to run amd64 on arm64 hosts.
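One way to answer this from a terminal is sketched below. The function name `platform_override_status` is made up for illustration; the only thing taken from the thread is the variable name. Note that this only inspects the current shell, so it would not catch a value set somewhere else.

```shell
# Report whether DOCKER_DEFAULT_PLATFORM is set in the current shell.
# ${VAR+x} expands to "x" only when VAR is set (even to an empty string).
platform_override_status() {
  if [ -n "${DOCKER_DEFAULT_PLATFORM+x}" ]; then
    echo "set: ${DOCKER_DEFAULT_PLATFORM}"
  else
    echo "unset"
  fi
}

platform_override_status
```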

tyler-dunkel commented 3 months ago

I don't see it set. Should I set it with `export DOCKER_DEFAULT_PLATFORM=linux/amd64`?

BenTheElder commented 3 months ago

> I don't see it set. Should I set it with `export DOCKER_DEFAULT_PLATFORM=linux/amd64`?

Unset is preferable, so that we just use the Docker default (which matches the host). If it is set, it should match the host platform, which in your case is not amd64.

see #2718

tyler-dunkel commented 3 months ago

Yeah, it is unset, so it should be using the Docker default. Note: I tested other Docker containers (like hello-world) and they worked fine.

BenTheElder commented 3 months ago

Based on the logs, this looks like https://github.com/kubernetes-sigs/kind/issues/2718: it appears to be an amd64 image on an arm64 host. Maybe the image was already pulled as amd64 previously?
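A rough way to check for that kind of mismatch locally is to compare the architecture the Docker daemon reports with the architecture recorded on the pulled image. This is only a sketch: `normalize_arch` and `check_image_arch` are hypothetical helper names, and the normalization is there because `docker info` reports `aarch64` on this host while image metadata uses `arm64`.

```shell
# Map kernel-style architecture names to the names used in image metadata.
normalize_arch() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64)  echo "amd64" ;;
    *)             echo "$1" ;;
  esac
}

# Hypothetical helper: compare the daemon's architecture with that of a
# locally present image. Requires a running Docker daemon.
check_image_arch() {
  image="$1"
  host_arch=$(normalize_arch "$(docker info --format '{{.Architecture}}')")
  image_arch=$(normalize_arch "$(docker image inspect --format '{{.Architecture}}' "$image")")
  if [ "$host_arch" = "$image_arch" ]; then
    echo "match: $host_arch"
  else
    echo "mismatch: host=$host_arch image=$image_arch"
  fi
}

# Usage, with the image tag from this thread:
#   check_image_arch kindest/node:v1.30.0
```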

tyler-dunkel commented 3 months ago

I've cleared all the images and containers out of Docker and tried to run on a "clean" Docker, but I still hit the same issue.

tyler-dunkel commented 3 months ago

One thing to note compared to #2718 is that my install passes the "Preparing nodes" step and fails on "Starting control-plane".

tyler-dunkel commented 3 months ago

Adding my cluster yaml for additional details

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP

BenTheElder commented 3 months ago

From images.log we can see amd64 images preloaded into this node image, indicating it's an amd64 image.

This still looks like a variation on https://github.com/kubernetes-sigs/kind/issues/2718

Can you try explicitly using the arm64 image?

$ crane manifest kindest/node:v1.30.0@sha256:047357ac0cfea04663786a612ba1eaba9702bef25227a794b52890dd8bcd692e
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 743,
         "digest": "sha256:2af5d1b382926abcd6336312d652cd045b7cc47475844a608669c71b1fefcfbc",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 743,
         "digest": "sha256:5e4ce6f9033bdb9ce81a7fd699c8e67cfcacfab57076058e3e6f33c32036b42b",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}

You can use: --image=kindest/node:v1.30.0@sha256:5e4ce6f9033bdb9ce81a7fd699c8e67cfcacfab57076058e3e6f33c32036b42b
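For picking the per-architecture digest out of a manifest list like the one above without reading it by eye, a small filter can do it. This is a sketch and not a general JSON parser: the function name `digest_for_arch` is made up, and it assumes pretty-printed output in which `"digest"` appears before `"architecture"` within each entry, as in the crane output shown here.

```shell
# Print the digest of the manifest entry for the given architecture.
# Reads a pretty-printed manifest list on stdin; remembers the most recent
# "digest" value and prints it when a matching "architecture" line follows.
digest_for_arch() {
  awk -v arch="$1" '
    /"digest":/       { gsub(/[",]/, "", $2); digest = $2 }
    /"architecture":/ { gsub(/[",]/, "", $2); if ($2 == arch) print digest }
  '
}

# Usage, assuming crane is installed:
#   crane manifest kindest/node:v1.30.0 | digest_for_arch arm64
```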

BenTheElder commented 3 months ago

Better yet also with DOCKER_DEFAULT_PLATFORM=linux/arm64

It still looks like something is causing the wrong architecture to be run. kind doesn't do anything like this internally; that would be Docker.
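Putting the two suggestions together (the arm64 digest plus the platform variable) might look like the sketch below. The wrapper `create_arm64_cluster` and the `DRY_RUN` switch are illustrative additions, not part of kind; the digest is the arm64 entry from the manifest list quoted earlier in this thread.

```shell
# Digest of the arm64 entry from the manifest list quoted in this thread.
ARM64_DIGEST="sha256:5e4ce6f9033bdb9ce81a7fd699c8e67cfcacfab57076058e3e6f33c32036b42b"

# Hypothetical wrapper: create a cluster with both the platform and the
# node image pinned to arm64. With DRY_RUN=1 it only prints the command.
create_arm64_cluster() {
  image="kindest/node:v1.30.0@${ARM64_DIGEST}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "DOCKER_DEFAULT_PLATFORM=linux/arm64 kind create cluster --image=${image}"
  else
    DOCKER_DEFAULT_PLATFORM=linux/arm64 kind create cluster --image="${image}"
  fi
}

# Usage:
#   DRY_RUN=1 create_arm64_cluster   # show the command only
#   create_arm64_cluster             # actually create the cluster
```

Setting the variable on the command line keeps the override scoped to this one invocation rather than exporting it for the whole shell session.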

tyler-dunkel commented 3 months ago

Ok it created successfully with those changes. But I wonder why it was picking up the wrong arch?

BenTheElder commented 3 months ago

That env var may be set, or some other related Docker setting may be.

Docker sort of supports cross-platform image running but the way it does it is not sufficient to run something like Kubernetes.

If you figure out which setting is overriding this, please let us know. I'm not aware of another option like DOCKER_DEFAULT_PLATFORM other than passing --platform to docker run, which kind does not do, so I'm not sure what it would be besides the env var. Still, it seems plausible that Docker has some other way to set it.
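One place to start looking is shell startup files that might export the variable. The helper below is a sketch with a made-up name, `find_platform_override`, and the file list in the usage comment is an assumption about a typical macOS setup. It only searches for the variable name and says nothing about other Docker settings that might have the same effect.

```shell
# Hypothetical helper: print each of the given files that mentions
# DOCKER_DEFAULT_PLATFORM. Missing files are skipped silently.
find_platform_override() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      grep -l 'DOCKER_DEFAULT_PLATFORM' "$f" || true
    fi
  done
}

# Usage (the file list is an assumption; adjust it to your shell):
#   find_platform_override ~/.zshrc ~/.zprofile ~/.zshenv ~/.bashrc ~/.bash_profile ~/.profile
```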