kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

kind fails to (re)create a cluster in devcontainers with the docker-in-docker feature #3695

Open vesnikos opened 3 months ago

vesnikos commented 3 months ago

What happened:

The following happens in a devcontainer environment configured with the docker-in-docker and kind features.

  1. kind create cluster succeeds ✔️
  2. kind delete cluster succeeds ✔️
  3. kind create cluster fails ❌

What you expected to happen: I would expect kind to succeed the second time it creates a cluster.

How to reproduce it (as minimally and precisely as possible):

  1. As described above, the issue happens inside devcontainers, so you need a way to interface with that technology. VS Code is a good choice if you install the Dev Containers extension; other ways exist and should yield the same result. (A scripted variant using the devcontainers CLI is sketched after this list.)

  2. Create a devcontainer based on the following spec:

project_folder/.devcontainer/devcontainer.json

{
  "name": "kind-test",
  "image": "mcr.microsoft.com/devcontainers/base:bullseye",
  "remoteUser": "root",
  "features": {
    "ghcr.io/devcontainers/features/docker-in-docker:2": {},
    "ghcr.io/mpriscella/features/kind:1": {}
  }
}
  3. Build the devcontainer.
  4. Open the devcontainer's terminal.
  5. Run $> kind create cluster
  6. Run $> kind delete cluster
  7. Run $> kind create cluster
  8. The program exits after a timeout.
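
For reference, the same sequence can be scripted with the devcontainers CLI instead of VS Code. This is a sketch under my own assumptions (the report above used the VS Code extension); it is run from project_folder and assumes @devcontainers/cli is installed:

# minimal sketch, assuming: npm install -g @devcontainers/cli
# run from project_folder, which contains the .devcontainer/devcontainer.json shown above
devcontainer up --workspace-folder .                          # build and start the devcontainer
devcontainer exec --workspace-folder . kind create cluster    # first create: succeeds
devcontainer exec --workspace-folder . kind delete cluster    # delete: succeeds
devcontainer exec --workspace-folder . kind create cluster    # second create: fails after the kubelet timeout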

Anything else we need to know?:

  1. The full error message is:
Creating cluster "kind" ...
 • Ensuring node image (kindest/node:v1.30.0) 🖼  ...
 ✓ Ensuring node image (kindest/node:v1.30.0) 🖼
 • Preparing nodes 📦   ...
 ✓ Preparing nodes 📦 
 • Writing configuration 📜  ...
 ✓ Writing configuration 📜
 • Starting control-plane 🕹️  ...
 ✗ Starting control-plane 🕹️
Deleted nodes: ["kind-control-plane"]
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged kind-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

Command Output: I0726 12:07:42.622308     219 initconfiguration.go:260] loading configuration from "/kind/kubeadm.conf"
W0726 12:07:42.623343     219 initconfiguration.go:348] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
[init] Using Kubernetes version: v1.30.0
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0726 12:07:42.627754     219 certs.go:112] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I0726 12:07:42.700629     219 certs.go:483] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kind-control-plane kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost] and IPs [10.96.0.1 172.19.0.2 127.0.0.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0726 12:07:42.951820     219 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I0726 12:07:43.071972     219 certs.go:483] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I0726 12:07:43.132726     219 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I0726 12:07:43.247101     219 certs.go:483] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.19.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.19.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0726 12:07:43.772518     219 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0726 12:07:43.982415     219 kubeconfig.go:112] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I0726 12:07:44.061946     219 kubeconfig.go:112] creating kubeconfig file for super-admin.conf
[kubeconfig] Writing "super-admin.conf" kubeconfig file
I0726 12:07:44.383594     219 kubeconfig.go:112] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0726 12:07:44.522444     219 kubeconfig.go:112] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0726 12:07:44.608115     219 kubeconfig.go:112] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0726 12:07:44.658276     219 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0726 12:07:44.658355     219 manifests.go:103] [control-plane] getting StaticPodSpecs
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0726 12:07:44.658552     219 certs.go:483] validating certificate period for CA certificate
I0726 12:07:44.658886     219 manifests.go:129] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0726 12:07:44.658908     219 manifests.go:129] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0726 12:07:44.658912     219 manifests.go:129] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0726 12:07:44.658914     219 manifests.go:129] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0726 12:07:44.658916     219 manifests.go:129] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0726 12:07:44.659752     219 manifests.go:158] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
I0726 12:07:44.659783     219 manifests.go:103] [control-plane] getting StaticPodSpecs
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0726 12:07:44.659899     219 manifests.go:129] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0726 12:07:44.659921     219 manifests.go:129] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0726 12:07:44.659925     219 manifests.go:129] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0726 12:07:44.659926     219 manifests.go:129] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0726 12:07:44.659928     219 manifests.go:129] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0726 12:07:44.659930     219 manifests.go:129] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0726 12:07:44.659932     219 manifests.go:129] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
I0726 12:07:44.660500     219 manifests.go:158] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
I0726 12:07:44.660527     219 manifests.go:103] [control-plane] getting StaticPodSpecs
I0726 12:07:44.660647     219 manifests.go:129] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0726 12:07:44.661074     219 manifests.go:158] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
I0726 12:07:44.661103     219 kubelet.go:68] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
I0726 12:07:44.798024     219 loader.go:395] Config loaded from file:  /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase.func1
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:110
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:115
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:259
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:128
github.com/spf13/cobra.(*Command).execute
    github.com/spf13/cobra@v1.7.0/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
    github.com/spf13/cobra@v1.7.0/command.go:1068
github.com/spf13/cobra.(*Command).Execute
    github.com/spf13/cobra@v1.7.0/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
    k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:52
main.main
    k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    runtime/proc.go:271
runtime.goexit
    runtime/asm_amd64.s:1695
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:128
github.com/spf13/cobra.(*Command).execute
    github.com/spf13/cobra@v1.7.0/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
    github.com/spf13/cobra@v1.7.0/command.go:1068
github.com/spf13/cobra.(*Command).Execute
    github.com/spf13/cobra@v1.7.0/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
    k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:52
main.main
    k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    runtime/proc.go:271
runtime.goexit
    runtime/asm_amd64.s:1695
[kubelet-check] The kubelet is not healthy after 4m0.000558619s

Unfortunately, an error has occurred:
    The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' returned error: Get "http://localhost:10248/healthz": context deadline exceeded

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
    - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
    Once you have found the failing container, you can inspect its logs with:
    - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
  2. The first time, the cluster is created; the second time it fails. For it to be created again, I think I have to:

    • delete the cluster with kind, and
    • re-build the devcontainer
  3. Logs: logs.tar.gz

  4. Command log: cmd.log

  5. Related but not similar: #3196

Environment:

vesnikos commented 3 months ago

might be similar to #3558 and #3340

BenTheElder commented 3 months ago

I recommend not nesting it inside another dind; mount the socket instead.

https://github.com/kubernetes-sigs/kind/issues/3196#issuecomment-1540148722
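
For illustration, a minimal sketch of what that could look like in the devcontainer spec from this report, with the docker-outside-of-docker feature (which bind-mounts the host's /var/run/docker.sock into the container) swapped in for docker-in-docker. The feature reference and version tag are my assumption and are not verified in this issue:

project_folder/.devcontainer/devcontainer.json

{
  "name": "kind-test",
  "image": "mcr.microsoft.com/devcontainers/base:bullseye",
  "remoteUser": "root",
  "features": {
    // assumption: mounts the host docker socket instead of starting a nested docker daemon
    "ghcr.io/devcontainers/features/docker-outside-of-docker:1": {},
    "ghcr.io/mpriscella/features/kind:1": {}
  }
}

With the socket mounted, the kind node containers are created as siblings of the devcontainer on the host's docker daemon rather than inside a nested daemon.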