kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

Can't create a kind cluster after delete cluster in a docker in docker vscode devcontainer #3370

Open KieranJeffreySmart opened 1 year ago

KieranJeffreySmart commented 1 year ago

What happened: I am trying to create a kind cluster in a VS Code devcontainer. I am working on Windows with Docker Desktop and have been using a Docker in Docker template.

When the container is first built, I am able to create a cluster using kind create cluster from a terminal within the container, and this works successfully.

However, if I delete the cluster and try to create it again, it fails.

This doesn't happen when I repeat the process on the Windows host machine, where the cluster is created every time.

This is to be used in a script, so I need the sequence to be repeatable: delete cluster, then create cluster.

kind-control-plane.zip

Thanks in advance for any assistance

What you expected to happen: A new cluster is created

How to reproduce it (as minimally and precisely as possible):

  1. Create a new devcontainer in VSCode from New Dev Container... menu option
  2. Create from Docker in Docker template
  3. Add features for kind, kubectl and node
  4. create devcontainer
  5. open a terminal
  6. enter command kind create cluster
  7. enter command kind delete cluster
  8. enter command kind create cluster
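The create/delete/create sequence in the steps above can be scripted so the failure is easy to reproduce. The following is only a minimal sketch: the names `run_command` and `recreate_cycle` are hypothetical, and it assumes `kind` is on the PATH and that the default cluster name `kind` is in use.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command, let it print to the terminal, and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def recreate_cycle(cycles: int, runner: Runner = run_command,
                   cluster_name: str = "kind") -> List[int]:
    """Create and then delete a kind cluster `cycles` times.

    Returns the exit code of each `kind create cluster` call, so a failing
    second create shows up as a non-zero entry at index 1.
    """
    create_codes: List[int] = []
    for _ in range(cycles):
        create_codes.append(
            runner(["kind", "create", "cluster", "--name", cluster_name]))
        # Delete even if create failed, so the next attempt starts clean.
        runner(["kind", "delete", "cluster", "--name", cluster_name])
    return create_codes
```

Calling `recreate_cycle(2)` inside the devcontainer runs two full cycles; on an affected setup the second entry of the returned list would be expected to be non-zero.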

Anything else we need to know?:

Environment: Windows 11, Docker Desktop 4.23.0

Dev Container Features:

"ghcr.io/devcontainers/features/node:1": {},
"ghcr.io/mpriscella/features/kind:1": {},
"ghcr.io/devcontainers-contrib/features/kubectl-asdf:2": {}

Docker Info from inside Dev Container:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /home/vscode/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  2.21.0-1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 23.0.6+azure-2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
 runc version: ccaecfcbc907d70a7aa870a6650887b901b25b82
 init version: 
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.10.102.1-microsoft-standard-WSL2
 Operating System: Debian GNU/Linux 11 (bullseye) (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 15.5GiB
 Name: 76da64a73ada
 ID: ee13f67f-b2b0-4995-8883-dd3c59c7f619
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

BenTheElder commented 1 year ago

> I am trying to create a kind cluster in a vscode devcontainer. I am working on windows with docker desktop and have been using a docker inside docker template.

We don't recommend this and it may be a bug in the docker in docker environment.

Please avoid adding additional nesting, it's a real headache to debug.

mboutet commented 12 months ago

@BenTheElder, I see that you have given this reply on a lot of similar issues lately, but I just want to say that using kind within an already containerized environment is a totally acceptable use case. Two important use cases:

I understand that this adds complexity on your end and makes debugging more difficult, but I just want to make sure you're aware of the valid use cases of running kind in containerized environments. Those use cases won't go away.

@KieranJeffreySmart, see https://github.com/kubernetes-sigs/kind/issues/3283#issuecomment-1745616607. TL;DR: you likely need to enable cgroup v2 on the VM on which Docker runs for kind v0.20+ to work properly.
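To check which cgroup version the Docker daemon is using, the `Cgroup Version:` line in the `docker info` output (as pasted in the report above) is enough. Here is a small sketch that extracts it from that text; the function name `cgroup_version` is hypothetical, and the sample is a fragment of the output from this issue.

```python
import re
from typing import Optional


def cgroup_version(docker_info_text: str) -> Optional[int]:
    """Return the cgroup version found in `docker info` text output, or None."""
    match = re.search(r"^\s*Cgroup Version:\s*(\d+)\s*$",
                      docker_info_text, re.MULTILINE)
    return int(match.group(1)) if match else None


# Fragment of the `docker info` output from the original report.
sample = """\
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
"""
print(cgroup_version(sample))  # prints 1
```

A result of 1 matches the environment in the report, which is the case the linked comment suggests moving away from.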

BenTheElder commented 11 months ago

I'm aware of the use cases, but we have limited bandwidth to provide support. kind is available as a static Go binary, and you can't containerize docker itself either.

We'll happily review proposed fixes from contributors but I just cannot justify spending my own time debugging this versus steering people towards more debuggable alternatives.

Kind is already running containers in containers, which is unfortunately insecure and error prone but similarly useful. I highly recommend against doing this again with another layer.

See also #303 for additional footguns running nested inside of another Kubernetes cluster.

BenTheElder commented 11 months ago

For Windows specifically, see #1529: nobody has contributed work on CI for Windows. aojea and I don't use Windows for development, so we depend on community contributions to keep the WSL2 docs up to date and to identify fixes for us to review, or sometimes implement, without being able to verify them directly ourselves.

... let alone adding container nesting on Windows.

dboreham commented 8 months ago

> ... let alone adding container nesting on Windows.

Quick note for the audience with no Windows exposure: containers/docker on Windows (except for actual Windows containers which nobody uses) runs in a Linux kernel and for the most part behaves the same as if it were running on a bare metal Linux box. Although it's convenient, you don't need to run Docker Desktop on Windows -- regular Linux docker, or podman will work fine inside WSL2. Therefore the issues with nesting containers are essentially the same as for stock Linux.

BenTheElder commented 8 months ago

> Therefore the issues with nesting containers are essentially the same as for stock Linux.

We tell people to avoid running kind in docker-in-docker on Linux. It's generally not necessary (it's no more secure than just passing the host dockerd socket, and more effort) and creates a lot of additional problems. There are some use cases where it makes sense, but adding another layer of nested containers is very "here be dragons".

BenTheElder commented 8 months ago

Also, the environment in WSL2 is different from Linux run elsewhere: for example, it often has a custom init system. We don't have easy access to reproduce and debug it, nor really the time or inclination. There's so much to do, OSS developers could use Linux, we don't use Windows ourselves, and Windows isn't really supported for developing kubernetes/kubernetes (https://kind.sigs.k8s.io/docs/contributing/project-scope/).

(Difference in the init and kernel => different cgroups management => impact on containers)

dboreham commented 8 months ago

init is out of scope here though, since we're running inside a container.

btw it turns out nested kind works just fine now, provided the container has the necessary secret sauce. The stock docker:dind container is an example of such a thing, albeit Alpine so...not for everyone. There is an Ubuntu equivalent image that also works: https://github.com/cruizba/ubuntu-dind

You can start that container, install kind (or k3d) and create a cluster. It can be used as an existence proof from which to generate your own image for CI and so on.
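As a rough sketch of that workflow, the commands could be assembled as below. The container name `dind-host` and the helper names are hypothetical, and installing the kind binary inside the container is deliberately left out; it would have to happen after the container starts and before the first `kind` command.

```python
from typing import List


def start_dind_argv(name: str = "dind-host",
                    image: str = "docker:dind") -> List[str]:
    """argv that starts a privileged Docker-in-Docker container in the background."""
    return ["docker", "run", "--privileged", "--detach", "--name", name, image]


def exec_argv(name: str, *command: str) -> List[str]:
    """argv that runs `command` inside the already running container `name`."""
    return ["docker", "exec", name, *command]


def nested_kind_plan(name: str = "dind-host") -> List[List[str]]:
    """Ordered commands: start the outer container, then create and delete a cluster.

    Installing kind inside the container is not included here (assumption:
    it is done separately between the first and second command).
    """
    return [
        start_dind_argv(name),
        exec_argv(name, "kind", "create", "cluster"),
        exec_argv(name, "kind", "delete", "cluster"),
    ]


for argv in nested_kind_plan():
    print(" ".join(argv))
```

The inner Docker daemon needs a few seconds to come up after the container starts, so a real script would probably wait or retry before the first `docker exec`.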

BenTheElder commented 8 months ago

> init is out of scope here though, since we're running inside a container.

It's not, the init is responsible for setting up cgroups amongst other things and we're sharing that along with the rest of the kernel from the host since we're using containers instead of VMs. Privileged containers like kind nodes are "leakier" than normal containers but all containers are influenced by the host's init.

dboreham commented 8 months ago

Well, I've tested stock WSL2 on x86 and it works. I'll try arm64 and report back...

dboreham commented 8 months ago

> Well, I've tested stock WSL2 on x86 and it works. I'll try arm64 and report back...

Reporting back: ARM WSL2 doesn't work :(

dboreham commented 8 months ago

Fwiw, the issue where kind delete cluster followed by kind create cluster fails when running in dind (the original problem reported above) also occurs on regular x64 Ubuntu, so it is unrelated to WSL2.

BenTheElder commented 5 months ago

This sort of problem is likely eliminated on hosts with cgroup v2 and cgroupns. cgroup v1 is being moved into maintenance mode by Kubernetes (https://github.com/kubernetes/enhancements/pull/4572) and will soon be deprecated by various ecosystem projects (such as OCI and systemd).

On cgroup v1 we started forcing cgroupns=private on kind nodes which may help with some of these problems.
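One way to see which cgroup namespace mode a node container actually got is to read `HostConfig.CgroupnsMode` with `docker inspect`. This is only a sketch: the helper names are hypothetical, and `kind-control-plane` is assumed to be the node container name for a default single-node cluster.

```python
import subprocess
from typing import Callable, List, Sequence


def inspect_cgroupns_argv(container: str = "kind-control-plane") -> List[str]:
    """argv that prints the cgroup namespace mode of `container`."""
    return ["docker", "inspect", "--format",
            "{{.HostConfig.CgroupnsMode}}", container]


def docker_output(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(list(argv), check=True,
                          capture_output=True, text=True).stdout


def cgroupns_mode(container: str = "kind-control-plane",
                  run: Callable[[Sequence[str]], str] = docker_output) -> str:
    """Return the mode string reported by Docker, with whitespace stripped."""
    return run(inspect_cgroupns_argv(container)).strip()
```

With a cluster running, `cgroupns_mode()` would be expected to return `private` on kind versions that force it; an empty string means Docker fell back to the daemon default.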