siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Docker cluster fails to start (cgroupv2 incompatibility?) #3634

Closed hainesbg closed 2 years ago

hainesbg commented 3 years ago

Bug Report

Creating a Docker-based local cluster on up-to-date Arch Linux times out, apparently due to an incompatibility with cgroups v2, analogous to https://github.com/rancher/k3d/issues/493

Description

$ talosctl cluster create --cidr 192.168.224.0/24
validating CIDR and reserving IPs
generating PKI and tokens
downloading ghcr.io/talos-systems/talos:v0.10.2
creating network talos-default
creating master nodes
creating worker nodes
renamed talosconfig context "talos-default" -> "talos-default-1"
waiting for API
bootstrap error: 2 error(s) occurred:
    rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.224.2:50000: connect: connection refused"
    timeout

If systemd.unified_cgroup_hierarchy=0 is added to the host system's kernel parameters (i.e. cgroup v1 is used rather than v2), it works as expected:

$ talosctl cluster create --cidr 192.168.224.0/24
validating CIDR and reserving IPs
generating PKI and tokens
downloading ghcr.io/talos-systems/talos:v0.10.2
creating network talos-default
creating master nodes
creating worker nodes
renamed talosconfig context "talos-default" -> "talos-default-1"
waiting for API
bootstrapping cluster
waiting for etcd to be healthy: OK
waiting for bootkube to finish: OK
waiting for apid to be ready: OK
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: OK
waiting for all k8s nodes to report ready: OK
waiting for all control plane components to be ready: OK
waiting for kube-proxy to report ready: OK
waiting for coredns to report ready: OK
waiting for all k8s nodes to report schedulable: OK

merging kubeconfig into "/home/haines/.kube/config"
renamed auth info "admin@talos-default" -> "admin@talos-default-3"
renamed context "admin@talos-default" -> "admin@talos-default-3"
PROVISIONER       docker
NAME              talos-default
NETWORK NAME      talos-default
NETWORK CIDR      192.168.224.0/24
NETWORK GATEWAY   192.168.224.1
NETWORK MTU       1500

NODES:

NAME                      TYPE           IP              CPU    RAM      DISK
/talos-default-master-1   controlplane   192.168.224.2   2.00   2.1 GB   -
/talos-default-worker-1   join           192.168.224.3   2.00   2.1 GB   -
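
To confirm which cgroup hierarchy the host is actually using before and after changing the kernel parameter, a check along these lines may help. This is a minimal sketch: `cgroup_version` is a hypothetical helper name (not part of talosctl), and it assumes the cgroup filesystem is mounted at `/sys/fs/cgroup`; the optional argument only exists so the function can be pointed at another directory.

```shell
# Hypothetical helper: report which cgroup hierarchy is mounted.
# Assumption: cgroups are mounted at /sys/fs/cgroup unless a path is given.
cgroup_version() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        # cgroup.controllers is present at the top of a unified (v2) mount
        echo "v2"
    else
        # legacy v1 layout (a hybrid setup is also reported as v1 here)
        echo "v1"
    fi
}
```

Running `cgroup_version` with no argument on the host should print `v2` in the failing configuration and `v1` once `systemd.unified_cgroup_hierarchy=0` is in effect.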

Logs

$ docker logs talos-default-master-1
...
[talos] 2021/05/18 13:44:27 service[trustd](Waiting): Error running Containerd(trustd), going to restart forever: failed to create task: "trustd": OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: cannot enter cgroupv2 "/sys/fs/cgroup/system" with domain controllers -- it is in threaded mode: unknown
[talos] 2021/05/18 13:44:32 service[trustd](Waiting): Error running Containerd(trustd), going to restart forever: failed to create task: "trustd": OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: cannot enter cgroupv2 "/sys/fs/cgroup/system" with domain controllers -- it is in threaded mode: unknown
[talos] 2021/05/18 13:44:35 service[kubelet](Waiting): Error running Containerd(kubelet), going to restart forever: task "kubelet" failed: exit code 1

Environment

smira commented 3 years ago

Might be something around https://github.com/moby/moby/blob/e0170da0dc6e660594f98bc66e7a98ce9c2abb46/hack/dind#L28-L37
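
For readers who do not want to follow the link: the referenced lines appear to be the cgroup v2 nesting step of the dind script. The following is a rough sketch of that technique, not a verbatim copy and not the fix that was eventually applied in Talos. The function name `enable_cgroup_nesting` and the optional path argument are assumptions added here so the logic can be tried against a scratch directory.

```shell
# Sketch of cgroup v2 nesting, modeled on the approach in moby's hack/dind.
# enable_cgroup_nesting is a hypothetical name; the optional argument
# overrides the cgroup mount point (an assumption, useful for dry runs).
enable_cgroup_nesting() {
    root="${1:-/sys/fs/cgroup}"
    # Nothing to do unless this is a unified (v2) hierarchy.
    [ -f "$root/cgroup.controllers" ] || return 0
    # Move processes from the top-level group into a child group first;
    # otherwise writing cgroup.subtree_control fails with EBUSY.
    mkdir -p "$root/init"
    # Errors while moving a process that has already exited are ignored.
    xargs -rn1 < "$root/cgroup.procs" > "$root/init/cgroup.procs" || :
    # Turn "cpu memory pids" into "+cpu +memory +pids" to enable the
    # available controllers for child groups.
    sed -e 's/ / +/g' -e 's/^/+/' < "$root/cgroup.controllers" \
        > "$root/cgroup.subtree_control"
}
```

In a real container this would have to run as root inside the container early in startup, before any child cgroups are created.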

jgillich commented 2 years ago

This ^ is also what k3d does, and it works well. Could this be integrated into the Docker setup? It's super annoying to still have to deal with cgroupv2 issues three years after it became the default on Fedora

smira commented 2 years ago

@jgillich @hainesbg Talos 0.12 should work fine in Docker mode if the host is using cgroupsv2