kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0
13.39k stars · 1.55k forks

Pop!_OS 21.04 Fails to create cluster on rootless podman #2495

Open AGhost-7 opened 3 years ago

AGhost-7 commented 3 years ago

What happened: I tried to create a cluster using the podman provider:

export KIND_EXPERIMENTAL_PROVIDER=podman
kind create cluster

It fails to initialize the cluster:

couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:114
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:225
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1371
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:225
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1371

What you expected to happen:

A cluster running on rootless podman.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Tail of the exported containerd.log:

Oct 12 15:08:36 kind-control-plane containerd[149]: time="2021-10-12T15:08:36.229877316Z" level=error msg="copy shim log" error="read /proc/self/fd/18: file already closed"
Oct 12 15:08:36 kind-control-plane containerd[149]: time="2021-10-12T15:08:36.232072953Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-controller-manager-kind-control-plane,Uid:46dac9a538838115821dfd9559149484,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim: failed to mount rootfs component &{overlay overlay [index=off workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/96/work upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/96/fs lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/7/fs]}: invalid argument: unknown"
Oct 12 15:08:37 kind-control-plane containerd[149]: time="2021-10-12T15:08:37.153039625Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:etcd-kind-control-plane,Uid:e2736c1c9d7dd71f3d030f119202c0a3,Namespace:kube-system,Attempt:0,}"
Oct 12 15:08:37 kind-control-plane containerd[149]: time="2021-10-12T15:08:37.184150610Z" level=info msg="starting signal loop" namespace=k8s.io path=/run/containerd/io.containerd.runtime.v2.task/k8s.io/0f73145d2a1abae7cc3975a666a8402370f6ab706c58e481ffe3092ea8bfdc63 pid=2963
Oct 12 15:08:37 kind-control-plane containerd[149]: time="2021-10-12T15:08:37.187593412Z" level=info msg="shim disconnected" id=0f73145d2a1abae7cc3975a666a8402370f6ab706c58e481ffe3092ea8bfdc63
Oct 12 15:08:37 kind-control-plane containerd[149]: time="2021-10-12T15:08:37.187764249Z" level=warning msg="cleaning up after shim disconnected" id=0f73145d2a1abae7cc3975a666a8402370f6ab706c58e481ffe3092ea8bfdc63 namespace=k8s.io
Oct 12 15:08:37 kind-control-plane containerd[149]: time="2021-10-12T15:08:37.187802521Z" level=info msg="cleaning up dead shim"
Oct 12 15:08:37 kind-control-plane containerd[149]: time="2021-10-12T15:08:37.212436396Z" level=warning msg="cleanup warnings time=\"2021-10-12T15:08:37Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=2973\n"
Oct 12 15:08:37 kind-control-plane containerd[149]: time="2021-10-12T15:08:37.213218561Z" level=error msg="copy shim log" error="read /proc/self/fd/18: file already closed"
Oct 12 15:08:37 kind-control-plane containerd[149]: time="2021-10-12T15:08:37.217359323Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-kind-control-plane,Uid:e2736c1c9d7dd71f3d030f119202c0a3,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim: failed to mount rootfs component &{overlay overlay [index=off workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/97/work upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/97/fs lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/7/fs]}: invalid argument: unknown"

Tail of the exported kubelet.log:

Oct 12 15:08:39 kind-control-plane kubelet[213]: E1012 15:08:39.228395     213 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 12 15:08:39 kind-control-plane kubelet[213]: E1012 15:08:39.329202     213 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 12 15:08:39 kind-control-plane kubelet[213]: E1012 15:08:39.430216     213 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 12 15:08:39 kind-control-plane kubelet[213]: E1012 15:08:39.531080     213 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 12 15:08:39 kind-control-plane kubelet[213]: E1012 15:08:39.632112     213 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 12 15:08:39 kind-control-plane kubelet[213]: E1012 15:08:39.732393     213 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 12 15:08:39 kind-control-plane kubelet[213]: E1012 15:08:39.833329     213 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 12 15:08:39 kind-control-plane kubelet[213]: E1012 15:08:39.934202     213 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 12 15:08:40 kind-control-plane kubelet[213]: E1012 15:08:40.033075     213 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kind-control-plane.16ad50c4989d9a77", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"kind-control-plane", UID:"kind-control-plane", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node kind-control-plane status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"kind-control-plane"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc05186d94688e877, ext:11967550929, loc:(*time.Location)(0x74aba00)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc05186d94688e877, ext:11967550929, loc:(*time.Location)(0x74aba00)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://kind-control-plane:6443/api/v1/namespaces/default/events": dial tcp [fc00:f853:ccd:e793::2]:6443: connect: connection refused'(may retry after sleeping)
Oct 12 15:08:40 kind-control-plane kubelet[213]: E1012 15:08:40.035198     213 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"

Environment:

BenTheElder commented 3 years ago

This seems likely to be the same root issue as https://github.com/kubernetes-sigs/kind/issues/2486 with fuse-overlayfs?
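One way to narrow this down (a diagnostic sketch of mine, not something from the thread) is to ask podman which storage driver the rootless user is actually running; a plain `overlay` driver without a fuse-overlayfs `mount_program` means kernel overlayfs, which is where this failure mode tends to appear:

```shell
# Diagnostic sketch (assumes podman is on PATH; falls back to "unknown"
# so the snippet is harmless on machines without podman installed).
driver="$(podman info --format '{{.Store.GraphDriverName}}' 2>/dev/null || echo unknown)"
echo "rootless storage driver: $driver"
```

If this prints `overlay`, check `podman info` further for a `mount_program` entry to tell kernel overlayfs apart from fuse-overlayfs.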

AGhost-7 commented 3 years ago

How are people running podman? The instructions for rootless podman explicitly say to use fuse-overlayfs: https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md#ensure-fuse-overlayfs-is-installed. The kind documentation doesn't say to change it either.
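For reference, the linked tutorial configures fuse-overlayfs through the per-user storage config. A sketch of that config (the binary path below is an assumption; check `which fuse-overlayfs` on your distro):

```toml
# ~/.config/containers/storage.conf — sketch based on the linked tutorial
[storage]
driver = "overlay"

[storage.options.overlay]
# Path is an assumption; adjust to where your distro installs fuse-overlayfs
mount_program = "/usr/bin/fuse-overlayfs"
```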

BenTheElder commented 3 years ago

As far as I know, people running rootless podman are using fuse-overlayfs, but there seems to be some issue with it on your distro, tentatively?

So far most users of rootless podman are on Fedora.

BenTheElder commented 3 years ago

We have both rootless docker and podman in CI with fuse-overlayfs, but that is under Fedora currently.

aojea commented 2 years ago

Can you try using this env variable?

KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER=fuse-overlayfs
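Concretely, the retry would look something like this (a sketch: the delete step and the guard are mine, and whether the variable takes effect depends on the kind build in use):

```shell
# Sketch: recreate the cluster with the experimental snapshotter override.
# The guard keeps this a no-op on machines without kind installed.
export KIND_EXPERIMENTAL_PROVIDER=podman
export KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER=fuse-overlayfs
if command -v kind >/dev/null 2>&1; then
  kind delete cluster   # drop the half-created cluster first
  kind create cluster
fi
```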

AGhost-7 commented 2 years ago

It still fails with the same error from kind.

containerd.log:

Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.771637815Z" level=info msg="cleaning up dead shim"
Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.774032638Z" level=info msg="shim disconnected" id=f440dc1ab7daf9e67e36cd9121893a7c98eb401b117a36f0b268b9b5aabe7acd
Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.774388364Z" level=warning msg="cleaning up after shim disconnected" id=f440dc1ab7daf9e67e36cd9121893a7c98eb401b117a36f0b268b9b5aabe7acd namespace=k8s.io
Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.774458219Z" level=info msg="cleaning up dead shim"
Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.810751564Z" level=warning msg="cleanup warnings time=\"2021-10-15T15:05:57Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=2608\n"
Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.811216248Z" level=error msg="copy shim log" error="read /proc/self/fd/18: file already closed"
Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.814519456Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-kind-control-plane,Uid:e2736c1c9d7dd71f3d030f119202c0a3,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim: failed to mount rootfs component &{overlay overlay [index=off workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/85/work upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/85/fs lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/7/fs]}: invalid argument: unknown"
Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.814602924Z" level=warning msg="cleanup warnings time=\"2021-10-15T15:05:57Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=2612\n"
Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.815084198Z" level=error msg="copy shim log" error="read /proc/self/fd/20: file already closed"
Oct 15 15:05:57 kind-control-plane containerd[149]: time="2021-10-15T15:05:57.818247390Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-kind-control-plane,Uid:69dd939498054a211c3461b2a9cc8d26,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim: failed to mount rootfs component &{overlay overlay [index=off workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/86/work upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/86/fs lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/7/fs]}: invalid argument: unknown"

kubelet.log:

Oct 15 15:06:00 kind-control-plane kubelet[210]: E1015 15:06:00.240944     210 certificate_manager.go:437] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Post "https://kind-control-plane:6443/apis/certificates.k8s.io/v1/certificatesigningrequests": dial tcp [fc00:f853:ccd:e793::2]:6443: connect: connection refused
Oct 15 15:06:00 kind-control-plane kubelet[210]: E1015 15:06:00.287083     210 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 15 15:06:00 kind-control-plane kubelet[210]: E1015 15:06:00.295837     210 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://kind-control-plane:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/kind-control-plane?timeout=10s": dial tcp [fc00:f853:ccd:e793::2]:6443: connect: connection refused
Oct 15 15:06:00 kind-control-plane kubelet[210]: E1015 15:06:00.387586     210 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 15 15:06:00 kind-control-plane kubelet[210]: E1015 15:06:00.488529     210 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 15 15:06:00 kind-control-plane kubelet[210]: I1015 15:06:00.493845     210 kubelet_node_status.go:71] "Attempting to register node" node="kind-control-plane"
Oct 15 15:06:00 kind-control-plane kubelet[210]: E1015 15:06:00.494643     210 kubelet_node_status.go:93] "Unable to register node with API server" err="Post \"https://kind-control-plane:6443/api/v1/nodes\": dial tcp [fc00:f853:ccd:e793::2]:6443: connect: connection refused" node="kind-control-plane"
Oct 15 15:06:00 kind-control-plane kubelet[210]: E1015 15:06:00.589278     210 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 15 15:06:00 kind-control-plane kubelet[210]: E1015 15:06:00.690265     210 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"
Oct 15 15:06:00 kind-control-plane kubelet[210]: E1015 15:06:00.791500     210 kubelet.go:2291] "Error getting node" err="node \"kind-control-plane\" not found"

aojea commented 2 years ago

And disabling SELinux?

AGhost-7 commented 2 years ago

No SELinux on Pop!_OS.

BenTheElder commented 2 years ago

https://github.com/kubernetes-sigs/kind/issues/2495#issuecomment-944233212 @aojea that requires changes that are not released yet anyhow. Some rootless fixes will be in the next release.