kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

[PODMAN] kind cluster creation fails on podman using Debian 11; identified a workaround to prevent the cluster failure. #2706

Closed debuggerboy closed 1 year ago

debuggerboy commented 2 years ago

Kind cluster creation failed on Podman (rootful) using Debian GNU/Linux 11.

Expected behavior: kind creates a multi-node cluster without errors on Podman using Debian.

Steps to Reproduce:

I am using kind 0.12.0 on Debian 11.

I have a kind config YAML file:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  "SetHostnameAsFQDN": true
networking:
  ipFamily: ipv4
  kubeProxyMode: "iptables"
nodes:
- role: control-plane
  image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
- role: worker
  image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac
- role: worker
  image: kindest/node:v1.23.0@sha256:49824ab1727c04e56a21a5d8372a402fcd32ea51ac96a2706a12af38934f81ac

I am trying to create a kind cluster on Debian 11 with Podman 3.0.1, using the command below:

sudo kind create cluster --name ko2 --config /home/debuggerboy/labs/kubernetes/anish-kind/multi-node.yaml

The above command takes a long time and then fails with the error below:

Full Error : https://gist.github.com/debuggerboy/e7ec918dd984cdbe9e8c1963393d2fe2#file-kind-sigs-cluster-creation-error-txt

Additional details regarding this error:

After hours of debugging and multiple attempts, I was able to prevent the kind cluster from failing.

I created an inject script: https://gist.github.com/debuggerboy/e7ec918dd984cdbe9e8c1963393d2fe2#file-podman-fix-for-cluster-fail-sh

I execute the above inject script a few seconds after kind create cluster is invoked.

This prevents the kind cluster on Podman from failing.
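
For clarity, the timing looks roughly like this (a sketch based on the description above; the script filename and the exact delay are assumptions):

sudo kind create cluster --name ko2 --config multi-node.yaml &   # start cluster creation in the background
sleep 15          # "a few seconds" after invocation; exact delay is an assumption
sudo ./inject.sh  # the gist script linked above (filename assumed)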

Environment:

Please see : https://gist.github.com/debuggerboy/e7ec918dd984cdbe9e8c1963393d2fe2

Note: I do not have much experience with Kubernetes; please ignore this report if it does not qualify as a bug. My apologies.

Thanks debuggerboy

aojea commented 2 years ago

can you try using the node image that is published with that release?

https://github.com/kubernetes-sigs/kind/releases/tag/v0.12.0

debuggerboy commented 2 years ago

Hello @aojea

Thank you for the reply. I have used the node image v1.23.4 from the page you suggested.

I am currently using: kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9

Please see my yaml file:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  "SetHostnameAsFQDN": true
networking:
  ipFamily: ipv4
  kubeProxyMode: "iptables"
nodes:
- role: control-plane
  image: kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9
- role: worker
  image: kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9
- role: worker
  image: kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9

However, I am still getting the same error: https://gist.github.com/debuggerboy/0bb774772921f8ee342ea82eb4a77ab7#file-kind-sigs-podman-debian-error-txt

I was able to confirm that the workaround "inject.sh" script is still capable of preventing the kind cluster from failing, even with the node image "kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9".

See:

[Screenshot: kind cluster creation succeeding with the workaround inject.sh]

Thanks

BenTheElder commented 2 years ago

Can you please minimize the reproducer? For example, does it fail without setting any feature gates? Does it fail with only a single node?

Also I am confused by the screenshot where it appears to work.

You can remove the networking: section of the config; ipv4 and iptables are the defaults.

debuggerboy commented 2 years ago

@BenTheElder

Sorry for the confusion; let me explain.

The screenshot is to show that kind create cluster only succeeds when "inject.sh" is run alongside the kind create cluster command.

If I let the kind create cluster command run on its own (i.e. without the "inject.sh" hack), it fails with the error below: https://gist.github.com/debuggerboy/0bb774772921f8ee342ea82eb4a77ab7#file-kind-sigs-podman-debian-error-txt

Let me know if my explanation was helpful.

Let me try to run the cluster with a single node. I will share my findings here.

Thanks

debuggerboy commented 2 years ago

@BenTheElder

As per your instruction, I modified my YAML file as below:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  "SetHostnameAsFQDN": true
networking:
  ipFamily: ipv4
  kubeProxyMode: "iptables"
nodes:
- role: control-plane
  image: kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9

With a single-node cluster, create cluster works successfully.

debuggerboy@cassandra:~$ sudo kind create cluster --name ko2 --config /home/debuggerboy/labs/kubernetes/anish-kind/anish-multi-node.yaml
enabling experimental podman provider
Creating cluster "ko2" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-ko2"
You can now use your cluster with:

kubectl cluster-info --context kind-ko2

Have a nice day! 👋
debuggerboy@cassandra:~$

So kind create cluster only fails when we try to create a multi-node kind cluster with rootful Podman.

@BenTheElder : when the multi-node kind cluster creation is in progress and we run the "inject.sh" script below:

https://gist.github.com/debuggerboy/e7ec918dd984cdbe9e8c1963393d2fe2#file-podman-fix-for-cluster-fail-sh

this workaround script prevents the multi-node kind cluster from failing. I believe there is an issue with kubedns. The workaround script basically injects the IPv6 addresses and hostnames into the "/etc/hosts" file on all kind nodes.
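
For readers who cannot follow the gist link, here is a minimal sketch of that idea. This is my reconstruction from the description, not the gist script itself; the node names and the podman network name "kind" are taken from this thread:

#!/bin/sh
# For every kind node, look up its IPv6 address on the "kind" podman network
# and append an "<ip>  <hostname>" line to /etc/hosts on every *other* node.
NODES="ko2-control-plane ko2-worker ko2-worker2"
for node in $NODES; do
  ip="$(sudo podman inspect -f '{{.NetworkSettings.Networks.kind.GlobalIPv6Address}}' "$node")"
  for target in $NODES; do
    [ "$target" = "$node" ] && continue
    sudo podman exec "$target" sh -c "echo '$ip  $node' >> /etc/hosts"
  done
done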

Thanks,

BenTheElder commented 2 years ago

kubedns is not responsible for container hostnames; that's podman. Kubedns (actually CoreDNS) is only responsible for pod DNS, etc.

But kind shouldn't be using hostnames on podman IIRC.

I'm pretty sure all of the fields you are setting for featureGates and networking are default values, btw.

aojea commented 2 years ago

The workaround script basically injects the IPv6 addresses and hostnames into the "/etc/hosts" file on all kind nodes.

what is the content of /etc/hosts before you inject the hostnames?

debuggerboy commented 2 years ago

@aojea

Since I am trying to create a 3-node kind cluster with 1 control-plane and 2 worker nodes in Podman, I have 3 containers:

debuggerboy@cassandra:~$ sudo podman ps -a
CONTAINER ID  IMAGE                                                                                           COMMAND  CREATED         STATUS                     PORTS                      NAMES
bbb1a367872d  docker.io/kindest/node@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9           13 seconds ago  Up Less than a second ago  127.0.0.1:46415->6443/tcp  ko2-control-plane
129c22917949  docker.io/kindest/node@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9           13 seconds ago  Up Less than a second ago                             ko2-worker
c96493507afb  docker.io/kindest/node@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9           13 seconds ago  Up Less than a second ago                             ko2-worker2
debuggerboy@cassandra:~$ 

The state of the "/etc/hosts" file before I inject the hostnames is as below:

On ko2-control-plane:

root@ko2-control-plane:/# cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       cassandra.local-demo.lan   cassandra

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

fc00:f853:ccd:e793::93  ko2-control-plane ko2-control-plane
root@ko2-control-plane:/#

On ko2-worker:

root@ko2-worker:/# cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       cassandra.local-demo.lan   cassandra

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

fc00:f853:ccd:e793::94  ko2-worker ko2-worker
root@ko2-worker:/#

On ko2-worker2:

root@ko2-worker2:/# cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       cassandra.local-demo.lan   cassandra

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

fc00:f853:ccd:e793::95  ko2-worker2 ko2-worker2
root@ko2-worker2:/#

Since there are 3 nodes in my environment, for kind create cluster to succeed, the "/etc/hosts" on each kind node container should also contain the IP addresses and hostnames of the other two container nodes, as shown below.
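
For illustration, with the addresses shown above, "/etc/hosts" on ko2-control-plane would additionally need entries like:

fc00:f853:ccd:e793::94  ko2-worker ko2-worker
fc00:f853:ccd:e793::95  ko2-worker2 ko2-worker2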

The create cluster fails at "✗ Joining worker nodes 🚜"; it seems the kind containers are not able to address each other.

debuggerboy@cassandra:~$ sudo kind create cluster --name ko2 --config /home/debuggerboy/labs/kubernetes/anish-kind/anish-multi-node.yaml
enabling experimental podman provider
Creating cluster "ko2" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼
 ✓ Preparing nodes 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✗ Joining worker nodes 🚜
ERROR: failed to create cluster: failed to join node with kubeadm: command "podman exec --privileged ko2-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1
Command Output: I0404 19:59:01.286228     112 join.go:413] [preflight] found NodeName empty; using OS hostname as NodeName

But when the "inject" script is executed, it adds the IP addresses of the other two nodes to "/etc/hosts"; only then does "Joining worker nodes" complete successfully and the kind cluster finish creating without errors.
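
One way to see this failure mode (a diagnostic added here for illustration, not part of the original report) is to check whether a worker can resolve the control-plane name:

sudo podman exec ko2-worker getent hosts ko2-control-plane
# before injection: no output, non-zero exit (name not resolvable)
# after injection:  prints the IPv6 /etc/hosts entry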

Thanks,

aojea commented 2 years ago

Why is podman creating the hosts entries with IPv6?

fc00:f853:ccd:e793::93 ko2-control-plane ko2-control-plane

That is the problem.

BenTheElder commented 2 years ago

seems like this is a podman bug 🤔 @aojea any further thoughts?

aojea commented 2 years ago

seems like this is a podman bug 🤔 @aojea any further thoughts?

It seems to be a podman or environmental issue ... tagging it as external meanwhile, since there is no more evidence.

flouthoc commented 2 years ago

Hi @debuggerboy, if the version is not a constraint here, do you mind trying this with podman 4.0.0-dev, or even better, something from upstream? I am unable to reproduce this issue with 4.0.0-dev from upstream; my network stack is netavark/aardvark-dns, but it should work with CNI as well.

debuggerboy commented 2 years ago

Hello @flouthoc

I am using Debian 11 (bullseye) on my laptop. I believe I am already using the latest version of podman available for bullseye.

Are there any deb packages for podman-4.0.0-dev available that I can try on Debian 11?

Thanks

aojea commented 2 years ago

@debuggerboy the podman community should be able to answer that question better than us: https://github.com/containers/podman

stefancocora commented 1 year ago

Podman >= 4.0 fixes this issue. I've tested using podman v4:

aardvark-dns 1.3.0-1
netavark 1.1.0-1
podman 4.1.1-2

Verify whether your podman network backend is cni:

podman info --format {{.Host.NetworkBackend}}
cni

If it is set to cni, the fix for this issue is to change the network backend to aardvark. See the man page (man containers.conf) and search for network_backend to understand how to change it.
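
For example, the setting lives in the [network] table of containers.conf, and the backend value is spelled netavark (aardvark-dns is the DNS component that ships with it). A sketch, assuming no existing [network] customizations in /etc/containers/containers.conf:

sudo mkdir -p /etc/containers
printf '[network]\nnetwork_backend = "netavark"\n' | sudo tee -a /etc/containers/containers.conf
# switching backends invalidates existing containers and networks;
# a full reset may be required (warning: this wipes all podman state):
sudo podman system reset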

You can leave the cni-plugins package installed; podman will obey the containers.conf configuration.