zealvora / certified-kubernetes-security-specialist

Having issue when trying to install kubeadm on ubuntu vm #23

Open vignan-devops opened 1 year ago

vignan-devops commented 1 year ago

The etcd pod keeps restarting, which in turn causes kube-apiserver to keep restarting. Can you please help me here?

vignan-devops commented 1 year ago

root@kubeadm-master:~# kubeadm init --pod-network-cidr=10.244.0.0/16
I0502 08:12:08.289522 4204 version.go:255] remote version is much newer: v1.27.1; falling back to: stable-1.24
[init] Using Kubernetes version: v1.24.13
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: blkio
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: E0502 08:12:08.411082 4213 remote_runtime.go:925] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2023-05-02T08:12:08Z" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher

vignan-devops commented 1 year ago

root@kubeadm-master:/run/containerd# crictl ps -a
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"
CONTAINER       IMAGE           CREATED              STATE     NAME                      ATTEMPT   POD ID          POD
e1c6b7a653f47   97b8277912b10   49 seconds ago       Running   kube-controller-manager   5         ce2df798f69b1   kube-controller-manager-kubeadm-master
b8873f5990a15   45e1452244b77   About a minute ago   Exited    kube-apiserver            4         b567cea51bb70   kube-apiserver-kubeadm-master
fedc97fbc9529   dd23e01bac419   About a minute ago   Exited    kube-proxy                5         a2ae7a907536e   kube-proxy-hwrsn
f12813310cda5   043417219315f   2 minutes ago        Running   kube-scheduler            4         f5ecfd8922ec2   kube-scheduler-kubeadm-master
6b0f05e29e537   97b8277912b10   2 minutes ago        Exited    kube-controller-manager   4         b8022131d8291   kube-controller-manager-kubeadm-master
f6fdfe440116b   aebe758cef4cd   2 minutes ago        Exited    etcd                      6         999104c622f0c   etcd-kubeadm-master
daf2cee45fd93   043417219315f   3 minutes ago        Exited    kube-scheduler            3         8b39253101d3d   kube-scheduler-kubeadm-master

vignan-devops commented 1 year ago

The etcd pod keeps failing with the error below:

{"level":"info","ts":"2023-05-02T08:32:51.335Z","caller":"embed/etcd.go:368","msg":"closing etcd server","name":"kubeadm-master","data-dir":"/var/lib/etcd","advertise-peer-urls":["https://159.223.189.10:2380"],"advertise-client-urls":["https://159.223.189.10:2379"]}
WARNING: 2023/05/02 08:32:51 [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1:2379 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
WARNING: 2023/05/02 08:32:51 [core] grpc: addrConn.createTransport failed to connect to {159.223.189.10:2379 159.223.189.10:2379 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 159.223.189.10:2379: connect: connection refused". Reconnecting...

1000hi commented 2 weeks ago

Hi, I had the same issues when copying the commands to set up kubeadm. In the end I skipped some of the commands and it worked fine. It's a little long, but here is how I did it:

I followed the first part of the commands:

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system
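
Before moving on, it can help to confirm that the sysctl file really contains the three settings. The function below is only a sketch: the name check_sysctl_conf is made up for this example, and it only inspects the file, not the live kernel values.

```shell
# Hypothetical helper (not part of any tool): report which of the three
# required keys are missing, or not set to 1, in a sysctl config file.
check_sysctl_conf() {
    conf_file="$1"
    missing=0
    for key in net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward net.bridge.bridge-nf-call-ip6tables; do
        # Escape the dots so they match literally; allow any spacing around "=".
        escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
        pattern="^[[:space:]]*${escaped}[[:space:]]*=[[:space:]]*1[[:space:]]*\$"
        if ! grep -Eq "$pattern" "$conf_file"; then
            echo "missing or not 1: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage would be check_sysctl_conf /etc/sysctl.d/99-kubernetes-cri.conf. For the live state, sysctl net.ipv4.ip_forward and lsmod | grep br_netfilter show what the kernel actually has loaded.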

Then I installed containerd.io from Docker's apt repository instead:

sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo   "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" |   sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y containerd.io

Check if containerd is running

crictl ps -a

If you get errors like these:

WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
ERRO[0000] validate service connection: validate CRI v1 runtime API for endpoint "unix:///run/containerd/containerd.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService
ERRO[0000] validate service connection: validate CRI v1 runtime API for endpoint "unix:///run/crio/crio.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/crio/crio.sock: connect: no such file or directory"

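Delete the old config (the one shipped with the package has the CRI plugin disabled, which is what the "Unimplemented" error points to) and restart containerd so the change takes effect:

rm /etc/containerd/config.toml
systemctl restart containerd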
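
If you would rather check the config before deleting it, something like the following could tell you whether the CRI plugin is the problem. This is a sketch under assumptions: cri_is_disabled is a made-up name, and it assumes the disabled_plugins list is written on a single line.

```shell
# Hypothetical helper: succeed if the given containerd config lists "cri"
# under disabled_plugins (assumes the list sits on one line).
cri_is_disabled() {
    grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"' "$1"
}
```

Usage: cri_is_disabled /etc/containerd/config.toml && echo "CRI plugin is disabled". An alternative to deleting the file is to regenerate it with containerd config default | sudo tee /etc/containerd/config.toml; either way containerd has to be restarted afterwards.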

After that you should get:

crictl ps -a
WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
WARN[0000] image connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID              POD

The solution I found to get rid of the warning:

root@kubeadm-master:~# crictl config --set runtime-endpoint=unix:///run/containerd/containerd.sock
root@kubeadm-master:~# crictl ps -a
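
The crictl config command stores the setting in /etc/crictl.yaml, so the same result can be had by writing that file directly. A minimal sketch, assuming you also want to pin the image endpoint (the command above only sets runtime-endpoint); write_crictl_config is a made-up name:

```shell
# Hypothetical helper: write a crictl config that pins both endpoints to the
# containerd socket used in this thread. Pass the target path as $1.
write_crictl_config() {
    cat > "$1" <<'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
}
```

As root, write_crictl_config /etc/crictl.yaml would replace any existing file, so back it up first if you already have other settings in there.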

Install a newer version of the main tools:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
apt-cache madison kubeadm
sudo apt-get install -y kubelet  kubeadm  kubectl cri-tools
sudo apt-mark hold kubelet kubeadm kubectl
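
Note that pkgs.k8s.io has a separate repository for each minor version, so the v1.31 in the two URLs above decides which releases apt can see. If you want a different minor version, a small sketch like this builds the matching source line (k8s_apt_source is a made-up name, and the keyring path is the one used above):

```shell
# Hypothetical helper: print the apt source line for a given Kubernetes
# minor version, e.g. "v1.31". The keyring path matches the commands above.
k8s_apt_source() {
    minor="$1"
    echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/${minor}/deb/ /"
}
```

Usage: k8s_apt_source v1.31 | sudo tee /etc/apt/sources.list.d/kubernetes.list. The Release.key URL has to use the same minor version.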

Initialize the cluster with the newer version:

kubeadm init --pod-network-cidr=10.244.0.0/16 --kubernetes-version=1.31.1
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Install the network plugin (flannel here)

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

It should work now:

root@kubeadm-master:~# kubectl get nodes
NAME             STATUS   ROLES           AGE    VERSION
kubeadm-master   Ready    control-plane   2m1s   v1.31.
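
If you want to script that last check instead of reading the table by eye, a rough sketch could look like this. The function name all_nodes_ready is made up, and it assumes the default kubectl get nodes column layout where STATUS is the second column.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and fail
# if there are no nodes or any node's STATUS is not exactly "Ready".
all_nodes_ready() {
    awk 'NR > 1 { seen = 1; if ($2 != "Ready") bad = 1 }
         END { exit (seen && !bad) ? 0 : 1 }'
}
```

Usage: kubectl get nodes | all_nodes_ready && echo "all nodes are Ready". A node shown as NotReady or Ready,SchedulingDisabled makes it fail, which is usually what you want right after kubeadm init.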