techiescamp / vagrant-kubeadm-kubernetes

Vagrantfile & Scripts to setup Kubernetes Cluster using Kubeadm for CKA, CKAD and CKS practice environment
https://devopscube.com/kubernetes-cluster-vagrant/
GNU General Public License v3.0

Cannot init master node successfully #78

Open shawnsang opened 3 days ago

shawnsang commented 3 days ago

I've fetched the latest code, but I always get an error when kubeadm init is executed in master.sh.

Because our location cannot pull images directly from Google, I added --image-repository, and I also added --control-plane-endpoint, but I still get the same error.
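
For reference, the mirror itself can be sanity-checked outside of master.sh with the same flag; a minimal sketch (the preflight output below offers the same pull command):

    # List and pre-pull the control-plane images through the mirror
    kubeadm config images list --image-repository registry.aliyuncs.com/google_containers
    sudo kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers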

The error message is shown below:

    controlplane: + echo 'Preflight Check Passed: Downloaded All Required Images'
    controlplane: + sudo kubeadm init --apiserver-advertise-address=10.0.0.10 --control-plane-endpoint=10.0.0.10 --apiserver-cert-extra-sans=10.0.0.10 --pod-network-cidr=172.16.1.0/16 --service-cidr=172.17.1.0/18 --node-name controlplane --ignore-preflight-errors Swap --image-repository registry.aliyuncs.com/google_containers
    controlplane: [init] Using Kubernetes version: v1.31.0
    controlplane: [preflight] Running pre-flight checks
    controlplane: [preflight] Pulling images required for setting up a Kubernetes cluster
    controlplane: [preflight] This might take a minute or two, depending on the speed of your internet connection
    controlplane: [preflight] You can also perform this action beforehand using 'kubeadm config images pull'
    controlplane: W0927 03:07:16.984168    6083 checks.go:846] detected that the sandbox image "registry.k8s.io/pause:3.10" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.aliyuncs.com/google_containers/pause:3.10" as the CRI sandbox image.
    controlplane: [certs] Using certificateDir folder "/etc/kubernetes/pki"
    controlplane: [certs] Generating "ca" certificate and key
    controlplane: [certs] Generating "apiserver" certificate and key
    controlplane: [certs] apiserver serving cert is signed for DNS names [controlplane kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [172.17.0.1 10.0.0.10]
    controlplane: [certs] Generating "apiserver-kubelet-client" certificate and key
    controlplane: [certs] Generating "front-proxy-ca" certificate and key
    controlplane: [certs] Generating "front-proxy-client" certificate and key
    controlplane: [certs] Generating "etcd/ca" certificate and key
    controlplane: [certs] Generating "etcd/server" certificate and key
    controlplane: [certs] etcd/server serving cert is signed for DNS names [controlplane localhost] and IPs [10.0.0.10 127.0.0.1 ::1]
    controlplane: [certs] Generating "etcd/peer" certificate and key
    controlplane: [certs] etcd/peer serving cert is signed for DNS names [controlplane localhost] and IPs [10.0.0.10 127.0.0.1 ::1]
    controlplane: [certs] Generating "etcd/healthcheck-client" certificate and key
    controlplane: [certs] Generating "apiserver-etcd-client" certificate and key
    controlplane: [certs] Generating "sa" key and public key
    controlplane: [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
    controlplane: [kubeconfig] Writing "admin.conf" kubeconfig file
    controlplane: [kubeconfig] Writing "super-admin.conf" kubeconfig file
    controlplane: [kubeconfig] Writing "kubelet.conf" kubeconfig file
    controlplane: [kubeconfig] Writing "controller-manager.conf" kubeconfig file
    controlplane: [kubeconfig] Writing "scheduler.conf" kubeconfig file
    controlplane: [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
    controlplane: [control-plane] Using manifest folder "/etc/kubernetes/manifests"
    controlplane: [control-plane] Creating static Pod manifest for "kube-apiserver"
    controlplane: [control-plane] Creating static Pod manifest for "kube-controller-manager"
    controlplane: [control-plane] Creating static Pod manifest for "kube-scheduler"
    controlplane: [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    controlplane: [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    controlplane: [kubelet-start] Starting the kubelet
    controlplane: [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
    controlplane: [kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
    controlplane: [kubelet-check] The kubelet is healthy after 505.732985ms
    controlplane: [api-check] Waiting for a healthy API server. This can take up to 4m0s
    controlplane: [api-check] The API server is not healthy after 4m0.001359696s
    controlplane:
    controlplane: Unfortunately, an error has occurred:
    controlplane:       context deadline exceeded
    controlplane:
    controlplane: This error is likely caused by:
    controlplane:       - The kubelet is not running
    controlplane:       - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    controlplane:
    controlplane: If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    controlplane:       - 'systemctl status kubelet'
    controlplane:       - 'journalctl -xeu kubelet'
    controlplane:
    controlplane: Additionally, a control plane component may have crashed or exited when started by the container runtime.
    controlplane: To troubleshoot, list all containers using your preferred container runtimes CLI.
    controlplane: Here is one example how you may list all running Kubernetes containers by using crictl:
    controlplane:       - 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
    controlplane:       Once you have found the failing container, you can inspect its logs with:
    controlplane:       - 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock logs CONTAINERID'
    controlplane: error execution phase wait-control-plane: could not initialize a Kubernetes cluster
    controlplane: To see the stack trace of this error execute with --v=5 or higher
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
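
One thing the log also points out: the container runtime's sandbox image (registry.k8s.io/pause:3.10) does not match the mirrored one kubeadm expects. A minimal sketch of aligning CRI-O with the mirror, assuming CRI-O reads drop-ins from /etc/crio/crio.conf.d/ (the drop-in filename is illustrative):

    # Point CRI-O's sandbox (pause) image at the same mirror kubeadm uses
    cat <<'EOF' | sudo tee /etc/crio/crio.conf.d/10-pause-image.conf
    [crio.image]
    pause_image = "registry.aliyuncs.com/google_containers/pause:3.10"
    EOF
    sudo systemctl restart crio

This only addresses the warning about the sandbox image; the init failure itself happens later, at the API server health check.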

So I logged in to the master node and ran kubelet by hand for more information; the error says the kubelet certificate and key are not a valid pair (a quick check is sketched after the session below).

PS D:\workroom\vagrant-kubeadm-kubernetes> vagrant ssh controlplane
Welcome to Ubuntu 24.04 LTS (GNU/Linux 6.8.0-31-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

 System information as of Fri Sep 27 03:12:45 AM UTC 2024

  System load:  0.03               Processes:             146
  Usage of /:   15.2% of 30.34GB   Users logged in:       0
  Memory usage: 8%                 IPv4 address for eth0: 10.0.2.15
  Swap usage:   0%

This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento

Use of this system is acceptance of the OS vendor EULA and License Agreements.
vagrant@controlplane:~$ kubelet
E0927 03:12:49.585291    6272 run.go:72] "command failed" err="failed to construct kubelet dependencies: error reading /var/lib/kubelet/pki/kubelet.key, certificate and key must be supplied as a pair"
vagrant@controlplane:~$
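
A quick way to confirm whether the pair mentioned in the error is missing or mismatched (a sketch; the paths are taken from the error message above):

    sudo ls -l /var/lib/kubelet/pki/
    # The cert and key form a valid pair only if their public keys match
    sudo openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -pubkey | sha256sum
    sudo openssl pkey -in /var/lib/kubelet/pki/kubelet.key -pubout | sha256sum

Note that invoking kubelet by hand skips the flags the systemd unit passes, so its errors can differ from the service's; 'journalctl -xeu kubelet', as the kubeadm output suggests, reflects the actual service.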

Any suggestions for debugging and resolving this?

shawnsang commented 3 days ago

Found something interesting about the KUBELET_EXTRA_ARGS parameter. /etc/default/kubelet contains the following:

KUBELET_EXTRA_ARGS=--node-ip=10.0.0.10

But the script's lookup on eth0 reports the IP address as 10.0.2.15 (a more robust lookup is sketched after the session below):

vagrant@controlplane:~$ local_ip="$(ip --json addr show eth0 | jq -r '.[0].addr_info[] | select(.family == "inet") | .local')"
vagrant@controlplane:~$ echo $local_ip
10.0.2.15
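
On a VirtualBox-backed Vagrant box, eth0 is normally the NAT interface (hence 10.0.2.15), while the private-network address 10.0.0.10 usually sits on a second interface such as eth1, so a lookup hardcoded to eth0 picks the wrong address. A minimal sketch that selects the address by prefix instead of by interface name (the 10.0.0. prefix is an assumption about this particular Vagrantfile):

    # Pick the first inet address in the 10.0.0.x private network, whichever interface carries it
    local_ip="$(ip --json addr show | jq -r '.[].addr_info[] | select(.family == "inet" and (.local | startswith("10.0.0."))) | .local' | head -n1)"
    echo "KUBELET_EXTRA_ARGS=--node-ip=${local_ip}" | sudo tee /etc/default/kubelet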