k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

k3s failed to get CA certs #10005

seyfullah642 closed this issue 3 weeks ago

seyfullah642 commented 3 weeks ago

Environmental Info:

K3s Version:
k3s version v1.29.3+k3s1 (8aecc26b)
go version go1.21.8

Node(s) CPU architecture, OS, and Version:
Raspberry Pi 5 x 2
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Cluster Configuration: I have two Raspberry Pi 5s, one as the master node and the other as a worker node.

Describe the bug: When installing k3s on the worker node, the k3s-agent service never finishes starting and logs the error below:

seyfullah@rpi-worker-1:~$ sudo systemctl status k3s-agent.service
● k3s-agent.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s-agent.service; enabled; preset: enabled)
     Active: activating (start) since Tue 2024-04-23 00:55:34 BST; 30s ago
       Docs: https://k3s.io
    Process: 3690 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null (code=exited, status=0/SUCCESS)
    Process: 3692 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 3693 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 3694 (k3s-agent)
      Tasks: 9
     Memory: 222.2M
        CPU: 3.041s
     CGroup: /system.slice/k3s-agent.service
             └─3694 "/usr/local/bin/k3s agent"

Apr 23 00:55:34 rpi-worker-1 systemd[1]: Starting k3s-agent.service - Lightweight Kubernetes...
Apr 23 00:55:34 rpi-worker-1 sh[3690]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Apr 23 00:55:34 rpi-worker-1 k3s[3694]: time="2024-04-23T00:55:34+01:00" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
Apr 23 00:55:34 rpi-worker-1 k3s[3694]: time="2024-04-23T00:55:34+01:00" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/7ddd49d3724e00d95d2af069d3247eaeb6635abe80397c8d94d4053d>
Apr 23 00:55:37 rpi-worker-1 k3s[3694]: time="2024-04-23T00:55:37+01:00" level=info msg="Starting k3s agent v1.29.3+k3s1 (8aecc26b)"
Apr 23 00:55:37 rpi-worker-1 k3s[3694]: time="2024-04-23T00:55:37+01:00" level=info msg="Adding server to load balancer k3s-agent-load-balancer: 192.168.1.231:6444"
Apr 23 00:55:37 rpi-worker-1 k3s[3694]: time="2024-04-23T00:55:37+01:00" level=info msg="Running load balancer k3s-agent-load-balancer 127.0.0.1:6444 -> [192.168.1.231:6444] [default: 192.16>
Apr 23 00:55:43 rpi-worker-1 k3s[3694]: time="2024-04-23T00:55:43+01:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:39196->127.0.0.1:>
Apr 23 00:55:51 rpi-worker-1 k3s[3694]: time="2024-04-23T00:55:51+01:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:51536->127.0.0.1:>
Apr 23 00:55:59 rpi-worker-1 k3s[3694]: time="2024-04-23T00:55:59+01:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:44796->127.0.0.1:>

I have spent days trying to troubleshoot this error and I can't work out what the problem is.

One issue I noticed is that I cannot curl the master node's IP address from the worker node:

curl -vk https://192.168.1.231:6443/cacerts
*   Trying 192.168.1.231:6443...
* connect to 192.168.1.231 port 6443 failed: No route to host
* Failed to connect to 192.168.1.231 port 6443 after 197 ms: Couldn't connect to server
* Closing connection 0
curl: (7) Failed to connect to 192.168.1.231 port 6443 after 197 ms: Couldn't connect to server

I've checked which ports the master node is listening on:

root@rpi-master:~# netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      787/sshd: /usr/sbin
tcp        0      0 127.0.0.1:6444          0.0.0.0:*               LISTEN      111921/k3s server
tcp        0      0 127.0.0.1:10258         0.0.0.0:*               LISTEN      111921/k3s server
tcp        0      0 127.0.0.1:10259         0.0.0.0:*               LISTEN      111921/k3s server
tcp        0      0 127.0.0.1:10256         0.0.0.0:*               LISTEN      111921/k3s server
tcp        0      0 127.0.0.1:10257         0.0.0.0:*               LISTEN      111921/k3s server
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      111921/k3s server
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      111921/k3s server
tcp        0      0 127.0.0.1:10010         0.0.0.0:*               LISTEN      111946/containerd
tcp6       0      0 :::22                   :::*                    LISTEN      787/sshd: /usr/sbin
tcp6       0      0 :::10250                :::*                    LISTEN      111921/k3s server
tcp6       0      0 :::6443                 :::*                    LISTEN      111921/k3s server

I honestly don't know what else to do other than open a ticket here.

Steps To Reproduce:

Expected behavior: No cert error, and the worker node joins the master.

brandond commented 3 weeks ago

> 192.168.1.231:6444

Did you typo the port in the server URL that you are using to join the agent? This should normally be 6443.
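
As a rough way to check this, the sketch below pulls the port out of a server URL and compares it with 6443. `url_port` and `check_server_url` are hypothetical helper names, not part of k3s, and the env file path in the trailing comment assumes the agent was set up with the standard install script; adjust it if your setup differs.

```shell
#!/bin/sh
# Hypothetical helper: print the explicit port of a URL such as
# https://host:port/path. Prints nothing if the URL has no port.
# (Does not handle bracketed IPv6 addresses.)
url_port() {
    printf '%s\n' "$1" | sed -n 's#^[a-z]*://[^/:]*:\([0-9][0-9]*\).*#\1#p'
}

# Hypothetical helper: succeed only if the URL uses the expected port
# (second argument, defaulting to 6443).
check_server_url() {
    expected="${2:-6443}"
    port="$(url_port "$1")"
    if [ "$port" = "$expected" ]; then
        echo "ok: $1 uses port $expected"
    else
        echo "suspicious: $1 uses port '${port}', expected $expected"
        return 1
    fi
}

# Assumption: the install script stored the join URL as K3S_URL in
# /etc/systemd/system/k3s-agent.service.env. If so, on the worker:
#   sudo grep K3S_URL /etc/systemd/system/k3s-agent.service.env
```

With the address from the log above, `check_server_url https://192.168.1.231:6444` prints `suspicious: https://192.168.1.231:6444 uses port '6444', expected 6443` and returns a non-zero status.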

> One issue I noticed is that I cannot curl the master node ip address from the worker node
>
> curl -vk https://192.168.1.231:6443/cacerts

Well that would be a problem. Why can't you?
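
One way to start narrowing that down, as a sketch only: the text of the connect error usually hints at where the connection dies. The helper below (`explain_connect_error` is a hypothetical name) maps the messages that `curl -v` prints to the usual suspects. The hints are general rules of thumb, not a diagnosis of this particular setup.

```shell
#!/bin/sh
# Hypothetical helper: map the connect error text from `curl -v` output
# to a likely cause. The hints are rules of thumb, not certainties.
explain_connect_error() {
    case "$1" in
        *"No route to host"*)
            echo "host unreachable: check IP/subnet, ARP, or a firewall rejecting the port" ;;
        *"Connection refused"*)
            echo "host reachable, but nothing accepted the connection on that port" ;;
        *"timed out"*|*"timeout"*)
            echo "packets silently dropped: check firewalls between the nodes" ;;
        *)
            echo "unrecognized error: inspect the full curl -v output" ;;
    esac
}

# Example run on the worker (address taken from the report above):
#   out="$(curl -vk --connect-timeout 5 -o /dev/null \
#          https://192.168.1.231:6443/cacerts 2>&1)" \
#       || explain_connect_error "$out"
```

Since the netstat output shows the server listening on :::6443, a "No route to host" here would point at something between the two nodes rather than at k3s itself. If a firewall is installed on the master, commands such as `sudo nft list ruleset` or `sudo ufw status` would show whether port 6443/tcp is being rejected; whether either tool is present on these nodes is an assumption.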