rancher/rke2

https://docs.rke2.io/
Apache License 2.0

Install CNI cannot connect to kubernetes service in a cluster installed from quickstart #4283

Closed: starizard closed this issue 1 year ago

starizard commented 1 year ago

Environmental Info: RKE2 Version:

rke2 -v
rke2 version v1.25.9+rke2r1 (842d05e64bcbf78552f1db0b32700b8faea403a0)
go version go1.19.8 X:boringcrypto

Node(s) CPU architecture, OS, and Version:

$ uname -a
Linux k8s-master-1 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: Single cloud vm instance with 2 cores running Ubuntu 22.04.2 LTS

Describe the bug: I am going through the Quickstart and my CNI pod doesn't come up because it cannot talk to the API server (I have tried different CNIs and hit the same issue).

Steps To Reproduce: no config file was created by me.

ufw disable
curl -sfL https://get.rke2.io/ |  sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service

Expected behavior: expect the server node to be ready

Actual behavior: The rke2-canal pod is stuck in CrashLoopBackOff on its init container, and the logs of the install-cni container show that it cannot connect to the kubernetes service:

2023-05-23 10:36:11.795 [FATAL][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://10.43.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/canal/token": dial tcp 10.43.0.1:443: i/o timeout
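
(For reference, a minimal sketch of how logs like these can be pulled; the k8s-app=canal label and the install-cni container name are assumptions based on the upstream canal manifests:)

kubectl logs -n kube-system -l k8s-app=canal -c install-cni --tail=50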

Here are my service and endpoints:

root@k8s-master-1:~# kubectl get svc -owide
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE   SELECTOR
kubernetes   ClusterIP   10.43.0.1    <none>        443/TCP   14m   <none>

root@k8s-master-1:~# kubectl get endpoints -owide
NAME         ENDPOINTS            AGE
kubernetes   45.76.137.187:6443   15m
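
The ClusterIP 10.43.0.1 is virtual; kube-proxy is what translates it into the 45.76.137.187:6443 endpoint above. A quick sanity check, assuming kube-proxy runs in its default iptables mode:

# Look for the KUBE-SERVICES / KUBE-SVC-* NAT rules that should match the ClusterIP.
iptables-save -t nat | grep 10.43.0.1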

I can reach the endpoint directly (the 401 is expected here, since curl sends no credentials):

$ kubectl exec -it etcd-k8s-master-1 -nkube-system -- curl -vk https://45.76.137.187:6443/
....
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
* Connection #0 to host 45.76.137.187 left intact

But I cannot reach the kubernetes service:

$ kubectl exec -it etcd-k8s-master-1 -nkube-system -- curl -vk https://10.43.0.1/
* Uses proxy env variable NO_PROXY == '.svc,.cluster.local,10.42.0.0/16,10.43.0.0/16'
*   Trying 10.43.0.1:443...
* TCP_NODELAY set
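
Since the direct endpoint works but the ClusterIP times out, the kube-proxy translation step is the usual suspect. A hedged check, assuming RKE2 runs kube-proxy as a static pod named after the node:

# Confirm kube-proxy is up and scan its recent logs for sync errors.
kubectl get pod -n kube-system -o wide | grep kube-proxy
kubectl logs -n kube-system kube-proxy-k8s-master-1 --tail=20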

Additional context / logs: journal-logs.log

root@k8s-master-1:~# systemctl status apparmor
● apparmor.service - Load AppArmor profiles
     Loaded: loaded (/lib/systemd/system/apparmor.service; enabled; vendor preset: enabled)
     Active: active (exited) since Tue 2023-05-23 09:30:43 UTC; 1h 38min ago
       Docs: man:apparmor(7)
             https://gitlab.com/apparmor/apparmor/wikis/home/
   Main PID: 455 (code=exited, status=0/SUCCESS)
        CPU: 49ms

May 23 09:30:43 guest systemd[1]: Starting Load AppArmor profiles...
May 23 09:30:43 guest apparmor.systemd[455]: Restarting AppArmor
May 23 09:30:43 guest apparmor.systemd[455]: Reloading AppArmor profiles
May 23 09:30:43 guest apparmor.systemd[486]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
May 23 09:30:43 guest systemd[1]: Finished Load AppArmor profiles.

root@k8s-master-1:~# systemctl status NetworkManager.service
Unit NetworkManager.service could not be found.

brandond commented 1 year ago

Do you have any iptables based local firewalls such as ufw or firewalld enabled?
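
(A quick way to check, sketched for a systemd-based host; both commands are standard:)

systemctl is-active ufw firewalld
iptables -S | grep -c -- '-j DROP'   # nonzero DROP rules can indicate a leftover firewall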

starizard commented 1 year ago

No, I don't have firewalld, and I have disabled ufw.

brandond commented 1 year ago

Can you run kubectl get pod -A -o wide? I suspect that with only 2 cores you don't have enough CPU available to schedule all the pods. We recommend at least 4 cores for server nodes.
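
(A sketch of how to check the scheduling pressure; the Allocated resources section is standard kubectl describe output:)

kubectl get pod -A -o wide
kubectl describe node k8s-master-1 | grep -A 8 'Allocated resources'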

liyang516 commented 1 year ago

@starizard Did you resolve this problem?

liyang516 commented 1 year ago

@starizard How did you solve this problem?